Abstract
Online behavior information is described in too many dimensions, which makes it difficult to analyze the preference characteristics of users' traffic behavior within massive volumes of online traffic data. To address this problem, this article proposes a method for rapidly mining online behavior information features based on big data analysis. Crawler technology is used to obtain a set of website category labels from navigation websites and the sub-pages of their classified directories, and the operating system of each user's login terminal is identified. Statistical analysis is then combined with network traffic characteristics to construct a complete feature set that fully describes users' online traffic behavior. From this set, an optimized feature subset suited to analyzing user online traffic behavior is selected, and association analysis is applied to mine the preference characteristics of that behavior. Experimental results show that the proposed method mines online behavior information features quickly while consuming less energy.
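The abstract does not name the association-analysis algorithm used in the final step. As a minimal sketch only, assuming Apriori-style frequent-itemset mining, each user's discretized features can be treated as one transaction and rules of the form X -> Y kept when their support and confidence clear given thresholds. The feature labels (`os=...`, `cat=...`, `vol=...`), the thresholds, and the sample data below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Assumption: each transaction is the set of discretized features of one user.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct feature value seen in the data.
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets,
        # then prune any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Derive rules X -> Y with confidence = support(X | Y) / support(X)."""
    out = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[lhs] is always present here.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


# Hypothetical per-user feature transactions (OS, site category, traffic volume).
transactions = [
    {"os=Android", "cat=video", "vol=high"},
    {"os=Android", "cat=video", "vol=high"},
    {"os=Windows", "cat=news", "vol=low"},
    {"os=Android", "cat=video", "vol=low"},
    {"os=iOS", "cat=shopping", "vol=low"},
]

if __name__ == "__main__":
    freq = apriori(transactions, min_support=0.4)
    for lhs, rhs, supp, conf in rules(freq, min_confidence=0.9):
        print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

Apriori is chosen here only because it is the textbook association-analysis method; FP-Growth or another miner would serve equally well, and the paper's actual choice and thresholds may differ.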
Authors
HAN Long-long (韩龙龙); JIANG Gan-qing (姜金卿); WANG Hua-qing (王花清)
National Computer Network and Information Security Management Center, Henan Branch, Zhengzhou, Henan 450000, China
Source
Computer Simulation (《计算机仿真》)
Peking University core journal
2019, No. 6, pp. 346-349 (4 pages)
Funding
National Special Program for Computer Network and Information Security Technology Research (242 Research Program) (2018Q12)
Keywords
Big data analysis
Online behavior information
Feature mining
Traffic monitoring (flow characteristics)