Abstract
User behavior profile analysis is one of the key means of realizing network intelligence, and click-object recognition is an important basis for constructing user behavior profiles. Most existing work is designed for the system side. Such methods can only reflect users' behavior characteristics within a specific service domain and are unsuitable for network-side detection and management. The main challenge for network-side user behavior analysis is that network pipes, which sit at the lower layers of the protocol stack, cannot obtain application-layer or system-side information and must rely solely on IP data flows. This makes it difficult to build an effective network-side user behavior profile. This paper therefore proposes a new user click-object recognition method for intermediate networks that combines a hidden Markov model (HMM) with neural networks (NN). The HMM framework describes the dynamic behavior of click streams and non-click streams from the perspective of IP flows, while the NN establishes the relationship between the HMM's hidden states and complex network-flow behavior features. A request sequence under test is classified by evaluating how well it fits the HMM-NN behavior models. The main advantages of the scheme are that it inherits the interpretability of the HMM and uses the embedded NN to strengthen the HMM's ability to describe complex data. Moreover, the scheme does not inspect the data content carried by IP flows, so it is suitable for network-side click-behavior recognition over both encrypted and unencrypted traffic, and it effectively addresses the difficulties faced by network-side user behavior profile analysis. Experiments on multiple real-world data sets show that the scheme's F1, Kappa and AUC scores, three commonly used evaluation metrics, exceed 0.91, 0.83 and 0.96, respectively, indicating that it outperforms existing methods.
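The recognition step described above (scoring a request sequence by its fit to HMM-NN behavior models) can be illustrated with a minimal sketch. It assumes one HMM-NN per class (click and non-click) and the common hybrid HMM/ANN convention in which the NN outputs state posteriors p(s|x), which are divided by state priors p(s) to obtain scaled emission likelihoods. The feature vectors, the tiny MLP, the length-normalized score, and all names (`MLPEmission`, `HMMNN`, `classify`, `random_model`) are hypothetical illustrations, not the paper's actual implementation; the weights are random placeholders rather than trained parameters.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))), optionally along one axis."""
    a = np.asarray(a, dtype=float)
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


class MLPEmission:
    """Hypothetical NN emission model: maps per-request flow features x_t to log p(state | x_t)."""

    def __init__(self, n_features, n_hidden, n_states, rng):
        # Random placeholder weights; a real model would be trained on labeled flows.
        self.W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.5, size=(n_hidden, n_states))
        self.b2 = np.zeros(n_states)

    def log_posteriors(self, X):
        h = np.tanh(X @ self.W1 + self.b1)          # (T, n_hidden)
        z = h @ self.W2 + self.b2                    # (T, n_states) logits
        return z - logsumexp(z, axis=1)[:, None]     # log-softmax over states


class HMMNN:
    """HMM whose emission probabilities come from an NN (hybrid HMM/ANN convention, assumed)."""

    def __init__(self, log_pi, log_A, emission, log_state_prior):
        self.log_pi = log_pi                    # (K,) initial-state log-probabilities
        self.log_A = log_A                      # (K, K) transition log-probabilities, row = from-state
        self.emission = emission
        self.log_state_prior = log_state_prior  # (K,) log p(s), used to scale NN posteriors

    def log_likelihood(self, X):
        """Forward algorithm in log space over scaled emissions log p(s|x_t) - log p(s)."""
        log_b = self.emission.log_posteriors(X) - self.log_state_prior  # (T, K)
        alpha = self.log_pi + log_b[0]
        for t in range(1, len(X)):
            # alpha_t(j) = logsum_i [alpha_{t-1}(i) + log A_ij] + log b_t(j)
            alpha = logsumexp(alpha[:, None] + self.log_A, axis=0) + log_b[t]
        return logsumexp(alpha)


def classify(X, click_model, nonclick_model):
    """Assign a request sequence to the better-fitting model (per-request normalization is an assumption)."""
    T = len(X)
    margin = (click_model.log_likelihood(X) - nonclick_model.log_likelihood(X)) / T
    return ("click" if margin > 0 else "non-click"), margin


def random_model(rng, n_features=4, n_hidden=8, n_states=3):
    """Build an untrained HMM-NN with random parameters, for illustration only."""
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    uniform = np.log(np.full(n_states, 1.0 / n_states))
    return HMMNN(uniform, np.log(A), MLPEmission(n_features, n_hidden, n_states, rng), uniform)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    click, nonclick = random_model(rng), random_model(rng)
    requests = rng.normal(size=(10, 4))   # 10 requests x 4 hypothetical flow features
    label, margin = classify(requests, click, nonclick)
    print(label, round(float(margin), 4))
```

Dividing the NN posterior by p(s) leaves a factor p(x_t) that does not depend on which model is scoring the sequence, so the scaled likelihoods remain comparable between the click and non-click models. This is the usual justification for the hybrid convention, assumed here rather than taken from the paper.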
Authors
费星瑞
谢逸
FEI Xing-rui; XIE Yi (Guangdong Province Key Laboratory of Information Security Technology, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 7, pp. 340-349 (10 pages)
Computer Science
Funding
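National Natural Science Foundation of China (61972431)
Natural Science Foundation of Guangdong Province (2018A030313303)
Science and Technology Development Fund of the Ministry of Education (2018A06002)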