
Time Series Method Clustering in User Behavior Based on Symmetric Kullback-Leibler Distance (cited by: 4)
Abstract: Analyzing how the behavior of Internet users changes over time has become a focus of user behavior research in recent years, and clustering users is a common way to discover behavioral features. Existing methods for clustering user time series data suffer from poor computational performance and inaccurate distance metrics, and cannot handle large-scale data. To address these problems, this paper proposes a time series clustering method for user behavior based on the symmetric Kullback-Leibler (KL) distance. The time series data are first transformed into probability models; then, within a partition clustering framework, the KL distance is introduced as the distance metric to measure differences in the time distributions of different users. Because real network data are large in scale, each stage of the clustering is optimized according to the characteristics of the KL distance, and an efficient method for solving the cluster centroids is proved. Experimental results show that the algorithm improves accuracy by 4% compared with clustering algorithms that use the Euclidean or DTW distance, and that its computation time is an order of magnitude lower than that of clustering algorithms that use medoids as centroids. Applying the algorithm to user traffic data collected in a real network environment shows that it has practical application value.
Authors: LI Wenjing (李文璟), ZENG Xiangjian (曾祥健), LI Meng (李梦), YU Peng (喻鹏) (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source: Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, No. 10, pp. 2365-2372 (8 pages)
Funding: State Grid Corporation of China Science and Technology Project (52010116000W)
Keywords: Time series clustering; User analysis; Kullback-Leibler distance
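
The distance step described in the abstract (turn each user's time series into a probability distribution, then compare users with a symmetric KL distance) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the smoothing constant `eps`, and the sample counts are all assumptions made for illustration. The symmetric form used here is the plain sum of the two directed divergences; some definitions halve that sum, and the paper's exact form is not given in this record.

```python
import math


def to_distribution(counts, eps=1e-9):
    """Normalize non-negative activity counts into a probability vector.

    eps is an assumed smoothing constant: it keeps empty time slots
    from producing log(0) or division by zero in the KL terms.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p, q):
    """Directed KL divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric KL distance, taken here as D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical activity counts for three users over four time slots.
morning_user = to_distribution([8, 6, 1, 0])
also_morning = to_distribution([7, 7, 1, 0])
evening_user = to_distribution([0, 1, 6, 8])

print(symmetric_kl(morning_user, also_morning))
print(symmetric_kl(morning_user, evening_user))
```

With these made-up counts, the two morning-heavy users come out much closer to each other than either is to the evening-heavy user, which is the property a partition clustering step would rely on. The centroid computation that the paper proves is not reproduced here, since the record does not state it.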
