
Scalable Co-clustering Algorithm Application in Behavior Analysis of Website Users
Cited by: 5
Abstract: Websites typically mine their user data for hidden patterns in order to create more value. As the internet has spread, the number of users has grown exponentially, and the resulting data volume poses a serious challenge to traditional analysis algorithms. Targeting the massive user data held by websites, this paper designs a co-clustering algorithm based on the MapReduce distributed programming framework. The algorithm computes the clustering statistics in a distributed, parallel fashion, which lets it process user data more efficiently and carry out the website's user behavior analysis. Experiments show that the proposed algorithm achieves a high speedup and scales well.
Authors: Ku Bo (库波), Chao Xuepeng (晁学鹏)
Source: Bulletin of Science and Technology (《科技通报》), Peking University Core Journal list, 2013, No. 2, pp. 67-69 (3 pages)
Keywords: data mining; scalable; Hadoop; co-clustering; user behavior
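The abstract says the clustering statistics are gathered in a distributed, parallel way under MapReduce but gives no further detail. The sketch below is only an illustration of that general idea, not the paper's algorithm: it assumes the input is a set of (user, item, value) records split into partitions, that the current row (user) and column (item) cluster assignments are known, and that the statistic of interest is the sum and count of values in each co-cluster block. All function and variable names (`map_phase`, `reduce_phase`, `cocluster_stats`, `row_cluster`, `col_cluster`) are hypothetical, and the map and reduce steps run in a single process here rather than on Hadoop.

```python
from collections import defaultdict


def map_phase(records, row_cluster, col_cluster):
    """Map step (assumed form): key each record by its co-cluster block.

    Emits ((row_cluster_id, col_cluster_id), (value, 1)) per record.
    """
    for user, item, value in records:
        yield (row_cluster[user], col_cluster[item]), (value, 1)


def reduce_phase(pairs):
    """Reduce step (assumed form): sum values and counts per block key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (value, count) in pairs:
        totals[key][0] += value
        totals[key][1] += count
    return {key: (s, n) for key, (s, n) in totals.items()}


def cocluster_stats(partitions, row_cluster, col_cluster):
    """Run the map step on each partition independently, then reduce.

    On a real cluster each partition would be mapped on a separate
    worker; here the partitions are simply processed one after another.
    """
    pairs = []
    for part in partitions:
        pairs.extend(map_phase(part, row_cluster, col_cluster))
    return reduce_phase(pairs)


if __name__ == "__main__":
    # Toy data: two partitions of (user, item, value) records.
    partitions = [
        [("u1", "a", 1.0), ("u2", "a", 3.0)],
        [("u1", "b", 2.0)],
    ]
    row_cluster = {"u1": 0, "u2": 0}   # hypothetical user clusters
    col_cluster = {"a": 0, "b": 1}     # hypothetical item clusters
    print(cocluster_stats(partitions, row_cluster, col_cluster))
```

Because each record contributes to exactly one block key, the map step needs no shared state between partitions, which is what makes this kind of statistic easy to parallelize. An iterative co-clustering method would typically recompute such block statistics after every reassignment of rows or columns.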









