Abstract
Websites commonly mine their user data for hidden patterns in order to create more value. As the internet has spread, the number of users has grown exponentially, which poses a serious challenge to traditional analysis algorithms. Targeting the massive volumes of user data that websites hold, this paper designs a co-clustering algorithm based on the MapReduce distributed programming framework. The algorithm gathers clustering statistics in a distributed, parallel manner, which lets it process user data more efficiently and carry out user behavior analysis for the website. Experiments show that the proposed algorithm achieves a high speedup and also scales well.
Source
《科技通报》
PKU Core Journal (北大核心)
2013, No. 2, pp. 67-69 (3 pages)
Bulletin of Science and Technology