摘要
在推荐系统中应用K-means算法聚类可有效降维,然而聚类效果往往依赖于选定的初始中心,并且一旦选定目标簇后,推荐过程只针对目标簇进行,与其他簇无关。针对上述两个问题,提出一种基于满二叉树的二分K-means聚类并行推荐算法。该算法首先反复迭代二分K-means算法,迭代过程中使用簇内凝聚度作为分裂阈值,形成一颗满二叉树;然后通过层次遍历将用户归入到K个叶子节点(簇);最后针对K个簇,应用MapReduce框架进行并行推荐预测。MovieLens上的实验结果表明,该算法可大幅度提高推荐系统准确性,同时增强系统可扩展性。
K-means clustering algorithms can effectively reduce dimensions when they are applied to recommendation systems. However, the clustering effect is often dependent on the initial centers. And once the target cluster is selected, the recommendation process is executed only according to the target cluster and has nothing to do with other clusters. To solve these problems, we present a bisecting K- means clustering parallel recommendation algorithm based on full binary tree. Firstly, the bisecting Kmeans clustering algorithm is iterated, and during the iterative process the cluster cohesion level serves as the split threshold to form a full binary tree. Then the active users are classified into k leaf nodes (clusters) using the method of level traversal. Lastly, via the MapReduce framework, the process of recommendation prediction can be parallelized onto the k clusters. Experimental results on the MovieLens show that the proposed algorithm can not only greatly improve the accuracy of the recommendation results hut also enhance the system scalability.
出处
《计算机工程与科学》
CSCD
北大核心
2015年第8期1450-1457,共8页
Computer Engineering & Science
基金
广东省教育部产学研结合项目(2012B091100003
2012B091000058)
广东省专业镇中小微企业服务平台建设项目(2012B040500034)