Abstract
To improve the accuracy of microblog recommendation and to cope with the massive volume of information on social networking platforms, this paper proposes C-LDA (Collaborative Latent Dirichlet Allocation), a four-layer model built on the three-layer LDA (Latent Dirichlet Allocation) model. C-LDA accounts not only for the influence of a microblog's original author on the users who forward it, but also for the influence of followers. The Gibbs sampling algorithm is improved by incorporating feature-word popularity, negative-sample user feedback, and the forgetting curve, and the improved sampler is used to approximately solve the C-LDA model. The topic probability vectors of users and microblogs in the model are then used to compute similarity for Top-K microblog recommendation. Compared with previous methods, this approach suits the time-sensitive and interactive nature of microblog applications and yields better recommendations. Finally, the Gibbs sampling algorithm and the word popularity algorithm are implemented as distributed jobs on the Hadoop platform, improving the capacity to process massive microblog data. Experimental results show that the perplexity of C-LDA is 9.45% lower than that of traditional LDA; that Top-10 recommendations based on C-LDA improve on RT-LDA by 11.23% in accuracy, 14.56% in recall, and 12.53% in F-measure; and that on a 5-node cluster, distributed processing takes 68% less time than on a single machine.
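The Top-K recommendation step can be illustrated with a minimal sketch. It assumes that each user and each microblog is already represented by a topic probability vector, and it assumes cosine similarity as the measure; the abstract does not state which similarity function is used. The function name `top_k_microblogs` and the sample vectors are hypothetical.

```python
import numpy as np

def top_k_microblogs(user_topics, blog_topics, k=10):
    """Rank microblogs by similarity to a user's topic vector.

    user_topics: 1-D topic probability vector for one user.
    blog_topics: 2-D array, one topic probability vector per microblog.
    Returns (indices, scores) of the k most similar microblogs.
    """
    user = np.asarray(user_topics, dtype=float)
    blogs = np.asarray(blog_topics, dtype=float)
    # Cosine similarity is an assumption; the abstract does not name the measure.
    norms = np.linalg.norm(blogs, axis=1) * np.linalg.norm(user)
    sims = (blogs @ user) / np.where(norms == 0, 1.0, norms)
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims, kind="stable")[:k]
    return order.tolist(), sims[order].tolist()

# Hypothetical three-topic example: one user, three candidate microblogs.
user = [0.7, 0.2, 0.1]
blogs = [[0.1, 0.1, 0.8],
         [0.6, 0.3, 0.1],
         [0.2, 0.7, 0.1]]
indices, scores = top_k_microblogs(user, blogs, k=2)
```

In this example the second microblog ranks first, because its topic distribution is closest to the user's.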
Source
Laser Journal (《激光杂志》)
Peking University Core Journal (北大核心)
2016, Issue 6, pp. 1-6 (6 pages)
Funding
National "973" Key Basic Research Program of China (2014CB340506)
Keywords
data mining
social networking
parallel computing
recommendation system