Microblog Recommendation Algorithm in Big Data

Cited by: 4
Abstract To improve the accuracy of microblog recommendation and to cope with the massive volume of information on social networking platforms, this paper proposes C-LDA (Collaborative Latent Dirichlet Allocation), a four-layer model that extends the three-layer LDA (Latent Dirichlet Allocation) model. C-LDA accounts not only for the influence of a forwarded microblog's author on the forwarder, but also for the influence of followers. The Gibbs sampling algorithm is modified to incorporate feature-word popularity, negative-sample user feedback, and the forgetting curve, and the modified sampler is used to approximately solve the C-LDA model. The topic probability vectors of users and microblogs produced by the model are then used to compute similarity and to make Top-K microblog recommendations. Compared with previous methods, this approach suits the time-sensitive and interactive nature of microblog applications and yields better recommendations. Finally, Gibbs sampling and the word popularity algorithm are implemented as distributed jobs on the Hadoop platform, which improves the capacity to process massive microblog data. Experimental results show that the perplexity of C-LDA is 9.45% lower than that of traditional LDA. Compared with RT-LDA, Top-10 recommendation based on C-LDA improves accuracy by 11.23%, recall by 14.56%, and F-measure by 12.53%. On a 5-node cluster, distributed processing takes 68% less time than on a single machine.
Source Laser Journal (《激光杂志》), PKU Core Journal, 2016, Issue 6, pp. 1-6 (6 pages)
Funding National "973" Key Basic Research Program of China (Grant No. 2014CB340506)
Keywords data mining; social networking; parallel algorithms; recommendation system
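
The Top-K step described in the abstract (scoring microblogs against a user by the similarity of their topic probability vectors) and the accuracy, recall, and F-measure figures it reports can be illustrated with a minimal sketch. The abstract does not say which similarity measure the paper uses, so cosine similarity is an assumption here, "accuracy" is interpreted as Top-K precision, and all function names and the toy topic vectors are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length topic vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_top_k(user_topics, microblog_topics, k):
    """Return the ids of the k microblogs most similar to the user."""
    scored = [
        (cosine_similarity(user_topics, vec), blog_id)
        for blog_id, vec in microblog_topics.items()
    ]
    # Highest similarity first; ties are broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [blog_id for _, blog_id in scored[:k]]


def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure of one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy topic distributions over three topics (illustrative values only).
user = [0.7, 0.2, 0.1]
microblogs = {
    "m1": [0.8, 0.1, 0.1],
    "m2": [0.1, 0.8, 0.1],
    "m3": [0.6, 0.3, 0.1],
    "m4": [0.1, 0.1, 0.8],
}

top2 = recommend_top_k(user, microblogs, 2)
metrics = precision_recall_f(top2, {"m1", "m2"})
print(top2, metrics)
```

In this toy run the two microblogs dominated by the user's main topic are recommended. A real system would take the vectors from the trained topic model instead of hard-coding them.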