Microblog Recommendation Algorithm in Big Data (大数据下微博推荐算法)

Cited by: 4

Abstract: To improve the accuracy of microblog recommendation and to cope with the massive volume of information on social networking platforms, this paper proposes a four-layer C-LDA (Collaborative Latent Dirichlet Allocation) model based on the three-layer LDA (Latent Dirichlet Allocation) model. The model accounts not only for the influence of a microblog's original author on the users who forward it, but also for the influence of followers. The Gibbs sampling algorithm is improved by incorporating feature-word popularity, user feedback from negative samples, and the forgetting curve, and the improved sampler is used to approximately solve the C-LDA model. The topic probability vectors of users and microblogs produced by the model are then used to compute similarity and make Top-K microblog recommendations. Compared with previous methods, this approach suits time-sensitive and interactive microblog scenarios and yields better recommendations. Finally, the Gibbs sampling algorithm and the word popularity algorithm are implemented as distributed jobs on the Hadoop platform, which improves the capacity to process massive microblog data. Experimental results show that the perplexity of C-LDA is 9.45% lower than that of traditional LDA; that Top-10 recommendations based on C-LDA improve on RT-LDA by 11.23% in accuracy, 14.56% in recall, and 12.53% in F-measure; and that distributed processing on a 5-node cluster takes 68% less time than on a single machine.
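The recommendation and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as is the use of the balanced F1 form for the F-measure. The function names (`cosine`, `top_k`, `precision_recall_f`) and the toy topic vectors are hypothetical; in the paper the vectors would come from the fitted C-LDA model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length topic probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(user_topics, post_topics, k):
    """Return the ids of the k posts most similar to the user, best first."""
    ranked = sorted(
        post_topics,
        key=lambda pid: cosine(user_topics, post_topics[pid]),
        reverse=True,
    )
    return ranked[:k]

def precision_recall_f(recommended, relevant):
    """Precision, recall, and F1 of a recommendation list against a relevant set."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Toy data (hypothetical): one user and four microblogs over three topics.
user = [0.7, 0.2, 0.1]
posts = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.6, 0.3, 0.1],
    "p4": [0.1, 0.1, 0.8],
}

recs = top_k(user, posts, k=2)
metrics = precision_recall_f(recs, relevant={"p1", "p2"})
print(recs, metrics)
```

With these toy vectors the user's dominant topic matches `p1` and `p3`, so those two posts are ranked first; only `p1` is in the assumed relevant set, which gives precision, recall, and F1 of 0.5 each.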

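Authors: 张磊; 吾守尔·斯拉木; 买买提依明·哈斯木; 于清

Affiliation: Laboratory of Multi-Language Information Technology, Xinjiang University (新疆大学多语种信息技术实验室)

Source: Laser Journal (《激光杂志》), PKU Core Journal, 2016, No. 6, pp. 1-6 (6 pages)

Funding: National Key Basic Research Program of China (973 Program), grant 2014CB340506

Keywords: data mining; social networking; parallel computing; recommendation system

Classification: TN911 [Electronics and Telecommunications: Communication and Information Systems]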
