Microblog Popularity Prediction Based on Multi-Task Learning (基于多任务学习的微博流行度预测)

Predicting Popularity Based on Multi-Task Learning on Twitter

Abstract: Microblogs, characterized by the posting of short texts, have become an important medium for spreading information, and predicting microblog popularity matters for public opinion monitoring, corporate marketing, and hot content recommendation. Existing work on microblog popularity prediction mainly builds a single unified model from the data of all users, and few studies consider the differences among users with different levels of influence. Our analysis of microblog data shows that tweet characteristics such as the presence of hashtags and mentions, as well as tweet length, affect the number of clicks a tweet obtains differently depending on the influence level of its publisher, so a predictive model that accounts for these differences should achieve higher accuracy. To this end, we introduce Multi-Task Learning (MTL) into popularity prediction and combine it with the Support Vector Machine (SVM) to build the SVM+MTL model. Specifically, we divide users into groups based on their influence levels and treat prediction for each group as a task. The SVM+MTL model learns the commonality and the differences among the tasks simultaneously, improving predictive performance by considering both the properties common to all users and the specific characteristics of users at each influence level. In addition to commonly used features such as user attributes and posting behavior, we introduce a new feature, microblog content similarity, which clearly improves prediction accuracy. It is computed from a post's similar posts, that is, the top k most similar posts by the same author, selected using the Word Mover's Distance (WMD). Experiments on a large amount of Twitter data show that, compared with Naive Bayes, SVM, logistic regression, and the J48 decision tree, the SVM+MTL model effectively improves predictive performance.
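The abstract does not give the SVM+MTL formulation, so the following is only a minimal sketch of one common way to let a linear model learn shared and task-specific weights at once: feature augmentation, where each sample carries one copy of its features in a shared block and another in a block reserved for its task. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the toy data, and the solver, which is a plain subgradient loop on the unregularized hinge loss (the loss used by SVMs) standing in for a full SVM solver.

```python
def augment(x, task, n_tasks):
    """Return [shared copy of x | one block per task, only `task` filled]."""
    d = len(x)
    blocks = [0.0] * (d * n_tasks)
    blocks[task * d:(task + 1) * d] = x
    return list(x) + blocks


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_hinge(X, y, lr=0.1, max_epochs=1000):
    """Subgradient steps on the unregularized hinge loss max(0, 1 - y * w.x)."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                updated = True
        if not updated:  # every training margin is at least 1
            break
    return w


def predict(w, x, task, n_tasks):
    """Classify x using the shared weights plus the weights of its task."""
    return 1 if dot(w, augment(x, task, n_tasks)) >= 0 else -1


# Toy data, invented for illustration: x = [has_hashtag, bias].
# Task 0 = low-influence users, task 1 = high-influence users.
# Each tuple is (features, task, label), label +1 = popular, -1 = not popular.
samples = [([1.0, 1.0], 0, 1), ([0.0, 1.0], 0, -1),
           ([1.0, 1.0], 1, -1), ([0.0, 1.0], 1, 1)]
N_TASKS = 2
X_aug = [augment(x, t, N_TASKS) for x, t, _ in samples]
labels = [y for _, _, y in samples]
w = train_hinge(X_aug, labels)
predictions = [predict(w, x, t, N_TASKS) for x, t, _ in samples]
```

In this deliberately extreme toy set the hashtag has opposite effects in the two groups, so a single model over the raw features cannot fit both groups, while the augmented model can use its per-task blocks to do so. In a regularized SVM, the relative scaling of the shared block would control how strongly the tasks are tied together.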

Authors: HAN Fengjuan (韩凤娟), XIAO Chunjing (肖春静), WANG Huan (王欢) (School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China)

Affiliation: School of Computer and Information Engineering, Henan University

Source: Journal of Henan University: Natural Science (《河南大学学报（自然科学版）》), CAS, 2017, No. 5, pp. 544-551 (8 pages)

Funding: National Natural Science Foundation of China (61402151); Henan Province Science and Technology Research Program (162102410010)

Keywords: microblog; popularity prediction; multi-task learning; Twitter

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

