Abstract
Micro-blogging, characterized by the publication of short text posts, has become an important medium for spreading information, and predicting the popularity of micro-blog posts is valuable for public opinion monitoring, corporate marketing, and hot content recommendation. Existing work on popularity prediction mainly builds a single unified model over the data of all users; few studies consider the differences among users with different levels of influence. Our analysis of micro-blog data shows that the effect of post characteristics (such as the presence of hashtags and mentions, and post length) on popularity varies markedly with the influence of the author, so a predictive model that accounts for these differences should achieve higher accuracy. To this end, we introduce Multi-Task Learning (MTL) into popularity prediction and combine it with a Support Vector Machine (SVM) to build the SVM+MTL model. Specifically, we divide users into groups by influence level and treat prediction for each group as one task. The SVM+MTL model learns the commonality and the differences among these tasks simultaneously, improving predictive performance by considering both the properties common to all users and the characteristics specific to each influence group. In addition to commonly used features such as user attributes and posting behavior, we introduce a new feature, micro-blog content similarity, which noticeably improves prediction accuracy. It is computed from a post's similar posts, that is, the top k most similar posts by the same author, selected by Word Mover's Distance (WMD). Experiments on a large Twitter dataset show that, compared with Naive Bayes, SVM, logistic regression, and J48 decision tree models, the SVM+MTL model effectively improves predictive performance.
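The abstract does not give the SVM+MTL formulation. One common way to combine a shared model with per-group models, used here only as an illustrative sketch and not necessarily the authors' method, is regularized multi-task learning by feature augmentation: each input is copied into a shared block and into a block private to its task, so a standard linear SVM trained on the augmented vectors learns a weight vector of the form w_0 + v_t for task t. The function names, the `shared_scale` parameter, and the two toy features are assumptions made for this example.

```python
from typing import List, Sequence


def augment(x: Sequence[float], task: int, n_tasks: int,
            shared_scale: float = 1.0) -> List[float]:
    """Map (x, task) to [shared_scale * x | 0 ... x ... 0].

    The first block is shared by all tasks; the remaining n_tasks blocks
    are private, and only the block of the given task is filled.
    shared_scale is a hypothetical knob: larger values make the shared
    weights cheaper under the SVM regularizer, tying the tasks together.
    """
    if not 0 <= task < n_tasks:
        raise ValueError("task index out of range")
    d = len(x)
    z = [shared_scale * v for v in x]   # shared block, seen by every task
    z.extend([0.0] * (d * n_tasks))     # one private block per task
    start = d * (1 + task)
    z[start:start + d] = list(x)        # fill only this task's block
    return z


def score(u: Sequence[float], x: Sequence[float], task: int, n_tasks: int,
          shared_scale: float = 1.0) -> float:
    """Decision value u . augment(x, task), i.e. (w_0 + v_task) . x."""
    z = augment(x, task, n_tasks, shared_scale)
    return sum(ui * zi for ui, zi in zip(u, z))


# Toy example (assumed features): x = [has_hashtag, has_mention],
# with users split into 3 influence groups, i.e. 3 tasks.
x = [1.0, 0.0]
print(augment(x, task=1, n_tasks=3))
# -> [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Any linear SVM solver can then be trained on the augmented vectors. The shared block captures effects common to all users, while each private block captures how a feature such as hashtag presence matters differently for one influence group.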
Authors
韩凤娟
肖春静
王欢
HAN Fengjuan, XIAO Chunjing, WANG Huan (School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China)
Source
《河南大学学报(自然科学版)》
CAS
2017, Issue 5, pp. 544-551 (8 pages)
Journal of Henan University:Natural Science
Funding
National Natural Science Foundation of China (61402151)
Henan Province Science and Technology Research Program (162102410010)
Keywords
micro-blog
Twitter
popularity
prediction
multi-task learning