Abstract
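As a new form of online social network, microblogging has gradually become an important platform for obtaining and sharing information. Taking Sina Weibo, the largest microblogging site in China, as the object of study, this paper focuses on predicting the popularity of microblog topics. About 40G of microblog topic data was collected as the research dataset, and the user attributes and topic content attributes related to topic popularity were extracted from it. Based on a correlation analysis of these attributes, a quantitative description of topic popularity that accounts for both user attributes and content attributes is proposed. The attributes that affect topic popularity are examined in detail with principal component analysis, four attributes are identified as the basis for popularity prediction, and a linear prediction model of popularity is built. The model predicts topic popularity well, with the model index R^2 reaching 0.89.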
The two-year-old Sina Weibo is the most famous microblogging platform in China. The goal of this paper is to predict the popularity of a newly submitted tweet promptly and accurately. By analyzing the correlations among the features of the user and of the tweet content, a quantitative description of a tweet's popularity is presented. Principal component analysis is used to reduce the feature dimensions by performing a covariance analysis of the factors that affect a tweet's popularity, and the most important features are extracted. Then a PCA-based linear model for predicting the popularity of a newly submitted tweet is built. The model is validated on the Sina microblogging network. The results show that the model predicts the popularity of a new tweet well, with the evaluation index R^2 reaching 0.89.
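The pipeline summarized in the abstract (reduce the features with principal component analysis, fit a linear model on the retained components, evaluate with R^2) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_pcr`, `predict_pcr`, and `r_squared`, the six synthetic features, and the choice to standardize the features first are all assumptions made for illustration, and the data is randomly generated rather than taken from Sina Weibo.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit a PCA-based linear model (principal component regression)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero for constant features
    Z = (X - mean) / std                     # standardized features (an assumption of this sketch)

    # PCA via eigen-decomposition of the covariance matrix of the features.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]           # shape: (n_features, n_components)

    # Ordinary least squares on the component scores, with an intercept column.
    scores = Z @ components
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}


def predict_pcr(model, X):
    """Project new samples onto the retained components and apply the linear model."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for user and content features (hypothetical, not real Weibo data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, 1.5]) + rng.normal(scale=0.1, size=200)

model_demo = fit_pcr(X_demo, y_demo, n_components=4)
print("R^2 with 4 components:", r_squared(y_demo, predict_pcr(model_demo, X_demo)))
```

Keeping four components mirrors the four attributes mentioned in the abstract only loosely: principal components are linear combinations of the original features, so a retained component does not correspond to a single attribute.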
Source
《信息工程大学学报》
2012, Issue 4, pp. 496-502 (7 pages)
Journal of Information Engineering University
Funding
Supported by the State Key Laboratory Open Project Fund (SKLSDE-2011KF-06)
Keywords
microblogging
topic popularity (popularity of tweet)
prediction
principal component analysis