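Abstract
Using Sina Weibo data, we analyze user behavior and, on this basis, construct SIRUB, an information diffusion model for microblog networks based on user behavior. We also compute each user's probabilities of reading and of reposting microblogs in the model. Experiments on the microblog network show that the model predicts users' reposting behavior relatively accurately only when both the reading and the reposting probabilities are considered. The highest F-score of the SIRUB model for predicting users' reposting behavior is 0.228, higher than those of the classical SIR and SICR models. In addition, both the mean and the standard deviation of the model's errors in predicting the spreading scope of microblogs are smaller than those of the SIR and SICR models.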
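The abstract describes diffusion in which a user must first read a microblog and then decide whether to repost it. The sketch below illustrates one simulated cascade under that two-step rule. It is a minimal sketch, not the paper's SIRUB implementation. The state transitions are assumptions: an unread exposure leaves a fan susceptible, reading without reposting makes the fan recovered, and reposting makes the fan infected. The function name `simulate_sirub` and its dictionary inputs are hypothetical.

```python
import random
from collections import deque


def simulate_sirub(fans, seed, p_read, p_repost, rng=None):
    """Simulate one cascade and return the set of users who reposted (seed included).

    fans:     dict user -> list of followers who receive that user's posts
    p_read:   dict user -> assumed probability of reading a received microblog
    p_repost: dict user -> assumed probability of reposting once it has been read
    """
    if rng is None:
        rng = random.Random()
    infected = {seed}         # I: users who have posted or reposted the microblog
    recovered = set()         # R: users who read it but chose not to repost
    frontier = deque([seed])  # infected users whose fans have not been exposed yet
    while frontier:
        u = frontier.popleft()
        for v in fans.get(u, []):
            if v in infected or v in recovered:
                continue
            # An unread exposure leaves v susceptible; another followee may expose it again.
            if rng.random() >= p_read.get(v, 0.0):
                continue
            if rng.random() < p_repost.get(v, 0.0):
                infected.add(v)
                frontier.append(v)
            else:
                recovered.add(v)
    return infected
```

A call such as `simulate_sirub(fans, "a", p_read, p_repost, random.Random(42))` returns the reposters from one run. Averaging the size of that set over many runs gives a simulated spreading scope. Keeping the reading and reposting draws separate mirrors the abstract's distinction between the two probabilities. Setting every reading probability to 1 removes the reading step entirely.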
Online social networks such as Facebook, Twitter and YouTube play a vital role in information sharing and diffusion, and many dynamics models on social networks have recently been proposed to describe information diffusion. However, most of these models are theoretical. Their parameters are not derived from real data, and their validity and reliability have not been evaluated empirically. In this paper we first analyze users' microblog reading and reposting behaviors on Sina Weibo, a Twitter-like website in China. We find that a user's number of fans, the average number of reposts of the user's microblogs, the intensity of interaction between users, and the similarity between a microblog's topic and a user's topic interests significantly influence reposting behavior. We then propose SIRUB (Susceptible-Infected-Recovered based on Users' Behaviors), an information diffusion model for microblog networks. In the model, each user's probability of reading a microblog is computed from the probability that the user logs in to Weibo on a given day. The reposting probability is obtained by logistic regression over 16 candidate factors that may influence reposting behavior. These 16 factors fall into three categories: characteristics of the microblog publisher, features of the microblog text, and characteristics of the social relationship. We use the first 2/3 of the microblog data to estimate the model parameters and logistic regression coefficients, and the remaining 1/3 to test the validity of the model. Experiments on the Sina Weibo network show that the model predicts users' reposting behavior accurately only when it considers both the reading and the reposting probabilities. The F-score, which combines precision and recall, is used to assess the model's prediction performance.
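The parameter-estimation steps in the paragraph above can be sketched as follows, under stated assumptions. The reading probability is taken as the share of observed days on which a user logged in; the abstract only says it is derived from the daily login probability. The reposting probability is a standard logistic function of the factor values. The 2/3 to 1/3 split is assumed to be chronological on a hypothetical `time` field. All function names are hypothetical. The coefficients would come from fitting a logistic regression on the training part, which is not shown here.

```python
import math
from fractions import Fraction


def reading_probability(days_logged_in, days_observed):
    """Assumed estimator: share of observed days on which the user logged in."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return days_logged_in / days_observed


def reposting_probability(features, coefficients, intercept):
    """Logistic model: P(repost) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def chronological_split(records, train_fraction=Fraction(2, 3)):
    """Assumed split: earliest 2/3 of time-ordered records for fitting, the rest for testing."""
    ordered = sorted(records, key=lambda r: r["time"])
    cut = int(len(ordered) * train_fraction)  # Fraction arithmetic avoids float rounding
    return ordered[:cut], ordered[cut:]
```

In a cascade simulation, one plausible (assumed) per-exposure repost chance is `reading_probability(...) * reposting_probability(...)`. With 16 factors, `features` and `coefficients` would each have length 16.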
The highest F-score of the SIRUB model for predicting users' reposting behavior is 0.228, much higher than those of the classical Susceptible-Infected-Recovered (SIR, F-score = 0.039) and Susceptible-Infected-Contacted-Recovered (SICR, F-score = 0.037) models. The predictions of microblog spreading scope by the SIR and SICR models are related to users' number of fans, whereas those of the SIRUB model are not. For the SIRUB model, both the mean and the standard deviation of the spreading-scope prediction errors are smaller than those of the SIR and SICR models. These results indicate that users' reading and reposting behaviors should be taken into account when modeling information diffusion on microblog networks. In general, the data-driven SIRUB model proposed in this paper outperforms the SIR and SICR models in predicting both users' reposting behavior and the diffusion scope of microblogs.
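The two evaluation measures reported above can be sketched as follows. The F-score here is the balanced F1 form, the harmonic mean of precision and recall over sets of predicted and actual reposters. The abstract does not state the weighting, so F1 is an assumption. The spreading-scope error is assumed to be the absolute difference between predicted and actual numbers of reposters, and the paper may instead use a relative error. The function names are hypothetical.

```python
import statistics


def f_score(predicted, actual):
    """F-score (F1): harmonic mean of precision and recall over sets of reposters."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def scope_error_stats(predicted_sizes, actual_sizes):
    """Mean and population standard deviation of absolute spreading-scope errors."""
    if len(predicted_sizes) != len(actual_sizes):
        raise ValueError("inputs must have the same length")
    errors = [abs(p - a) for p, a in zip(predicted_sizes, actual_sizes)]
    return statistics.mean(errors), statistics.pstdev(errors)
```

For example, predicting reposters `{"a", "b", "c"}` when the actual reposters are `{"b", "c", "d", "e"}` gives precision 2/3, recall 1/2 and F1 = 4/7. Lower mean and standard deviation from `scope_error_stats` correspond to the abstract's comparison of scope predictions.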
Source
Acta Physica Sinica (物理学报), 2016, No. 15, pp. 277-288 (12 pages)
Indexed in: SCIE, EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (Grant Nos. 61473119 and 61104139); Fundamental Research Funds for the Central Universities (Grant No. WN1524301)
Keywords
microblog network, user behavior, information diffusion