Abstract
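Events spread rapidly through microblogs in the form of topics and can have a great impact. Analyzing the users who take part in topic propagation, and discovering groups with different topical interests and sentiment orientations, has therefore drawn wide attention from governments and enterprises. At present, the vast majority of group discovery algorithms applied to microblogs start from individual users and consider only users' social ties in isolation from the content they share, so the discovered groups carry no semantic information. A few algorithms combine social ties with content but ignore the structural characteristics of microblogs themselves. Taking the microblog topic as the starting point, this paper jointly considers user interactions during topic propagation, microblog text content, and sentiment polarity, together with users' behavior information, and proposes BP-STG, a method based on a probabilistic generative model for partitioning the groups involved in microblog topic propagation. The model is inferred with Gibbs sampling; it mines groups with different topical inclinations as well as each group's sentiment distribution and each user's activeness and behavior within a group. The model can also be generalized to many other media with social network properties. Experiments on two topic datasets collected from Sina Weibo show that BP-STG not only partitions microblog topic propagation groups effectively but also discovers the active users within a group and users' behavior patterns in the group.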
Events can spread rapidly on microblogs in the form of topics and exert enormous influence. Therefore, analyzing the participating users and discovering groups with different interests and sentiments in a topic discussion have attracted the attention of governments and enterprises. Current community detection methods often treat user-generated content and the relationships between users separately, so their results carry no semantic information. Although some methods combine the two factors, they fail to take into account the behavior information and sentiment information present in microblogs, and they are not well suited to mining groups in microblog topic discussions. We propose a group partition model called BP-STG, which takes text information, social contacts, text sentiment information, and users' behavior into consideration. We present a Gibbs sampling implementation for inference in our model, which mines not only different interest groups but also the sentiment distribution of a group and participants' activeness and behavior information within it. In addition, our model can be extended to many kinds of text associated with a group of people, such as e-mails and forum posts. Experimental results on real-world data show that the BP-STG model offers an effective solution to group partition in topic-related microblog propagation and provides more meaningful semantic information than the state-of-the-art model.
Authors
陈静
刘琰
王煦中
CHEN Jing, LIU Yan, WANG Xu-zhong (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2016, No. 8, pp. 223-228, 239 (7 pages in total)
Computer Science
Funding
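National Natural Science Foundation of China (61309007)
National High Technology Research and Development Program of China (863 Program) (2012AA012902)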
Keywords
Microblog topic
Probabilistic generative model
Group partition
Sentiment element
Behavior pattern