Abstract
Microblog posts are short, contain few words and are written in a casual style, so many of them lack topic words and cannot be assigned to a topic. Such posts are called implicit microblogs. This paper proposes the Comment Group-Retransmission Microblog-Latent Dirichlet Allocation (CGRMB-LDA) model, an improved LDA-based generative model that takes comment groups and retransmitted (reposted) microblogs into account. The model uses the comment, retransmission and context relationships between microblogs to expand implicit microblogs, which makes their topic assignment explicit. The model is solved with Gibbs sampling to obtain the topic set and the topic of each microblog. Experimental results on a real-world dataset show that CGRMB-LDA can effectively mine the topics of microblogs, particularly implicit microblogs.
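The abstract only outlines the approach, so the following is a minimal Python sketch under stated assumptions. It is not the authors' CGRMB-LDA. The hypothetical helper `expand_post` expands an implicit post by appending the words of its comments, reposts and context posts. The dict-of-lists link format is also an assumption. The expanded posts are then passed to standard collapsed Gibbs sampling for plain LDA (`lda_gibbs`, with assumed hyperparameters `alpha=0.1` and `beta=0.01`), and each post is labelled with its most frequently sampled topic. CGRMB-LDA changes the generative model itself to account for comment groups and reposts. That structure is not given here, so plain LDA on expanded text stands in for it.

```python
import random


def expand_post(post_id, texts, comments, reposts, context):
    """Hypothetical expansion step: return the post's own tokens followed by
    the tokens of its comments, reposts and context posts, in that order."""
    tokens = list(texts[post_id])
    for links in (comments, reposts, context):
        for other_id in links.get(post_id, []):
            tokens.extend(texts[other_id])
    return tokens


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, not CGRMB-LDA).

    Returns (topic of each document, topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]          # document-topic counts
    n_kw = [[0] * V for _ in range(num_topics)]      # topic-word counts
    n_k = [0] * num_topics                           # tokens per topic
    z = []                                           # topic of every token

    # Initialise every token with a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other z, w).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Label each document with its most frequently assigned topic.
    doc_topic = [max(topics, key=row.__getitem__) for row in n_dk]
    return doc_topic, n_kw, vocab


# Toy usage: "p3" is an implicit post whose only word carries no topic.
texts = {
    "p1": ["phone", "battery", "screen"],
    "p2": ["match", "goal", "team"],
    "p3": ["agree"],
    "c1": ["battery", "screen", "agree"],  # a comment on p3
}
docs = [texts["p1"], texts["p2"], expand_post("p3", texts, {"p3": ["c1"]}, {}, {})]
doc_topic, _, _ = lda_gibbs(docs, num_topics=2, iterations=50, seed=1)
print(doc_topic)
```

After expansion, `p3` shares words with `p1`, so the sampler has co-occurrence evidence for grouping the two. On a corpus this small the printed labels still depend on the seed, and they say nothing about the paper's experimental results.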
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2016, Issue A01, pp. 67-71 (5 pages)
Keywords
microblog
topic mining
comment group
retransmission microblog
Latent Dirichlet Allocation (LDA)
implicit microblog