Abstract
Existing models for detecting hot topics in microblogs are sensitive to the number and scale of posts, and they are slow to detect topics. To address this, this paper proposes a topic model based on a heat matrix. The model uses the heat matrix to obtain the heat of each latent topic and its topic-word probability distribution, and it mines the semantic relationships between words from their common heat, so that hot topics and hot words in the data can be identified accurately. Experimental results on real microblog data show that, compared with the Latent Dirichlet Allocation (LDA) model, the proposed model achieves higher efficiency and accuracy. The hot topics it detects are consistent with real-time events, indicating good hot-spot identification performance.
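The abstract does not define the heat matrix or the common heat of words, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes that each post carries an engagement count (for example, reposts plus comments), that a word's heat is the summed engagement of the posts containing it, and that the common heat of two words is the summed engagement of the posts containing both. The names `build_heat`, `hot_words`, and the sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_heat(posts):
    """Accumulate per-word heat and pairwise common heat.

    posts: iterable of (tokens, engagement) pairs, where engagement is an
    assumed numeric popularity signal for the post (not defined in the abstract).
    """
    heat = Counter()    # word -> summed engagement of posts containing it
    common = Counter()  # (word_a, word_b) -> summed engagement of shared posts
    for tokens, engagement in posts:
        words = sorted(set(tokens))  # count each word once per post
        for word in words:
            heat[word] += engagement
        for a, b in combinations(words, 2):  # pairs are in sorted order
            common[(a, b)] += engagement
    return heat, common


def hot_words(heat, k):
    """Return the k words with the highest heat."""
    return [word for word, _ in heat.most_common(k)]


# Hypothetical toy data: (tokens, engagement)
posts = [
    (["earthquake", "rescue"], 50),
    (["earthquake", "relief", "rescue"], 30),
    (["earthquake"], 10),
    (["concert", "tickets"], 5),
]
heat, common = build_heat(posts)
top = hot_words(heat, 2)
```

In this toy setting, word pairs with high common heat (here `earthquake` and `rescue`) would be candidates for belonging to the same hot topic, which is one plausible reading of how shared heat could expose semantic relationships between words.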
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals list)
2017, Issue 2, pp. 57-62 (6 pages)
Computer Engineering
Funding
Key Program of the National Natural Science Foundation of China (U1135005)