
Hot Topic Discovery in Chinese Microblogs Based on the LDA Model (cited by: 6)

A Hot Topic Identification based on LDA for Chinese Microblog
Abstract Microblog text grows quickly and carries a large, heterogeneous volume of information. To address this, the paper applies the LDA (latent Dirichlet allocation) model to hot-topic mining and constructs an identification process for microblog hot topics. The LDA model is first used to build a topic model of the microblog corpus. The optimal number of topics is determined by perplexity, and the parameters are inferred with the Gibbs sampling algorithm, which yields the topic-word and document-topic probability distributions of the corpus. On this basis, the hot topics, hot words, and hot-topic microblogs are computed and identified. Experimental results show that the model's output is largely consistent with manually selected results, indicating that the model identifies hot topics well.
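The pipeline described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the toy corpus, the hyperparameters `alpha` and `beta`, and the function names are all hypothetical. The abstract does not give the formula used to score topic heat, so the sketch assumes a topic's heat is its mean document-topic probability over the corpus. Perplexity is computed on the training corpus for brevity; a held-out set would normally be used when choosing the number of topics.

```python
import math
import random


def gibbs_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs are lists of word tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assign = []
    for d, doc in enumerate(docs):                         # random initialization
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            doc_topic[d][k] += 1
            topic_word[k][wid[w]] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed point estimates of the two distributions named in the abstract.
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
            for v in range(n_words)] for k in range(n_topics)]
    return vocab, theta, phi


def perplexity(docs, vocab, theta, phi):
    """exp(-log-likelihood per token) under the fitted theta and phi."""
    wid = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][wid[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def rank_hot_topics(theta):
    """ASSUMPTION: heat of a topic = its mean document-topic probability."""
    n_topics = len(theta[0])
    heat = [sum(row[k] for row in theta) / len(theta) for k in range(n_topics)]
    return sorted(range(n_topics), key=lambda k: heat[k], reverse=True), heat


# Hypothetical pre-tokenized toy corpus standing in for segmented microblogs.
docs = [
    ["earthquake", "rescue", "relief", "earthquake"],
    ["earthquake", "relief", "donation"],
    ["rescue", "earthquake", "donation", "relief"],
    ["football", "match", "goal"],
    ["match", "goal", "football", "league"],
    ["league", "football", "goal"],
]

# Compare candidate topic counts by perplexity (lower is better).
scores = {k: perplexity(docs, *gibbs_lda(docs, k)) for k in (2, 3, 4)}
best_k = min(scores, key=scores.get)

vocab, theta, phi = gibbs_lda(docs, n_topics=2)
ranked, heat = rank_hot_topics(theta)
hottest = ranked[0]
# Hot words: highest-probability words of the hottest topic.
hot_words = sorted(vocab, key=lambda w: phi[hottest][vocab.index(w)], reverse=True)[:3]
# Hot-topic microblogs: documents with the largest share of the hottest topic.
hot_posts = sorted(range(len(docs)), key=lambda d: theta[d][hottest], reverse=True)[:2]
```

Real microblog text would first need Chinese word segmentation and stop-word removal, which this sketch leaves out.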
Source Journal of Suzhou University (《宿州学院学报》), 2014, No. 4, pp. 71-73, 77 (4 pages)
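Funding Suzhou University research platform open project "Research on an LDA-Based Question Recommendation Mechanism in Q&A Communities" (2013YKF14); Anhui Province College Student Innovation and Entrepreneurship Training Program project "Research on Microblog-Based Online Public Opinion Mining" (AH201310379082); Anhui Province College Student Innovation and Entrepreneurship Training Program project "Application of an Improved BP Neural Network to ERP Implementation Risk Assessment" (AH201310379078); Anhui Provincial University Natural Science Research Project "Research and Application of Ontology-Based Search" (original title: 基于本体的直搜索研究及应用) (KJ2012Z395)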
Keywords LDA (latent Dirichlet allocation); microblog; hot topic
