Mining and Evolution of Content Topics Based on Dynamic LDA (基于动态LDA主题模型的内容主题挖掘与演化)

Cited by: 73
Abstract: Mining the topics of text content and tracking their evolution plays an important role in text modeling and classification and in improving recommendation performance. Starting from the principles of text topic mining based on the LDA topic model, and targeting the characteristics of text content in the current network environment, the article constructs an LDA model suited to mining topics from dynamic text content and improves the accuracy of topic mining through an improved (incremental) Gibbs sampling estimation. It then studies how content topics evolve over time from two aspects: topic similarity and topic intensity. Experiments show that the proposed method is feasible and effective, and that it has practical significance for subsequent research on text semantic modeling and classification.
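The abstract names topic similarity and topic intensity as the two views of topic evolution but does not give their formulas. The sketch below is only an illustration under stated assumptions: intensity is taken to be the mean document-topic proportion within a time slice, and similarity is taken to be one minus the Jensen-Shannon divergence between a topic's word distributions in two slices. The function names and the toy numbers are hypothetical and are not taken from the article.

```python
import math

def topic_intensity(doc_topic):
    """Assumed intensity measure: mean topic proportion over the
    documents of one time slice (rows = documents, cols = topics)."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(doc[k] for doc in doc_topic) / n_docs for k in range(n_topics)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_similarity(p, q):
    """Assumed similarity measure: 1 - Jensen-Shannon divergence,
    normalised by log 2 so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd / math.log(2)

# Toy document-topic proportions for two time slices (made-up values).
theta_t1 = [[0.8, 0.2], [0.6, 0.4]]
theta_t2 = [[0.3, 0.7], [0.5, 0.5]]

# Toy word distributions of one topic in each slice (made-up values).
phi_t1 = [0.5, 0.3, 0.2, 0.0]
phi_t2 = [0.4, 0.3, 0.2, 0.1]

intensity_t1 = topic_intensity(theta_t1)
intensity_t2 = topic_intensity(theta_t2)
similarity = js_similarity(phi_t1, phi_t2)

print("intensity, slice 1:", intensity_t1)
print("intensity, slice 2:", intensity_t2)
print("similarity of the topic across slices:", similarity)
```

Comparing the intensity vectors across slices shows which topics gain or lose weight, while the similarity score indicates how much a topic's vocabulary has drifted between slices.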
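Authors: Hu Jiming (胡吉明), Chen Guo (陈果)
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core; 2014, No. 2, pp. 138-142 (5 pages)
Funding: One of the research outputs of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Information Content Topic Mining and Semantic Classification in the Social Network Environment" (project No. 13YJC870008) and the National Natural Science Foundation of China Youth Fund project "Research on Information Recommendation Based on User-Resource Association in the Social Network Environment" (project No. 71303178).
Keywords: topic mining; topic evolution; dynamic LDA model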
