
Research on Application of Probability Topic Model in Microblog Topic Mining

Cited by: 4
Abstract: In recent years, microblogs have, by virtue of their distinctive characteristics, become an important platform for public opinion, with far-reaching effects on national security and social development. This makes extracting topics from microblog text particularly important. Probability topic models are currently the mainstream technique for text topic mining. This paper first introduces the LDA model, one such probability topic model, in detail. It then analyzes the characteristics of microblog data and surveys research on applying probability topic models to microblog topic mining from three aspects: noisy vocabulary, the short length of microblog texts, and the temporal nature of microblogs. It further surveys research that uses topic models to discover topic-based community relationships. Finally, it summarizes the challenges that topic models face in mining microblog topics.
Affiliation: Information Engineering University
Source: Journal of Information Engineering University, 2017, No. 1, pp. 103-110 (8 pages)
Funding: National Natural Science Foundation of China (61309007); National 863 Program (2012AA012902); National Key Technology R&D Program (2012BAH47B01)
Keywords: microblog; probability topic model; topic; topic mining; community discovery
