
LDA Subtopic Detection Algorithm with Background Noise Restraint
Abstract A special-topic article set is a collection of articles that share common background knowledge. To detect subtopics more efficiently within the complex information correlations inside such a set, this paper proposes BLDA, an LDA subtopic detection algorithm with background noise restraint. BLDA improves the accuracy of subtopic extraction by extracting the common background knowledge of the document set in advance and by resetting keyword generation during the iteration process. A series of experiments on articles from WeChat public accounts shows that, for special-topic article sets with a shared background, the clustering results of BLDA are significantly better than those of the traditional LDA algorithm: topic recall increases by about 170%, the Purity clustering index by 143%, and the NMI clustering index by 160%.
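The abstract does not say how the shared background knowledge is extracted. As a rough illustration of the general idea only, and not the BLDA procedure itself, the sketch below treats any term that occurs in at least a chosen fraction of the documents as background and removes it before topic modeling. The document-frequency heuristic, the function names `background_terms` and `strip_background`, the threshold `min_doc_ratio`, and the toy documents are all assumptions made for this example.

```python
from collections import Counter


def background_terms(docs, min_doc_ratio=0.8):
    """Return terms that occur in at least min_doc_ratio of the documents.

    Hypothetical heuristic: a high document frequency is taken as a sign
    that a term belongs to the shared background of the collection.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    cutoff = min_doc_ratio * len(docs)
    return {term for term, df in doc_freq.items() if df >= cutoff}


def strip_background(docs, background):
    """Drop background terms from every tokenized document."""
    return [[t for t in tokens if t not in background] for tokens in docs]


# Toy tokenized documents (invented for illustration).
docs = [
    ["wechat", "account", "football", "match"],
    ["wechat", "account", "stock", "market"],
    ["wechat", "account", "football", "goal"],
    ["wechat", "article", "stock", "price"],
]
bg = background_terms(docs, min_doc_ratio=0.75)
filtered = strip_background(docs, bg)
```

The filtered token lists could then be passed to any topic model; how BLDA itself handles background words inside the iteration is not described in the abstract.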
Authors LI Jing-yuan, QIU Zhi-jie, LIU Yue, CHENG Xue-qi, REN Yan (Institute of Computing Technology // Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences, Beijing 100190, China; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China)
Source Journal of South China University of Technology (Natural Science Edition), 2017, No. 3, pp. 54-60 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (61303244, 61572473, 61572469, 61402442, 61402022, 61370132); National 242 Information Security Program (2015F114)
Keywords subtopic mining; linear discriminant analysis; background noise restraint
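The abstract reports results in terms of the Purity and NMI clustering indices. For readers unfamiliar with them, here is a minimal sketch of how the two are commonly computed from gold topic labels and predicted cluster labels. The abstract does not state which NMI normalization the authors used; this sketch assumes the arithmetic mean of the two entropies, and the function names `purity` and `nmi` are chosen for this example.

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of items that fall in the majority class of their cluster."""
    pairs = Counter(zip(labels_pred, labels_true))
    best = {}
    for (cluster, _cls), n in pairs.items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(labels_true)


def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def nmi(labels_true, labels_pred):
    """Mutual information normalized by the mean of the two entropies.

    Assumption: arithmetic-mean normalization; other variants exist.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    h_t, h_p = entropy(count_t, n), entropy(count_p, n)
    if h_t + h_p == 0:
        return 1.0  # both labelings put everything in one group
    return 2.0 * mi / (h_t + h_p)


# Toy labels (invented): one item of class 0 lands in the wrong cluster.
p = purity([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
# A clustering identical to the gold labels up to renaming of clusters.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering statistically independent of the gold labels.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Both indices lie between 0 and 1, and NMI does not depend on how the clusters are named, which is why it suits comparing topic assignments against gold labels.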
