
LDA-based Research Domain Hotspots and Trend Analysis (Chinese title: 基于LDA模型的研究领域热点及趋势分析)
Cited by: 13
Abstract: As research deepens and modern information dissemination technologies advance, the scientific literature on any given research domain keeps growing and becomes ever easier to obtain. Reading and analyzing thousands of such papers by hand, however, makes it very difficult to analyze a domain's research focuses, hotspots, and trends comprehensively and systematically. To address this problem, an LDA-based approach is proposed to automatically recognize the hotspots and trends of a research domain over a given period. Gibbs sampling is used to estimate the LDA model parameters and to obtain the domain's hot topics and their representative words. Trend analysis is performed by post-discretizing the topics over time, which tracks how the strength of each hot topic evolves along the time axis. Chinese information processing is chosen as the target domain: the research hotspots over the ten-year period from 2001 to 2010, and the evolution of the hot topics over that period, were obtained by automatically analyzing the papers published in the Journal of Chinese Information Processing (《中文信息学报》) during those years. Preliminary experimental results demonstrate the effectiveness of the proposed approach.
Source: Computer Technology and Development (《计算机技术与发展》), 2012, No. 10, pp. 66-69 and 74 (5 pages in total)
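Funding: Henan Province Basic and Frontier Technology Research Program (112300410007); Natural Science Research Program of the Education Department of Henan Province (2011A120002)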
Keywords: research hotspots; LDA model; Gibbs sampling; topic number; topic evolution
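The abstract describes three steps: estimate an LDA model with Gibbs sampling, read off the hot topics and their representative words, and track topic strength over time by post-discretizing the fitted model by year. The paper's own code and settings are not given here, so the following is only a minimal Python sketch under stated assumptions: a standard collapsed Gibbs sampler with symmetric priors `alpha` and `beta` (illustrative values), documents already segmented into integer word ids, and the strength of a topic in a given year taken as the mean document-topic proportion θ over that year's papers. The function names `lda_gibbs`, `top_words`, and `topic_strength_by_year` and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the paper's code).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic proportions, per-topic word distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total tokens assigned to each topic
    z = []                               # current topic assignment of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | z_-i, w) up to a constant; the
                # document-length denominator is the same for every t, so it is dropped.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of theta (document-topic) and phi (topic-word) from the final sample.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi


def top_words(phi, vocab, n=5):
    """Representative words of each topic: the n most probable words under phi."""
    return [[vocab[w] for w in sorted(range(len(row)), key=row.__getitem__, reverse=True)[:n]]
            for row in phi]


def topic_strength_by_year(theta, years):
    """Post-discretization by time: mean theta over the documents of each year."""
    sums = defaultdict(lambda: [0.0] * len(theta[0]))
    counts = defaultdict(int)
    for row, year in zip(theta, years):
        acc = sums[year]
        for k, p in enumerate(row):
            acc[k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


if __name__ == "__main__":
    # Hypothetical toy corpus: two "segmentation" papers in 2001, two "translation" papers in 2010.
    vocab = ["segmentation", "word", "parsing", "translation", "bilingual", "alignment"]
    docs = [[0, 1, 0, 1, 2], [0, 1, 2, 2, 1], [3, 4, 5, 3, 4], [3, 5, 4, 5, 3]]
    years = [2001, 2001, 2010, 2010]
    theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab), iters=100)
    print(top_words(phi, vocab, n=3))
    print(topic_strength_by_year(theta, years))
```

Because time enters only after fitting, the topics are shared across all years and only their average weight changes from year to year, which keeps the yearly strength values directly comparable. The number of topics K is passed in by hand in this sketch; how the paper chooses it is not stated in the abstract.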

References: 5

Secondary references: 51


Co-citing literature: 139

Co-cited literature: 178

Citing literature: 13

Secondary citing literature: 308
