Abstract
As research deepens and information dissemination technologies advance, the scientific literature in any given research field keeps growing and becomes ever easier to obtain. Reading and analyzing thousands of papers by hand, however, makes it difficult to produce a comprehensive and systematic analysis of a field's research focus, hotspots, and trends. To address this problem, an approach based on the LDA (latent Dirichlet allocation) model is proposed for automatically identifying the hotspots and trends of a research field over a given period. The approach uses Gibbs sampling to estimate the model parameters and to obtain the field's hot topics and their representative words. It then analyzes how the strength of each hot topic evolves along the time axis by discretizing the topics over time after the model has been fitted (post-discretization). Chinese information processing is taken as the example field: all papers published in the Journal of Chinese Information Processing (《中文信息学报》) during the ten years from 2001 to 2010 are analyzed to automatically obtain the field's research hotspots over that decade and the evolution of the hot topics over time. The experimental results provide preliminary evidence of the method's effectiveness.
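The two steps named in the abstract, Gibbs sampling for the LDA parameters and post-discretization of topics over time, can be illustrated with a minimal sketch. This is not the authors' implementation: the hyperparameters `alpha` and `beta`, the function names `gibbs_lda` and `topic_strength_by_year`, the toy corpus, and the choice to define a topic's yearly strength as the mean document-topic proportion over that year's papers are all assumptions made for illustration, since the abstract does not give these details.

```python
import random
from collections import defaultdict


def gibbs_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word strings.
    Returns (theta, phi, vocab): document-topic distributions,
    topic-word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                       # tokens assigned to each topic
    z = []                                     # topic assignment of every token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
        for t in range(n_topics)
    ]
    return theta, phi, vocab


def topic_strength_by_year(theta, years):
    """Post-discretization: average theta over the documents of each year."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(grouped.items())
    }


# Hypothetical toy corpus: tokenized "papers" with publication years.
docs = [
    ["segmentation", "word", "corpus", "segmentation"],
    ["word", "segmentation", "dictionary"],
    ["translation", "alignment", "corpus", "translation"],
    ["translation", "alignment", "model"],
]
years = [2001, 2001, 2010, 2010]

theta, phi, vocab = gibbs_lda(docs, n_topics=2)
strength = topic_strength_by_year(theta, years)
for year, values in strength.items():
    print(year, [round(s, 3) for s in values])
```

The time information is used only after the model has been fitted, which is what distinguishes post-discretization from approaches that split the corpus by year before training. The highest-probability entries in each row of `phi` would play the role of a topic's representative words.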
Source
Computer Technology and Development (《计算机技术与发展》)
2012, No. 10, pp. 66-69, 74 (5 pages in total)
Funding
Henan Province Basic and Frontier Technology Research Project (112300410007)
Henan Province Education Natural Science Research Program (2011A120002)