Abstract
This paper proposes a method for detecting emerging topics in a document collection based on the LDA topic model. Topic novelty and publication volume are used as indicators, and citation volume is introduced alongside them, to form the feature indicators of an emerging topic; on this basis, the paper analyzes the characteristics of a topic in each period before it enters the mature stage. For these detection indicators, the method extracts the semantic topic words of the documents with the LDA model, builds a mapping between topics and documents from the document-topic matrix, and derives the novelty, publication volume, and citation volume indicators of each topic. These indicators form an emerging-topic detection table and a detection curve (VDP), from which emerging topics are identified. Finally, the trend in the distance between each emerging topic's VDP and the baseline VDP is predicted, and the fitted curves are analyzed to find the emerging topics most worth attention.
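The indicator step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: each document is mapped to its highest-weight topic in the document-topic matrix, novelty is proxied by the mean publication year of a topic's documents, and the function name `topic_indicators` and the toy data are hypothetical.

```python
from collections import defaultdict


def topic_indicators(doc_topic, years, citations):
    """Aggregate per-topic indicators from a document-topic matrix.

    doc_topic: one row per document, each row a list of topic weights
    years:     publication year of each document
    citations: citation count of each document
    """
    stats = defaultdict(lambda: {"docs": 0, "year_sum": 0, "cites": 0})
    for weights, year, cites in zip(doc_topic, years, citations):
        # Assumption: a document belongs to its highest-weight topic.
        topic = max(range(len(weights)), key=weights.__getitem__)
        entry = stats[topic]
        entry["docs"] += 1
        entry["year_sum"] += year
        entry["cites"] += cites

    result = {}
    for topic, entry in stats.items():
        result[topic] = {
            "publication_volume": entry["docs"],
            # Assumption: novelty is proxied by the mean publication year.
            "novelty": entry["year_sum"] / entry["docs"],
            "citation_volume": entry["cites"],
        }
    return result


# Toy document-topic matrix (4 documents, 2 topics); values are made up.
doc_topic = [
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
    [0.6, 0.4],
]
years = [2008, 2012, 2013, 2009]
citations = [30, 4, 2, 25]

indicators = topic_indicators(doc_topic, years, citations)
for topic in sorted(indicators):
    print(topic, indicators[topic])
```

In this toy data, topic 1 has the more recent documents and fewer accumulated citations than topic 0, which is the kind of contrast a detection table would surface. The VDP curve and its distance to a baseline are not sketched here because the abstract does not define them.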
Source
《情报学报》
CSSCI
PKU Core (Peking University core journal list)
2014, Issue 7, pp. 698-711 (14 pages)
Journal of the China Society for Scientific and Technical Information
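Funding
Chinese Academy of Sciences "Western Light" Joint Scholar Program project "Research on Technological Innovation Competition and Development of Strategic Emerging Industries in Gansu Province Based on Computational Informatics Methods"
One of the research outputs of a National Natural Science Foundation of China project (Grant No. 71373260)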
Keywords
Latent Dirichlet Allocation (LDA)
topic model
emerging topic
topic feature
novelty index
publication volume index
citation volume index
life cycle