Abstract
To extract keywords that better reflect a text's topics, and to address the lack of topic information in key phrase extraction, this paper proposes a single-document key phrase extraction method based on LDA and TextRank. The method first applies an LDA model to mine topics from the texts in the corpus, then builds an undirected weighted word graph that incorporates topic coverage and word co-occurrence in the target text. Next, it introduces a topic influence factor for node words and modifies the random jump probabilities between nodes according to the topical relevance of words; TextRank is then run on the word graph to rank candidate keywords. Finally, following the idea of the bootstrapping algorithm, it iteratively generates more expressive key phrases. Experiments show that the method effectively extracts key phrases that are expressive and cover the topic information of the text.
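The topic-biased ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a plain co-occurrence window for edge weights, takes per-word topic scores as a ready-made dictionary (in the paper these would come from LDA), and omits the topic-coverage edge weighting and the bootstrapping phrase generation. The names `build_cooccurrence_graph`, `topic_biased_textrank`, `topic_score`, and the sample tokens are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph; edge weight counts co-occurrences
    of two distinct words within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def topic_biased_textrank(graph, topic_score, damping=0.85, iterations=50):
    """TextRank whose random-jump distribution is proportional to an
    assumed per-word topic score instead of being uniform."""
    nodes = sorted(graph)
    total = sum(topic_score.get(n, 0.0) for n in nodes)
    if total > 0:
        jump = {n: topic_score.get(n, 0.0) / total for n in nodes}
    else:
        # No topic information: fall back to the uniform jump of plain TextRank.
        jump = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m passes on rank in proportion to the edge weight.
            incoming = sum(rank[m] * graph[m][n] / out_weight[m] for m in graph[n])
            new_rank[n] = (1 - damping) * jump[n] + damping * incoming
        rank = new_rank
    return rank


# Hypothetical input: a tokenized text and made-up topic scores.
tokens = ["topic", "model", "keyword", "graph", "topic", "keyword", "rank", "graph"]
topic_score = {"topic": 0.6, "keyword": 0.3, "graph": 0.1}
ranking = topic_biased_textrank(build_cooccurrence_graph(tokens), topic_score)
for word, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.4f}")
```

Biasing only the jump term keeps the graph structure untouched, so setting `topic_score` to an empty dictionary recovers ordinary TextRank for comparison.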
Authors
Lang Dongdong; Liu Chenchen; Feng Xupeng; Liu Lijun; Huang Qingsong (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Computer Technology Applications, Kunming 650500, Yunnan, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, Issue 3, pp. 54-60 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 81360230)