Design and Implementation of a Key Phrase Extraction Scheme for Text Based on LDA and TextRank (cited by: 6)
Abstract: To extract keywords that better reflect the topics of a text, and to address the loss of topic information in key phrase extraction, this paper proposes a single-document key phrase extraction method based on LDA and TextRank. The method first uses an LDA model to mine topics from the texts in a corpus, then builds an undirected weighted word graph that combines topic coverage in the target text with word co-occurrence relations. It next introduces a topic influence factor for each node word, modifying the random jump probability between nodes according to the topical relevance of the words, and runs the TextRank algorithm on the word graph to rank candidate keywords. Finally, following the idea of the bootstrapping algorithm, it iteratively generates more expressive key phrases. Experiments show that the method effectively extracts key phrases that are expressive and cover the topic information of the text.
Authors: Lang Dongdong (郎冬冬); Liu Chenchen (刘晨晨); Feng Xupeng (冯旭鹏); Liu Lijun (刘利军); Huang Qingsong (黄青松) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Computer Technology Applications, Kunming 650500, Yunnan, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2018, No. 3, pp. 54-60 (7 pages)
Funding: National Natural Science Foundation of China (81360230)
Keywords: key phrase extraction; LDA model; topic mining; TextRank; topic influence
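
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it builds an undirected weighted co-occurrence graph and runs a TextRank-style iteration in which the random jump is biased toward words with higher topic weight. The function names, the `window` and `damping` parameters, and the hand-filled `topic_weight` dictionary are assumptions made for illustration; in the paper the topic information comes from an LDA model, and the bootstrapping-style phrase generation step is not covered here.

```python
from collections import defaultdict


def build_graph(tokens, window=3):
    """Undirected weighted word graph (hypothetical helper).

    Two distinct words are linked if they fall within `window` consecutive
    tokens; the edge weight is the number of such co-occurrences.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def topic_biased_textrank(graph, topic_weight, damping=0.85, iters=100, tol=1e-8):
    """TextRank-style ranking with a topic-biased random jump.

    `topic_weight` maps a word to a non-negative topic score. It is an
    assumption standing in for LDA-derived topic influence. If all weights
    are zero, the jump distribution falls back to uniform.
    """
    nodes = sorted(graph)
    total = sum(topic_weight.get(n, 0.0) for n in nodes)
    if total > 0:
        jump = {n: topic_weight.get(n, 0.0) / total for n in nodes}
    else:
        jump = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight leaving each node, used to normalize transitions.
    out_sum = {n: sum(graph[n].values()) for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Score flowing into n from each neighbor m, proportional to edge weight.
            rank = sum(score[m] * w / out_sum[m] for m, w in graph[n].items())
            new[n] = (1 - damping) * jump[n] + damping * rank
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Usage with made-up tokens and topic weights.
tokens = ["topic", "model", "keyword", "graph", "topic", "model", "ranking"]
weights = {"topic": 0.5, "model": 0.3, "keyword": 0.2}
scores = topic_biased_textrank(build_graph(tokens), weights)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Biasing only the jump distribution leaves the co-occurrence structure untouched, so words that are both well connected and topically relevant rise in the ranking. Folding topic coverage into the edge weights as well, as the abstract describes, would be a further change to `build_graph`.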
