
Keyword Extraction of Academic Text with TextRank Model Based on Prior Knowledge (cited by: 16)
Abstract [Purpose/Significance] Keyword extraction from academic text is the automatic extraction of thematic, representative words or phrases from a document, and it is an important part of academic information services. Most traditional methods rely only on limited statistics about candidate keywords, such as term frequency and document frequency, and do not consider how the candidate keywords are used in the corresponding academic field, which limits extraction accuracy. To address this problem, this paper proposes a keyword extraction algorithm for academic text based on TextRank with prior knowledge. [Method/Process] First, the usage of each candidate keyword is computed as a prior-probability feature. Next, TextRank, a graph-ranking keyword extraction algorithm, is used to compute an in-text feature value for each candidate keyword. Finally, the two features are combined into a comprehensive weight for each candidate keyword, and the candidates are ranked by that weight. [Results/Conclusion] The algorithm was evaluated experimentally on several document collections in computer science. The results improved clearly over traditional keyword extraction methods, demonstrating the effectiveness of the proposed algorithm.
Authors FANG Jun-wei (方俊伟); CUI Hao-ran (崔浩冉); HE Guo-xiu (贺国秀); LU Wei (陆伟) (Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China; Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072, China)
Source Information Science (《情报科学》), CSSCI, Peking University Core Journal, 2019, No. 3, pp. 75-80 (6 pages)
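Funding National Natural Science Foundation of China, General Program, "Semantic Recognition and Knowledge Graph Construction for Academic Text Oriented to Lexical Functions" (71473183)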
Keywords prior knowledge; keyword extraction; TextRank; academic text
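
The three steps described in the abstract (a prior-probability feature, a TextRank in-text feature, and a combined weight used for ranking) can be illustrated with a minimal Python sketch. The abstract does not say how the prior is estimated or how the two features are combined, so everything specific here is an assumption made for illustration: the `prior` dictionary (hypothetical values standing in for how often a term is used as a keyword in the field), the linear interpolation with weight `alpha`, the co-occurrence `window`, and the function names `textrank` and `rank_keywords`. It is not the authors' implementation.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Plain TextRank scores over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each token to the next `window` tokens that follow it.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # PageRank-style update: each neighbor shares its score equally
        # among its own neighbors.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


def rank_keywords(tokens, prior, alpha=0.5, default_prior=0.0):
    """Rank candidates by a combined weight (assumed linear interpolation)."""
    in_text = textrank(tokens)
    if not in_text:
        return []
    top = max(in_text.values())
    combined = {
        # Normalize the in-text score to [0, 1] so it is comparable
        # with the prior, then mix the two features.
        word: alpha * (score / top) + (1 - alpha) * prior.get(word, default_prior)
        for word, score in in_text.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical pre-tokenized candidates and made-up prior values.
    tokens = ["graph", "ranking", "keyword", "extraction", "graph", "keyword"]
    prior = {"keyword": 0.9, "extraction": 0.8}
    print(rank_keywords(tokens, prior))
```

Setting `alpha=1.0` reduces the sketch to plain TextRank, and `alpha=0.0` ranks by the prior alone, which makes it easy to see what each feature contributes to the final order.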
