Abstract
【Purpose/significance】Keyword extraction from academic text automatically identifies thematic, representative words or phrases in a document and is an important step in academic information services. Most traditional methods rely only on limited statistics of the candidate keywords, such as term frequency and document frequency. They ignore how those candidates are used in the corresponding academic field, and this limits extraction accuracy. To address this problem, this paper proposes a keyword extraction algorithm for academic text based on TextRank with prior knowledge. 【Method/process】First, the usage of each candidate keyword in its academic field is computed as a prior probability feature. Next, the graph-based ranking algorithm TextRank computes each candidate's in-text feature value. Finally, the two features are combined into a comprehensive weight, and the candidates are ranked by that weight. 【Results/conclusion】Experiments on several literature collections in computer science show a clear improvement over traditional keyword extraction methods, demonstrating the effectiveness of the proposed prior-knowledge TextRank algorithm.
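The three steps in the method summary can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define how the prior is computed or how the two features are combined. The Laplace-smoothed prior in the hypothetical `prior_probability` function and the product rule in the hypothetical `extract_keywords` function are therefore illustrative choices. The prior is the share of domain documents containing a word that also list it as an author keyword. Candidates are assumed to be single, pre-filtered words (for example, nouns after stopword removal). The `textrank` function follows the standard unweighted co-occurrence formulation of TextRank.

```python
from collections import defaultdict


def prior_probability(candidate, domain_docs):
    """Hypothetical prior (assumption, not from the paper): among domain
    documents whose text contains `candidate`, the share that also list it
    as an author keyword. Laplace smoothing keeps the value in (0, 1)."""
    appears = sum(1 for doc in domain_docs if candidate in doc["tokens"])
    tagged = sum(
        1 for doc in domain_docs
        if candidate in doc["tokens"] and candidate in doc["keywords"]
    )
    return (tagged + 1) / (appears + 2)


def textrank(tokens, window=2, d=0.85, iters=50):
    """Unweighted TextRank on an undirected co-occurrence graph: two words
    are linked if they occur within `window` tokens of each other."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        # S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return scores


def extract_keywords(tokens, domain_docs, top_k=3):
    """Combine the in-text TextRank score with the domain prior (the product
    is an assumed combination rule) and return the top_k candidates."""
    in_text = textrank(tokens)
    combined = {w: s * prior_probability(w, domain_docs) for w, s in in_text.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy domain corpus, invented for illustration: "tokens" are the candidate
    # words in each document, "keywords" are its author-assigned keywords.
    domain = [
        {"tokens": {"graph", "ranking", "method"}, "keywords": {"graph", "ranking"}},
        {"tokens": {"graph", "method", "keyword"}, "keywords": {"graph", "keyword"}},
        {"tokens": {"method", "ranking"}, "keywords": {"ranking"}},
    ]
    doc = ["keyword", "method", "graph", "method", "ranking"]
    print(textrank(doc))                  # "method" has the highest in-text score
    print(extract_keywords(doc, domain))  # ['graph', 'ranking', 'keyword']
```

In the toy run, "method" is the most central word in the text. It is rarely an author keyword in the domain corpus, so its low prior pushes it out of the top three. The product rule lets the prior act as a field-level reweighting of the in-text score. A weighted sum would be an equally plausible alternative.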
Authors
方俊伟
崔浩冉
贺国秀
陆伟
FANG Jun-wei; CUI Hao-ran; HE Guo-xiu; LU Wei (Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China; Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072, China)
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journal
2019, No. 3, pp. 75-80 (6 pages)
Funding
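General Program of the National Natural Science Foundation of China, "Semantic Recognition of Academic Texts and Knowledge Graph Construction Oriented to Lexical Function" (71473183)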