Abstract
This paper treats keyword extraction as the problem of ranking the words of a document by importance. Following the basic idea of TextRank, it builds a graph of candidate keywords and introduces three factors (coverage influence, position influence, and frequency influence) to compute the probability transition matrix of influence between words. Candidate keyword scores are then computed iteratively, and the top N candidates are selected as the extracted keywords. Experimental results show that the word-position-weighted TextRank method outperforms both the traditional TextRank method and a keyword extraction method based on the LDA topic model.
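The pipeline described in the abstract (candidate graph, combined transition matrix, iterative scoring, top-N selection) can be sketched roughly as follows. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the co-occurrence window, the mixing weights `alpha`, `beta`, `gamma`, the `title_boost` position weight, and the way each of the three influences is normalized. It should be read as one plausible instance of a position-weighted TextRank, not as the authors' implementation.

```python
from collections import Counter, defaultdict


def build_graph(words, window=2):
    """Undirected co-occurrence graph: words within `window` positions are linked."""
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window + 1]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)
    return neighbors


def weighted_textrank(words, title_words=(), window=2, d=0.85,
                      alpha=0.34, beta=0.33, gamma=0.33,
                      title_boost=3.0, tol=1e-8, max_iter=200):
    """Score candidate words; all weighting choices below are assumptions."""
    neighbors = build_graph(words, window)
    freq = Counter(words)
    title = set(title_words)
    # Assumed position weight: words that also occur in the title count more.
    pos = {w: (title_boost if w in title else 1.0) for w in neighbors}

    # transition[u][v]: share of u's score passed to neighbor v.
    # Each row sums to alpha + beta + gamma, which is 1 with the defaults.
    transition = {}
    for u, nbrs in neighbors.items():
        pos_total = sum(pos[v] for v in nbrs)
        freq_total = sum(freq[v] for v in nbrs)
        transition[u] = {
            v: alpha / len(nbrs)                # coverage: uniform over neighbors
            + beta * pos[v] / pos_total         # position: favor well-placed words
            + gamma * freq[v] / freq_total      # frequency: favor frequent words
            for v in nbrs
        }

    n = len(neighbors)
    score = {w: 1.0 / n for w in neighbors}
    for _ in range(max_iter):
        new = {w: (1 - d) / n for w in neighbors}
        for u, row in transition.items():
            for v, p in row.items():
                new[v] += d * score[u] * p
        delta = sum(abs(new[w] - score[w]) for w in neighbors)
        score = new
        if delta < tol:
            break
    return score


def top_keywords(words, n=3, **kwargs):
    """Return the n highest-scoring candidates (ties broken alphabetically)."""
    score = weighted_textrank(words, **kwargs)
    return sorted(score, key=lambda w: (-score[w], w))[:n]
```

A call such as `top_keywords(tokens, n=5, title_words=title_tokens)` would return five candidates, where `tokens` and `title_tokens` are hypothetical lists of already segmented and filtered words. Setting `beta=0` and `gamma=0` with `alpha=1` reduces the sketch to plain unweighted TextRank, which gives a convenient baseline for comparison.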
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal (北大核心)
2013, Issue 9, pp. 30-34 (5 pages)
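Funding
National Social Science Fund of China project "Collection and Analysis of Online Public Opinion in the Web 2.0 Environment" (Grant No. 09CTQ027)
One of the research outputs of the National Social Science Fund of China major project "Research on Information Resource Integration and Services in the Cloud Computing Environment" (Grant No. 12&ZD220)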