期刊文献+

融合Word2vec与TextRank的关键词抽取研究 被引量:66

Using Word2vec with Text Rank to Extract Keywords
原文传递
导出
摘要 【目的】通过融合单个文档内部结构信息和文档整体的词向量关系进行关键词抽取。【方法】利用Word2vec将文档集中所有词汇进行向量表征,并且通过词向量计算词汇之间的相似度,进而对Text Rank算法进行改进,将候选关键词的权重按照词汇之间的相似度和邻接关系进行非均匀分配,并构建对应的概率转移矩阵用于词汇图模型的迭代计算以及关键词抽取。【结果】实现Word2vec与Text Rank的有效融合,且当训练文档集词汇分布合理时,关键词抽取效果较明显。【局限】需要进行成本较高的文档集训练,获取词向量以及词关系矩阵。【结论】文档集中的词关系有助于修正单文档内部的词关系,提升单文档的关键词抽取准确性。 [Objective] This study extracts keywords through combining the internal structure of each single document and the word vector of the corpus. [Methods] First, we used Word2vec to represent all words' vector from the document corpus and then calculated their similarities. Second, modified the TextRank algorithm and assigned weights to the keywords in accordance with their similarities and adjacency relations. Finally, we built a probability transfer matrix for the iterative calculation of the lexical graph model and then extracted keywords. [Results] The Word2vec and TextRank were integrated and extracted keywords effectively. [Limitations] The proposed method needs much training with the corpus to establish word vector and relation matrix. [Conclusions] The relationship among words from the document sets could help us modify the words relationship from a single document, and then increase the accuracy of extracting keywords from the individual document.
出处 《现代图书情报技术》 CSSCI 2016年第6期20-27,共8页 New Technology of Library and Information Service
关键词 抽取 Word2vec TextRank 图模型 词向量 Keyword extraction Word2vec TextRank Graphical model Word vector
  • 相关文献

参考文献10

二级参考文献96

共引文献402

同被引文献494

引证文献66

二级引证文献359

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部