
Automatic Extraction of Text Summarization Based on Improved TextRank (cited by: 12)
Abstract: When automatically extracting summaries from Chinese text, the TextRank algorithm considers only the similarity between sentences and ignores both the semantic relatedness between words and the important global information of the text. To address this, we propose an automatic text summarization algorithm based on an improved TextRank (SW-TextRank). Sentence similarity is computed from word vectors trained with Word2Vec, and sentence weights are optimized by jointly considering the factors that affect them: sentence position, similarity between the sentence and the title, keyword coverage, key sentences, and clue words. Redundancy is then removed from the candidate summary sentences, and an appropriate number of top-ranked sentences are selected and rearranged in their original order to form the final summary. Experimental results show that SW-TextRank generates more accurate and higher-quality summaries than the TextRank algorithm.
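The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `embeddings` dictionary stands in for vectors trained with Word2Vec, sentence vectors are plain averages of word vectors, only two of the listed factors (sentence position and title similarity) are modeled, and the weights `w_pos`, `w_title`, the combination formula, and the `redundancy` threshold are all hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens (an assumed, simple sentence encoder)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def textrank(sim, damping=0.85, iterations=50):
    """Weighted TextRank by power iteration over a similarity matrix with zero diagonal."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(title, sentences, embeddings, dim, k=2,
              w_pos=0.2, w_title=0.3, redundancy=0.9):
    """Return indices of up to k summary sentences, in original document order."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [sentence_vector(s, embeddings, dim) for s in sentences]
    # Sentence graph: edge weight = non-negative cosine similarity, no self-loops.
    sim = [[max(0.0, cosine(vecs[i], vecs[j])) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    rank = textrank(sim)
    top = max(rank)
    title_vec = sentence_vector(title, embeddings, dim)
    scores = []
    for i in range(n):
        position = 1.0 - i / n  # assumption: earlier sentences matter more
        title_sim = max(0.0, cosine(vecs[i], title_vec))
        scores.append(rank[i] / top * (1 + w_pos * position + w_title * title_sim))
    # Redundancy handling: skip a sentence too similar to one already chosen.
    chosen = []
    for i in sorted(range(n), key=lambda i: scores[i], reverse=True):
        if all(sim[i][j] < redundancy for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore original order

# Toy data: hypothetical 2-dimensional "word vectors" and pre-tokenized sentences.
embeddings = {"cat": [1.0, 0.0], "pet": [0.8, 0.2], "dog": [0.9, 0.1],
              "stock": [0.0, 1.0], "market": [0.1, 0.9]}
sentences = [["cat", "pet"], ["cat", "pet"], ["stock", "market"], ["dog", "market"]]
print(summarize(["pet"], sentences, embeddings, dim=2, k=2))
```

In practice the toy dictionary would be replaced by lookups into a trained word-vector model, and Chinese text would first need word segmentation; both steps are outside this sketch.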
Authors: Wang Xuxiang (汪旭祥); Han Bin (韩斌); Gao Rui (高瑞); Chen Peng (陈鹏) (School of Computers, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2021, No. 6, pp. 155-160 (6 pages)
Keywords: text summarization; SW-TextRank algorithm; word vector; similarity; sentence weight
