Abstract
When extracting summaries from Chinese text automatically, the TextRank algorithm considers only the similarity between sentences and ignores both the semantic relatedness between words and important global information about the text. To address this, we propose SW-TextRank, an automatic text summarization algorithm based on an improved TextRank. Sentence similarity is computed from word vectors trained with Word2Vec, and sentence weights are optimized by taking into account the factors that influence them: sentence position, similarity between the sentence and the title, keyword coverage, key sentences, and clue words. Redundancy is then removed from the candidate summary sentences, and an appropriate number of top-ranked sentences are selected and rearranged in their original order to form the final summary. Experimental results show that SW-TextRank generates more accurate, higher-quality summaries than TextRank.
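The pipeline the abstract outlines can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the weighting scheme, the parameters `w_pos`, `w_title` and `redundancy`, and all function names below are assumptions made for illustration. A hand-written dictionary of toy vectors stands in for a trained Word2Vec model, and only two of the listed factors (sentence position and title similarity) are modelled; keyword coverage, key sentences and clue words are left out.

```python
import math

def sentence_vector(tokens, word_vecs):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def textrank(sim, d=0.85, iters=50):
    """Weighted PageRank over a sentence similarity matrix."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in sim]
    for _ in range(iters):
        scores = [
            (1 - d) / n + d * sum(sim[j][i] / out_sum[j] * scores[j]
                                  for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(title, sentences, word_vecs, k=2, redundancy=0.95,
              w_pos=0.2, w_title=0.3):
    """Return indices of up to k summary sentences in original order.

    The way the bonuses are combined here is an assumption, not the
    formula used in the paper.
    """
    n = len(sentences)
    vecs = [sentence_vector(s, word_vecs) for s in sentences]
    # Similarity graph: negative cosines clipped to 0, no self-loops.
    sim = [[max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    base = textrank(sim)
    tvec = sentence_vector(title, word_vecs)
    scored = []
    for i in range(n):
        pos = 1.0 - i / n  # earlier sentences get a larger bonus
        bonus = w_pos * pos + w_title * max(cosine(vecs[i], tvec), 0.0)
        scored.append((base[i] * (1 + bonus), i))
    # Redundancy step: skip a sentence too similar to one already chosen.
    chosen = []
    for _, i in sorted(scored, reverse=True):
        if all(sim[i][j] < redundancy for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore original document order

# Toy 2-dimensional "embeddings" (hypothetical values, not trained).
toy_vecs = {"cat": [1.0, 0.0], "sat": [0.8, 0.2],
            "dog": [0.0, 1.0], "ran": [0.2, 0.8]}
toy_sentences = [["cat", "sat"], ["cat", "sat"], ["dog", "ran"]]
summary = summarize(["cat"], toy_sentences, toy_vecs, k=2)
print(summary)
```

In the toy input the first two sentences are identical, so the redundancy step keeps only one of them. In practice the token lists would come from a Chinese word segmenter and the vectors from a trained Word2Vec model.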
Authors
汪旭祥
韩斌
高瑞
陈鹏
Wang Xuxiang; Han Bin; Gao Rui; Chen Peng (School of Computers, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China)
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2021, No. 6, pp. 155-160 (6 pages)
Computer Applications and Software