Abstract
Extractive summarization must select a limited number of important sentences from a document and assemble them into a short text that covers the main idea of the original. To address this problem, a single-document automatic summarization algorithm based on word-sentence co-ranking, named WSRank, was proposed; it integrates the word-sentence relationship into the graph-based computation of sentence weights. First, the framework of word-sentence co-ranking in WSRank was given. It was then converted into a concise matrix form, and its convergence was proved theoretically. Finally, a redundancy elimination method was applied as a supplement to WSRank to further improve summary quality. Experimental results on real datasets show that WSRank outperforms the classical TextRank algorithm by 13% to 30% on multiple ROUGE metrics, which demonstrates that the proposed method effectively improves the quality of generated summaries.
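The pipeline outlined in the abstract (graph-based sentence ranking, reinforced by word-sentence relations, followed by redundancy removal) can be illustrated with a minimal sketch. The abstract does not give the WSRank update equations, so everything below is an assumption made for illustration: the update rule, the parameters `alpha`, `d` and `max_sim`, and the helper names `build_matrices`, `co_rank` and `select_summary` are hypothetical and should not be read as the authors' formulation.

```python
import numpy as np

def build_matrices(sentences):
    """Build a word-by-sentence count matrix and a cosine similarity graph.

    sentences: list of token lists (tokenization is assumed to be done already).
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    word_sent = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s:
            word_sent[index[w], j] += 1.0
    norms = np.linalg.norm(word_sent, axis=0)
    norms[norms == 0] = 1.0               # guard against empty sentences
    unit = word_sent / norms
    sent_sim = unit.T @ unit              # cosine similarity between sentences
    np.fill_diagonal(sent_sim, 0.0)       # no self-loops in the sentence graph
    return word_sent, sent_sim

def _col_normalize(M):
    """Make every column sum to 1; all-zero columns become uniform."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=0, keepdims=True)
    uniform = np.full_like(M, 1.0 / M.shape[0])
    return np.where(sums > 0, M / np.where(sums > 0, sums, 1.0), uniform)

def co_rank(sent_sim, word_sent, alpha=0.5, d=0.85, tol=1e-8, max_iter=200):
    """Illustrative co-ranking by power iteration (assumed update rule).

    alpha mixes sentence-to-sentence and word-to-sentence votes; d is a
    PageRank-style damping factor. Both are hypothetical parameters.
    """
    m, n = word_sent.shape
    S = _col_normalize(sent_sim)          # sentence -> sentence (n x n)
    A = _col_normalize(word_sent.T)       # word -> sentence     (n x m)
    B = _col_normalize(word_sent)         # sentence -> word     (m x n)
    u = np.full(n, 1.0 / n)               # sentence scores
    v = np.full(m, 1.0 / m)               # word scores
    for _ in range(max_iter):
        u_new = (1 - d) / n + d * (alpha * S @ u + (1 - alpha) * A @ v)
        v_new = B @ u_new
        u_new /= u_new.sum()
        v_new /= v_new.sum()
        delta = np.abs(u_new - u).sum() + np.abs(v_new - v).sum()
        u, v = u_new, v_new
        if delta < tol:
            break
    return u, v

def select_summary(scores, sent_sim, k, max_sim=0.5):
    """Greedy redundancy removal: skip sentences too similar to chosen ones."""
    chosen = []
    for i in np.argsort(-scores):
        if all(sent_sim[i, j] < max_sim for j in chosen):
            chosen.append(int(i))
        if len(chosen) == k:
            break
    return sorted(chosen)                 # restore original document order
```

A typical call would be `word_sent, sent_sim = build_matrices(tokens)`, then `u, v = co_rank(sent_sim, word_sent)` and `select_summary(u, sent_sim, k=3)`. The greedy threshold filter is only one simple way to reduce redundancy; the abstract does not specify which technique the paper uses.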
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal
2017, Issue 7, pp. 2100-2105 (6 pages)
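Funding
National Natural Science Foundation of China (71571093, 71372188)
National Center for International Joint Research on E-Business Information Processing (2013B01035)
Natural Science Foundation of the Jiangsu Higher Education Institutions of China (15KJB520012)
Nanjing University of Finance and Economics Pre-research Project (YYJ201415)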
Keywords
automatic summarization
extractive summary
single document
graph-based ranking
word-sentence collaboration