Abstract
Automatic text summarization condenses a given text into a short summary that reflects the core content of the source. Abstractive summarization has become a research hotspot in this field because it can paraphrase the source with a more flexible and richer vocabulary. However, existing abstractive models reorganize original words and add new ones when generating summary sentences, which easily leads to incoherent sentences and low readability. In addition, improving sentence coherence through conventional supervised training on labeled data incurs a high data cost, which limits practical application. To address this, the paper proposes an abstractive text summarization model with coherence reinforcement and no ground-truth dependency (ATS_CG). Given only the source text, the model on the one hand generates sentence extraction labels from the encoding of the source text, which characterize the process of filtering key information from the source, and has the decoder decode the encodings of the filtered sentences. On the other hand, based on the original word probability distribution output by the decoder, it produces two types of summary text, one by "probability selection" and one by "Softmax-greedy selection". After constructing an overall reward for the two types of summary text that combines sentence coherence and sentence content, the model uses a self-critical policy gradient to learn both to filter key sentences and to decode them, so as to generate summaries with high coherence and good content quality. Experiments show that, even without any pre-labeled ground-truth summaries, the content metrics of the proposed model are on the whole better than those of existing text summarization methods. The summaries generated by ATS_CG also outperform existing methods in sentence coherence, content importance (relevance), information redundancy, lexical novelty, and perplexity.
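The training signal described in the abstract (a sampled summary scored against a greedy summary, with a reward that mixes coherence and content) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear reward mix, and the `weight` parameter are assumptions made for illustration, and the coherence and content scores are taken as given numbers because the abstract does not say how they are computed.

```python
import math
import random


def softmax(logits):
    """Turn raw decoder scores for one step into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_tokens(step_logits, rng=None):
    """Pick one token per decoding step.

    With rng given, tokens are sampled from the distribution
    ("probability selection"); without it, the argmax is taken
    ("Softmax-greedy selection").
    Returns (token_ids, summed log-probability of the chosen tokens).
    """
    tokens, log_prob = [], 0.0
    for logits in step_logits:
        probs = softmax(logits)
        if rng is None:
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(idx)
        log_prob += math.log(probs[idx])
    return tokens, log_prob


def overall_reward(coherence, content, weight=0.5):
    """Hypothetical linear mix; the abstract does not give the actual rule."""
    return weight * coherence + (1.0 - weight) * content


def self_critical_loss(sample_log_prob, sample_reward, greedy_reward):
    """Self-critical policy-gradient loss with the greedy reward as baseline.

    Minimizing it raises the probability of a sampled summary that beats
    the greedy one and lowers it otherwise.
    """
    return -(sample_reward - greedy_reward) * sample_log_prob


# Toy decoder output: two steps over a three-word vocabulary.
step_logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
greedy_tokens, greedy_log_prob = select_tokens(step_logits)
sampled_tokens, sampled_log_prob = select_tokens(step_logits, random.Random(0))
```

In a real model the rewards would be computed from the two generated summaries, and the loss would be backpropagated through both the sentence filter and the decoder. Here they are left as plain numbers so that the sign of the update is easy to check by hand.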
Authors
CHEN Gongchi (陈共驰)
RONG Huan (荣欢)
MA Tinghuai (马廷淮)
School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, Nanjing 210044, China; School of Computer Science, Nanjing University of Information Science & Technology, Nanjing 210044, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journals (北大核心)
2022, Issue 3, pp. 621-636 (16 pages)
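Funding
National Natural Science Foundation of China (62102187)
Natural Science Foundation of Jiangsu Province (Basic Research Program) (BK20210639)
2021 Jiangsu Provincial College Students' Innovation and Entrepreneurship Training Program (202110300093Y)
National Key Research and Development Program of China (2021YFE0104400)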
Keywords
automatic text summarization
natural language processing
reinforcement learning
information retrieval and integration