Abstract
In recent years, the release of large document-summary corpora and advances in deep learning have enabled text summarization algorithms based on Seq2Seq and attention models to achieve notable results. However, the generated summaries still have considerable accuracy problems. This paper proposes a text summarization model that fuses information selection with semantic relevance. It aims to jointly address several shortcomings of generated summaries: out-of-vocabulary (OOV) words, repeated sentences, redundant information, and semantic deviation from the source text, which in some cases amounts to wide divergence. The model uses a selection network to filter the encoder output, retaining key content and discarding invalid information so that the decoder receives a higher-quality encoding, which helps reduce redundancy in the generated summary. By combining the copy mechanism, the coverage mechanism, and semantic relevance, the model handles OOV words, reduces repetition, and strengthens the semantic relevance between the summary and the source text, improving summary quality. Experimental results on the CNN/Daily Mail dataset show that the proposed model effectively improves ROUGE scores on that dataset and summarizes article content more comprehensively.
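The abstract names a selection network over the encoder output and a coverage mechanism but does not give their equations. As a rough illustration only, the sketch below assumes a per-position scalar sigmoid gate for selection and the commonly used coverage penalty (the sum over source positions of the minimum of the current attention and the accumulated attention). The function names `select` and `coverage_loss`, the gate parameters, and the scalar-gate form are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select(encoder_states, gate_weights, gate_bias):
    """Scale each encoder state by a scalar gate in (0, 1).

    Assumption: one scalar gate per source position, computed from a
    linear score of that position's state. The paper's selection
    network may be defined differently.
    """
    gated = []
    for state in encoder_states:
        score = sum(w * h for w, h in zip(gate_weights, state)) + gate_bias
        gate = sigmoid(score)
        gated.append([gate * h for h in state])
    return gated


def coverage_loss(attention_steps):
    """Penalize attending again to already-covered source positions.

    For each decoder step, adds sum_i min(a_i, c_i), where c is the
    running sum of attention from all earlier steps.
    """
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attn in attention_steps:
        total += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return total
```

In this sketch, a gate near 0 suppresses a source position before decoding, and the penalty is zero whenever successive decoder steps attend to disjoint source positions.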
Authors
陈立群
郭文忠
郭昆
张祖文
CHEN Liqun; GUO Wenzhong; GUO Kun; ZHANG Zuwen (College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou 350116; Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116; Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou 350116)
Source
《计算机与数字工程》
2020, No. 4, pp. 778-785 (8 pages)
Computer & Digital Engineering
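Funding
National Natural Science Foundation of China (Nos. 61300104, 61300103, 61672158)
Science Fund for Distinguished Young Scholars of Fujian Provincial Universities (No. JA12016)
Program for New Century Excellent Talents in Fujian Province Universities (No. JA13021)
Fujian Provincial Science Fund for Distinguished Young Scholars (Nos. 2014J06017, 2015J06014)
Fujian Provincial Science and Technology Innovation Platform Program (Nos. 2009J1007, 2014H2005)
Natural Science Foundation of Fujian Province (Nos. 2013J01230, 2014J01232)
Fujian Provincial University Industry-Academia Cooperation Project (Nos. 2014H6014, 2017H6008)
Supported by the Haixi Collaborative Innovation Center for Government Big Data Applications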