Abstract
Long-text summarization has long been a difficult problem in automatic summarization. Existing methods suffer from low accuracy and redundancy when processing long texts. Given the strong performance of topic models in multi-document summarization, a topic model is introduced into the long-text summarization task. In addition, neither a purely extractive nor a purely abstractive method can handle the complexity of long texts on its own. Combining the two approaches, the paper proposes a topic-aware hybrid summarization model for long texts that integrates extractive and abstractive methods. The model's effectiveness is verified on the TTNews and CNN/Daily Mail datasets: its ROUGE scores are 1 to 2 percentage points higher than those of comparable models, and the summaries it generates are more readable.
Authors
杨涛
解庆
刘永坚
刘平峰
YANG Tao; XIE Qing; LIU Yongjian; LIU Pingfeng (School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China; School of Economics, Wuhan University of Technology, Wuhan 430070, China)
Source
《计算机工程与应用》
CSCD
PKU Core Journal
2022, Issue 20, pp. 165-173 (9 pages)
Computer Engineering and Applications
Funding
Natural Science Foundation of Hubei Province (2018CFB564)
Fundamental Research Funds for the Central Universities (WUT: 2020III008GX)
Keywords
topic model
long text summarization
hybrid model
pointer network