Abstract
Driven by the rapid growth of Internet data and advances in deep learning, automatic text summarization has become one of the main research directions in natural language processing, and its techniques and applications have been widely studied. To support further research on summarization, and guided by the key problems encountered in existing studies, this paper surveys deep-learning-based abstractive text summarization models. It outlines the definition and origins of the task, data preprocessing and the basic framework, commonly used datasets, and evaluation criteria; it then identifies the advantages of these models and their key problems, and describes feasible solutions to each. The paper also compares commonly used deep pre-trained models with models that integrate innovative methods, analyzes the innovations and limitations of each, and proposes approaches to some of those limitations. Finally, it discusses future directions for the field.
Authors
朱永清
赵鹏
赵菲菲
慕晓冬
白坤
尤轩昂
ZHU Yongqing; ZHAO Peng; ZHAO Feifei; MU Xiaodong; BAI Kun; YOU Xuanang (College of Operational Support, Rocket Force University of Engineering, Xi’an 710025, China; Army Academy of Border and Coastal Defence, Xi’an 710025, China)
Source
《计算机工程》
CAS
CSCD
PKU Core Journal (北大核心)
2021, No. 11, pp. 11-21, 28 (12 pages)
Computer Engineering
Funding
National ministry-level fund.
Keywords
deep learning
abstractive text summarization
out-of-vocabulary (OOV) words
repetitive generation
long-range dependency
evaluation criteria