Abstract
Abstractive text summarization (ATS) is one of the main tasks in text mining. Compared with extractive text summarization, which extracts shallow meaning from a text, ATS more closely resembles the process of human summarization, which gives it considerable research significance. With the development of deep learning methods in recent years, ATS based on deep neural network models has made remarkable progress. To provide a more comprehensive understanding of the ideas behind these methods and the current state of research, this paper starts from a description of the ATS task and reviews five categories of deep learning methods for ATS: models based on recurrent neural networks (RNN), models based on convolutional neural networks (CNN), RNN+CNN models, models incorporating attention mechanisms, and models incorporating reinforcement learning. These methods show that summarization performance improves markedly when deep neural networks are trained for the task, especially after attention mechanisms and reinforcement learning are incorporated. In the future development of ATS, in addition to the continued application and improvement of deep learning methods themselves, researchers need to consider how to effectively achieve summarization with document-level semantic understanding, summarization tailored to the characteristics of different types of text, and automatic evaluation of summaries. Integrating mature methods from traditional summarization research to further improve performance is also a valuable research direction.
Author
赵洪
Zhao Hong (Department of Information Resources Management, Business School, Nankai University, Tianjin 300071; CETC Big Data Research Institute Co., Ltd., Guiyang 550081)
Source
《情报学报》
CSSCI
CSCD
Peking University Core Journal (北大核心)
2020, Issue 3, pp. 330-344 (15 pages in total)
Journal of the China Society for Scientific and Technical Information
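Funding
Key Supported Project of the 2017-2018 Open Fund of the National Engineering Laboratory for Big Data Application Technologies for Improving Government Governance Capabilities, "Research on Intelligent Processing Technology for Large-Scale Government Documents Based on NLP and Deep Learning"
Major Program of the National Social Science Fund of China, "Research on Network Society Governance in China" (14ZDA063).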
Keywords
abstractive text summarization
deep learning
recurrent neural network
convolutional neural network
attention mechanism
reinforcement learning