Abstract
Attention-based encoder-decoder models are widely used in sequence-to-sequence tasks such as text summarization and machine translation. In deep learning frameworks, deep neural networks extract different feature representations of the input data, so conventional encoder-decoder models typically stack multiple decoder layers to improve performance. However, existing models attend only to the output of the encoder's last layer during decoding, ignoring the features of the remaining encoder layers. To address this, this paper proposes an abstractive summarization model based on a multi-layer recurrent neural network and a multi-layer interactive attention mechanism, which extracts feature information from different encoder layers to guide summary generation. To handle the information redundancy introduced by incorporating features from different layers, a variational information bottleneck is adopted to compress data noise. Experiments on the Gigaword and DUC2004 summarization datasets show that the proposed method achieves state-of-the-art performance.
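Illustrative sketch
The abstract names two mechanisms: an interactive attention that draws context from every encoder layer rather than only the last one, and a variational information bottleneck (VIB) that compresses the fused context to suppress redundant information. The following is a minimal, hypothetical PyTorch sketch of both ideas, written for illustration only; the class names, dimensions, and fusion strategy are assumptions and do not come from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerInteractiveAttention(nn.Module):
    """Attend over every encoder layer, then fuse the per-layer contexts.
    (Hypothetical; the paper's exact interaction may differ.)"""
    def __init__(self, hidden: int, num_layers: int):
        super().__init__()
        self.score = nn.Linear(2 * hidden, 1)             # additive-style scorer
        self.fuse = nn.Linear(num_layers * hidden, hidden)

    def forward(self, dec_state, enc_layers):
        # dec_state: (batch, hidden); enc_layers: list of (batch, src_len, hidden)
        contexts = []
        for enc in enc_layers:
            q = dec_state.unsqueeze(1).expand(-1, enc.size(1), -1)
            scores = self.score(torch.cat([q, enc], dim=-1)).squeeze(-1)
            attn = F.softmax(scores, dim=-1)              # weights over source positions
            contexts.append(torch.bmm(attn.unsqueeze(1), enc).squeeze(1))
        return torch.tanh(self.fuse(torch.cat(contexts, dim=-1)))

class VariationalInformationBottleneck(nn.Module):
    """Compress the fused context into a stochastic code z; the KL term
    penalizes information retained about the input (standard VIB objective)."""
    def __init__(self, hidden: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()

# Toy usage: 3 encoder layers, batch of 2, source length 5, hidden size 8.
enc_layers = [torch.randn(2, 5, 8) for _ in range(3)]
dec_state = torch.randn(2, 8)
ctx = MultiLayerInteractiveAttention(8, 3)(dec_state, enc_layers)
z, kl = VariationalInformationBottleneck(8, 4)(ctx)
print(ctx.shape, z.shape, kl.item())

During training, kl would be added to the generation loss with a weight beta, trading reconstruction quality against compression; this corresponds to the abstract's stated use of the bottleneck to remove redundant, noisy features.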
Authors
HUANG Yuxin, YU Zhengtao, XIANG Yan, GAO Shengxiang, GUO Junjun (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in CSCD and the Peking University Core Journal list (北大核心)
2020, No. 10, pp. 1681-1692 (12 pages)
Funding
National Key Research and Development Program of China Nos. 2018YFC0830105, 2018YFC0830101, 2018YFC0830100
National Natural Science Foundation of China Nos. 61972186, 61762056, 61472168
Keywords
text summarization
encoder-decoder model
multi-layer interactive attention
variational information bottleneck