Abstract
To address the insufficient use of semantic information and the limited precision of current abstractive text summarization methods, a text summarization method based on a dual encoder is proposed. First, the dual encoder supplies richer semantic information to the Sequence to Sequence (Seq2Seq) architecture, and the attention mechanism incorporating dual-channel semantics and the decoder with empirical distribution are optimized. Then, position embedding and word embedding are merged in the word embedding generation step, and Term Frequency-Inverse Document Frequency (TF-IDF), Part Of Speech (POS), and key score (Soc) features are added, which optimizes the word embedding dimension. By optimizing both the traditional Seq2Seq model and the word feature representation, the proposed method strengthens the model's semantic understanding and improves summary quality. Experimental results show that, under the ROUGE evaluation system, the proposed method improves on the traditional Recurrent Neural Network with attention (RNN+atten) and the multi-layer bidirectional Recurrent Neural Network with attention (Bi-MulRNN+atten) by 10 to 13 percentage points, with more accurate semantic understanding, better generated summaries, and better application prospects.
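As a rough illustration of the word-feature augmentation described in the abstract, the sketch below combines a word vector with a position vector and then appends a TF-IDF value, a one-hot POS tag, and a key score. Everything specific here is an assumption made for illustration, since the abstract does not give these details: the function names `tf_idf` and `augment`, the additive merge of word and position vectors, the smoothed IDF variant, the four-tag POS set, and the toy corpus.

```python
import math
from collections import Counter

# Hypothetical POS tag set, chosen only for this example.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]


def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc` (a token list) against `corpus` (a list of token lists)."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    # Smoothed IDF; an assumed variant, not necessarily the one used in the paper.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def augment(word_vec, position_vec, tfidf, pos_tag, key_score):
    """Merge word and position vectors by addition (an assumption), then append extra features."""
    merged = [w + p for w, p in zip(word_vec, position_vec)]
    one_hot = [1.0 if tag == pos_tag else 0.0 for tag in POS_TAGS]
    return merged + [tfidf] + one_hot + [key_score]


# Toy usage with made-up vectors and a made-up key score.
corpus = [["flight", "delay", "report"], ["flight", "schedule"]]
doc = corpus[0]
features = augment([0.1, 0.2], [0.01, 0.02], tf_idf("delay", doc, corpus), "NOUN", 0.5)
print(len(features))  # 2 + 1 + 4 + 1 = 8
```

Appending the features increases the input dimension by a fixed amount per token (here 6), which is one plausible reading of how the added features change the word embedding dimension.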
Authors
DING Jianli (丁建立)
LI Yang (李洋)
WANG Jialiang (王家亮)
College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core (北大核心)
2019, No. 12, pp. 3476-3481 (6 pages)
Funding
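Major Science and Technology Special Project of the Civil Aviation Administration of China (MHRD20150107, MHRD20160109)
Fundamental Research Funds for the Central Universities (3122018C025)
Scientific Research Start-up Fund of Civil Aviation University of China (2014QD13X)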
Keywords
abstractive text summarization
Sequence to Sequence (Seq2Seq)
dual encoder
empirical distribution
word feature representation