Abstract
Current neural machine translation (NMT) methods take a single sentence as the input unit and therefore cannot make effective use of document-level context during translation, which limits translation quality. To address this loss of context in existing NMT frameworks, this paper proposes a document-level NMT method that integrates topic information. First, the current source sentence and the preceding source sentence are fed independently into a source sentence encoder and a context encoder. Next, an attention mechanism maps the outputs of the two encoders into a final context representation, which is combined with the source sentence encoder output through a gating mechanism to obtain a fused representation carrying both contextual and current-sentence information. In parallel, the word-embedded source sentence is passed to a topic encoder based on Bi-GRU and convolutional neural networks, which maps it to a topic representation. Finally, the fused sentence representation and the topic representation each take part in decoding through two attention mechanisms connected in series. Experiments show that the method improves document-level NMT performance, raising BLEU by up to 0.55 percentage points over the baseline system.
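The attention and gating steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the equations, so everything here is an assumption made for illustration: scaled dot-product attention from source positions over the context sentence, a sigmoid gate computed from the concatenated states, and the hypothetical names `context_attention`, `gated_fusion`, `w_gate`, and `b_gate`. The topic encoder and the decoder are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def context_attention(src_states, ctx_states):
    """Each source position attends over the context-sentence states.

    Assumed form: scaled dot-product attention.
    Shapes: (n_src, d) and (n_ctx, d) -> (n_src, d).
    """
    d = src_states.shape[-1]
    scores = src_states @ ctx_states.T / np.sqrt(d)  # (n_src, n_ctx)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ ctx_states


def gated_fusion(src_states, ctx_repr, w_gate, b_gate):
    """Assumed gate: g = sigmoid([h; c] W + b), output = g * h + (1 - g) * c."""
    concat = np.concatenate([src_states, ctx_repr], axis=-1)  # (n_src, 2d)
    gate = 1.0 / (1.0 + np.exp(-(concat @ w_gate + b_gate)))  # (n_src, d)
    return gate * src_states + (1.0 - gate) * ctx_repr


# Toy encoder outputs standing in for the two encoders (random, not trained).
rng = np.random.default_rng(0)
d_model = 8
src = rng.normal(size=(5, d_model))  # current sentence, 5 tokens
ctx = rng.normal(size=(7, d_model))  # preceding sentence, 7 tokens
w_gate = rng.normal(scale=0.1, size=(2 * d_model, d_model))
b_gate = np.zeros(d_model)

ctx_repr = context_attention(src, ctx)
fused = gated_fusion(src, ctx_repr, w_gate, b_gate)
print(fused.shape)  # (5, 8)
```

Because the gate lies strictly between 0 and 1, each element of `fused` is a convex combination of the source state and the context representation, which is one common way to let a model decide per dimension how much context to admit.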
Authors
陈玺文
余正涛
高盛祥
王振晗
CHEN Xi-wen; YU Zheng-tao; GAO Sheng-xiang; WANG Zhen-han (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)
Source
《云南大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2023, No. 6, pp. 1197-1207 (11 pages)
Journal of Yunnan University(Natural Sciences Edition)
Funding
National Natural Science Foundation of China (61972186)
Yunnan Provincial Major Science and Technology Special Project (202103AA080015, 202203AA080004)
Keywords
document translation
Neural Machine Translation (NMT)
topic model
dual encoder
sentence representation