Document-level neural machine translation based on topic information

Abstract: Current neural machine translation (NMT) methods take individual sentences as input and cannot effectively use document-level context during translation, which limits translation performance. To address the missing context information in existing machine translation frameworks, this study proposes a document-level NMT method that integrates topic information. First, the current source sentence and the preceding source sentence are fed independently into a source sentence encoder and a context encoder. Then, an attention mechanism maps the outputs of the two encoders into a final context representation, which is combined with the source sentence encoder output through a gating mechanism to obtain a fused representation that carries both the context information and the current sentence. At the same time, the word-embedded source sentence is mapped to a topic representation by a topic encoder based on Bi-GRU and convolutional neural networks. Finally, the fused sentence representation and the topic representation each take part in decoding through two attention mechanisms connected in series. Experiments show that the method improves document-level NMT performance, raising BLEU by up to 0.55 percentage points over the baseline system.
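
The gating step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the gate equations, so the form below (a sigmoid gate over the source encoder state and the context vector, followed by a convex combination) is an assumption based on a common formulation, not the authors' exact model. All names (`gated_fusion`, `W_h`, `W_c`, `b`) are hypothetical, and the parameters are random rather than trained.

```python
import math
import random


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_fusion(h, c, W_h, W_c, b):
    """Fuse a source-encoder state h with a context vector c.

    Assumed formulation (not taken from the paper):
        gate  = sigmoid(W_h h + W_c c + b)
        fused = gate * h + (1 - gate) * c
    """
    pre = [u + v + bias for u, v, bias in zip(matvec(W_h, h), matvec(W_c, c), b)]
    gate = [sigmoid(p) for p in pre]
    fused = [g * hi + (1.0 - g) * ci for g, hi, ci in zip(gate, h, c)]
    return fused, gate


# Toy example: hidden size 4 with random, untrained parameters.
random.seed(0)
d = 4
h = [random.uniform(-1, 1) for _ in range(d)]   # current-sentence state
c = [random.uniform(-1, 1) for _ in range(d)]   # context representation
W_h = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
W_c = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
b = [0.0] * d

fused, gate = gated_fusion(h, c, W_h, W_c, b)
```

Because each gate value lies strictly between 0 and 1, every component of `fused` falls between the corresponding components of `h` and `c`. In this assumed form, the gate decides per dimension how much of the current sentence versus the context is passed on to the decoder.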

Authors: CHEN Xi-wen (陈玺文); YU Zheng-tao (余正涛); GAO Sheng-xiang (高盛祥); WANG Zhen-han (王振晗) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)

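Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology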

Source: Journal of Yunnan University (Natural Sciences Edition) (《云南大学学报（自然科学版）》), 2023, No. 6, pp. 1197-1207 (11 pages). Indexed in: CAS, CSCD, Peking University Core Journals (北大核心).

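Funding: National Natural Science Foundation of China (61972186); Yunnan Provincial Major Science and Technology Special Project (202103AA080015, 202203AA080004).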

Keywords: document translation; Neural Machine Translation (NMT); topic model; double encoder; sentence representation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]