Abstract
Unsupervised neural machine translation trains models using only large amounts of monolingual data, without any parallel data, but it struggles to establish a connection between two linguistically distant languages. To address this problem, this paper proposes a new training method for neural machine translation that uses no parallel sentence pairs. A bilingual dictionary is used to replace words in the monolingual data, establishing a connection between the two languages. In addition, two techniques, word embedding fusion initialization and dual-encoder fusion training, strengthen the alignment of the two languages in a shared semantic space and thereby improve the performance of the translation system. Experiments show that the proposed method improves BLEU over the baseline unsupervised translation system by 2.39 and 1.29 on the Chinese-English and English-Chinese tasks, respectively, and that translation quality also improves significantly in monolingual-data experiments on other pairs such as English-Russian and English-Arabic.
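As a rough illustration of the dictionary-replacement idea described in the abstract, the sketch below swaps source-language tokens for dictionary translations to produce a mixed-language sentence. The abstract does not specify the actual replacement strategy, so everything here is an assumption made for illustration: the function name replace_with_dictionary, the replace_prob parameter, the whitespace tokenization, and the toy dictionary are all hypothetical.

```python
import random


def replace_with_dictionary(tokens, bilingual_dict, replace_prob=0.5, rng=None):
    """Return a copy of `tokens` in which some words are swapped for a
    dictionary translation (hypothetical helper, not the paper's code).

    tokens         -- list of source-language tokens
    bilingual_dict -- maps a source word to a list of target-language words
    replace_prob   -- assumed per-token chance of replacing a word that
                      has a dictionary entry
    rng            -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    mixed = []
    for tok in tokens:
        translations = bilingual_dict.get(tok)
        # Replace only words that have an entry, and only with
        # probability replace_prob; everything else is kept as-is.
        if translations and rng.random() < replace_prob:
            mixed.append(rng.choice(translations))
        else:
            mixed.append(tok)
    return mixed


# Toy English-to-Chinese dictionary, for illustration only.
toy_dict = {"cat": ["猫"], "milk": ["牛奶"]}
sentence = "the cat drinks milk".split()

# With replace_prob=1.0 every word that has an entry is replaced.
print(replace_with_dictionary(sentence, toy_dict, replace_prob=1.0))
# → ['the', '猫', 'drinks', '牛奶']
```

Sentences mixed in this way contain words from both languages in the same context, which is one plausible way for a model trained only on monolingual text to see the two vocabularies side by side.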
Authors
王煦
贾浩
季佰军
段湘煜
WANG Xu; JIA Hao; JI Bai-jun; DUAN Xiang-yu (Natural Language Processing Laboratory, Soochow University, Suzhou 215006, China)
Source
《计算机工程与科学》
CSCD
Peking University Core Journals (北大核心)
2022, No. 8, pp. 1481-1487 (7 pages)
Computer Engineering & Science
Funding
National Natural Science Foundation of China (61673289).
Keywords
neural network
neural machine translation
dictionary
unsupervised