Abstract
To address the problem of semantic irrelevance in Chinese-Uyghur statistical machine translation systems, this paper proposes a bilingual relatedness optimization model based on neural machine translation. The model uses the attention mechanism to capture word alignment information and introduces the semantic relevance between bilingual phrases, together with their internal word matching degree, to predict the generation (conditional) probability of a bilingual phrase pair. This probability is then used as the bilingual relatedness to optimize the phrase translation scores in the statistical translation model. Experiments on the Chinese-Uyghur public machine translation dataset of the 11th China Workshop on Machine Translation (CWMT 2015) show that, compared with the baseline system and using relatively small-scale training data and vocabulary, the proposed method improves performance on both phrase-level and sentence-level machine translation tasks, with maximum BLEU gains of 2.49 and 0.59, respectively.
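The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: the function names are hypothetical, the alignment is read off as the highest attention weight per target word, the phrase generation probability is taken as a product of per-token probabilities, and the relatedness is added to a baseline phrase score as one weighted log-linear feature with an arbitrary weight. It is not the authors' implementation.

```python
import math

def hard_alignment(attention):
    """Assumed alignment rule: for each target word (row), pick the
    source position (column) with the highest attention weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]

def phrase_log_prob(token_probs):
    """Log generation probability of a target phrase given the source
    phrase, assumed here to be the product of per-token probabilities."""
    return sum(math.log(p) for p in token_probs)

def rescored_phrase_score(baseline_log_score, token_probs, weight=0.5):
    """Add the bilingual relatedness (log generation probability) to a
    baseline phrase translation score as one weighted log-linear feature.
    The weight value is a placeholder, not a figure from the paper."""
    return baseline_log_score + weight * phrase_log_prob(token_probs)

# Toy attention matrix: 2 target words attending over 3 source words.
attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
alignment = hard_alignment(attention)
score = rescored_phrase_score(-2.0, [0.5, 0.5])
```

In this toy setup, a phrase pair whose target words are generated with higher probability receives a higher combined score, which is the sense in which the relatedness value would adjust the phrase translation score.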
Authors
Pan Yirong (潘一荣)
Li Xiao (李晓)
Yang Yating (杨雅婷)
Dong Rui (董瑞)
Affiliations: Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech & Language Information Processing, Urumqi 830011, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 3, pp. 726-730 (5 pages)
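Funding
National Natural Science Foundation of China (U1703133)
"West Light" Program of the Chinese Academy of Sciences (2017-XBQNXZ-A-005)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472)
Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2016A03007-3)
High-Level Talent Introduction Project of Xinjiang Uygur Autonomous Region (Y839031201)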
Keywords
Uyghur
neural network machine translation
attention mechanism
word alignment
conditional probability