Abstract
Motivated by the linguistic characteristics of Vietnamese, this paper proposes a Chinese-Vietnamese statistical machine translation method that integrates language-difference features into a lexicalized reordering model. First, the grammatical differences between Chinese and Vietnamese are analyzed, and the differences in attributive position, adverbial position, and the word order of modifiers are extracted. Second, these differences are formally defined as rules and integrated into the lexicalized reordering model in log-linear form. During training, the augmented reordering model tunes the weights of the rules that conform to these linguistic features, and the tuned weights then guide the selection of candidate translations during decoding. Experimental results show that, for Chinese-to-Vietnamese translation, the proposed model improves on a baseline hierarchical phrase-based system by 0.6 to 2.1 BLEU points.
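The log-linear integration described above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature names (`tm`, `lm`, `reorder`, `lang_diff`), the weights, and the feature values are all hypothetical, chosen only to show how one extra feature that fires on a language-difference rule can change which candidate the decoder prefers. The example uses the attributive-position difference: an adjective precedes the noun in Chinese but follows it in Vietnamese ("sách mới", literally "book new").

```python
# Hypothetical weights; in a real system these would be tuned on held-out data.
WEIGHTS = {"tm": 1.0, "lm": 0.5, "reorder": 0.3, "lang_diff": 0.8}


def lang_diff_feature(orientation, rule_applies):
    """Return 1.0 if a phrase covered by a difference rule is swapped, else 0.0."""
    return 1.0 if rule_applies and orientation == "swap" else 0.0


def loglinear_score(features, weights):
    """Weighted sum of feature values (the exponent of a log-linear model)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Pick the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Two candidate orders for an adjective-noun phrase; all values are invented.
candidates = [
    {"text": "mới sách", "features": {
        "tm": -1.0, "lm": -2.0, "reorder": -0.2,
        "lang_diff": lang_diff_feature("monotone", True)}},
    {"text": "sách mới", "features": {
        "tm": -1.0, "lm": -2.0, "reorder": -0.9,
        "lang_diff": lang_diff_feature("swap", True)}},
]

with_rule = best_candidate(candidates, WEIGHTS)
without_rule = best_candidate(candidates, {**WEIGHTS, "lang_diff": 0.0})
print(with_rule["text"], "|", without_rule["text"])
```

With the `lang_diff` weight set to zero, the generic reordering score favors the monotone order in this toy setup; with a positive weight, the swapped order wins. This mirrors the role the abstract assigns to the tuned rule weights during decoding.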
Source
Computer & Digital Engineering (《计算机与数字工程》)
2017, No. 12, pp. 2389-2392, 2427 (5 pages)
Keywords
statistical machine translation
lexicalized reordering model
Chinese
Vietnamese
language features