Abstract
Multi-system combination merges the outputs of different machine translation systems to achieve better translation performance, and it has become a hot topic in machine translation research in recent years. Commonly used combination methods operate at three levels: sentence level, phrase level, and word level. Based on an analysis of the methods at each level, we propose a system combination approach based on words and phrases. First, we construct a confusion network using word-level system combination and transform the confusion network into a phrase table. Then, based on this phrase table, we decode the confusion network with the re-decoding method from phrase-level system combination to produce the final translation. This approach both preserves, as far as possible, the confusion network constructed by the combination system and allows more features to be used in confusion network decoding. We evaluate the method on two test sets, compare the results, and obtain satisfactory translation performance.
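The first two steps of the pipeline described above (building a confusion network, then turning it into a phrase table) can be illustrated with a minimal sketch. The abstract does not state how hypotheses are aligned, how arcs are scored, or what the phrase table looks like, so everything here is an assumption: the hypotheses are taken to be already aligned to equal length, the `<eps>` marker, the vote-based arc probabilities, the `max_len` span limit, and the function names `build_confusion_network` and `network_to_phrase_table` are all hypothetical. The re-decoding step is not covered.

```python
from collections import Counter
from itertools import product

EPS = "<eps>"  # hypothetical marker for an empty (null) arc


def build_confusion_network(aligned_hyps):
    """Build one slot per aligned position; each slot maps word -> vote share.

    Assumes the hypotheses were aligned beforehand and padded with EPS
    so that all of them have the same length.
    """
    length = len(aligned_hyps[0])
    assert all(len(h) == length for h in aligned_hyps)
    network = []
    for column in zip(*aligned_hyps):
        counts = Counter(column)
        total = sum(counts.values())
        network.append({word: n / total for word, n in counts.items()})
    return network


def network_to_phrase_table(network, max_len=2):
    """Enumerate every path through each span of up to max_len slots.

    Keys are (start, end, words); the score is the product of the arc
    probabilities, summed over paths that yield the same words.
    """
    table = {}
    for start in range(len(network)):
        last = min(start + max_len, len(network))
        for end in range(start + 1, last + 1):
            slots = network[start:end]
            for path in product(*(slot.items() for slot in slots)):
                words = tuple(w for w, _ in path if w != EPS)
                if not words:
                    continue  # skip phrases made only of null arcs
                score = 1.0
                for _, prob in path:
                    score *= prob
                key = (start, end, words)
                table[key] = table.get(key, 0.0) + score
    return table


# Toy input: three system outputs, already aligned (an assumption).
hyps = [
    ["the", "cat", "sat", EPS],
    ["the", "cat", "sits", "down"],
    ["a", "cat", "sat", "down"],
]
network = build_confusion_network(hyps)
table = network_to_phrase_table(network, max_len=2)
```

Keying each entry by its span keeps the slot positions available to a later decoder, which could then combine these scores with further features such as a language model; that choice is made for illustration only.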
Source
《情报学报》
CSSCI
PKU Core Journal (北大核心)
2011, No. 12, pp. 1268-1273 (6 pages)
Journal of the China Society for Scientific and Technical Information
Funding
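This work was supported by the Institute of Scientific and Technical Information of China (ISTIC) discipline development project "Natural Language Processing" (XK2010-6) and the ISTIC research project pre-research fund (YY-2010020).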
Keywords
machine translation, system combination, confusion network