Abstract
Morphologically rich languages exhibit complex morphological variation, which leads to very large vocabularies and severe data sparseness and poses a major challenge for statistical machine translation. In this paper, we represent such a language at several different granularities and translate each representation separately. Because different granularities capture characteristics of the language at different levels, we combine the translation hypotheses produced at the different granularities through word-level system combination to generate better translations. Experiments on Uyghur-to-Chinese and Mongolian-to-Chinese translation show that this multi-granularity system combination improves translation quality: BLEU increases by 1.41% and 2.03%, respectively, over the best single system.
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2011, Issue 4, pp. 75-81 (7 pages)
Journal of Chinese Information Processing
Funding
Key Program of the National Natural Science Foundation of China (60736014); National Natural Science Foundation of China (60873167)
Keywords
morphologically rich language (形态丰富语言)
multiple granularities (多粒度)
system combination (系统融合)