Abstract
Research on Chinese-Tibetan machine translation currently concentrates on rule-based methods, mainly because parallel corpora and other basic resources for the language pair are scarce, which makes large-scale statistical machine translation experiments impractical. Motivated by the practical needs of a Chinese-Tibetan computer-aided translation project, this paper proposes an example-based machine translation method that operates on phrase strings and supplies candidate translations for computer-aided translation when little parallel data is available. The method uses word alignment information to exploit the existing parallel corpus as fully as possible. Experiments show that the proposed phrase-string example method outperforms traditional sentence-level example-based translation and can retrieve phrase-string translation examples of arbitrary length. On the test set, compared with Moses under its default parameters, the method achieves a BLEU score close to that of Moses and improves the recall of phrase translation examples by about 9.71%. On test data with an average sentence length of 20 words, translation takes 0.175 s per sentence on average, which meets the real-time requirement of computer-aided translation.
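The abstract does not spell out how phrase examples are mined from word alignments, so the following is only a minimal sketch of one common way to do it: the alignment-consistency criterion used in phrase-based statistical translation, followed by a greedy longest-match lookup. It is not the authors' implementation. The function names (`extract_phrase_pairs`, `build_index`, `longest_matches`), the `max_len` limit, and the placeholder tokens in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict


def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Collect phrase pairs consistent with a word alignment (hypothetical helper).

    src, tgt: lists of tokens; alignment: set of (src_index, tgt_index) links.
    Only the minimal target span is kept; unaligned boundary words are not added.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no word in the target span links outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def build_index(corpus, max_len=4):
    """Map each source phrase to the set of target phrases seen with it."""
    index = defaultdict(set)
    for src, tgt, alignment in corpus:
        for s, t in extract_phrase_pairs(src, tgt, alignment, max_len):
            index[s].add(t)
    return index


def longest_matches(sentence, index, max_len=4):
    """Cover the sentence left to right with the longest indexed phrases."""
    out, i = [], 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            key = " ".join(sentence[i:i + n])
            if key in index:
                out.append((key, sorted(index[key])))
                i += n
                break
        else:
            # No example covers this word; report it with no candidates.
            out.append((sentence[i], []))
            i += 1
    return out
```

With a toy corpus of one aligned pair, `build_index([(["a", "b", "c"], ["x", "y", "z"], {(0, 0), (1, 2), (2, 1)})])`, a query such as `["b", "c", "d"]` would be covered by the stored phrase `"b c"` plus one uncovered word. Matching sub-sentence spans in this way is what lets a small corpus yield candidates even when no whole-sentence example matches.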
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal
2013, No. 3, pp. 84-90 (7 pages)
Funding
Western Action Plan High-Tech Project of the Chinese Academy of Sciences (KGCX2-YW-512)
National Science and Technology Major Project (2010ZX01036-001-002, 2010ZX01037-001-002)
Keywords
machine translation
computer aided translation
phrase-based translation
example-based translation