
Approach of Constructing Parallel Corpus Based on Pivot Language (基于枢轴语言的平行语料构建方法)  Cited by: 2
Abstract: The size of a parallel corpus plays an important role in improving the performance of statistical machine translation, but constructing a parallel corpus by hand is expensive. To address this problem, this paper proposes a low-cost, efficient method for constructing parallel corpora: a pivot language serves as a bridge, and existing machine translation technology is combined with active learning to build a large-scale, high-quality parallel corpus for the target language pair. The method is studied through the construction of a Japanese-Chinese parallel corpus with English as the pivot language, using mature phrase-based statistical machine translation. The paper describes a method for selecting good translations based on automatic translation evaluation, an active-learning-based corpus selection method used for domain adaptation, the iterative retraining of the translation system, and the evaluation experiments. The experimental results show that the proposed method builds a high-quality Japanese-Chinese parallel corpus quickly, and that the constructed corpus effectively improves the performance of the Japanese-Chinese machine translation system.
Authors: SHAN Hua (单华), ZHANG YuJie (张玉洁), ZHOU Wen (周雯), XU JinAn (徐金安), CHEN YuFeng (陈钰枫) (The School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Source: Technology Intelligence Engineering (《情报工程》), 2017, No. 3, pp. 29-39 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (61370130, 61473294)
Keywords: pivot language, machine translation, parallel corpus, active learning
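
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes a Japanese-English parallel corpus is available, that `en_to_zh` and `ja_to_zh` are callables wrapping existing translation systems, and that a pivot translation counts as "good" when it agrees with the direct Japanese-to-Chinese output. The character-bigram F1 score is a simple stand-in for whichever automatic evaluation metric the paper uses, and the 0.5 threshold is arbitrary.

```python
from collections import Counter
from typing import Callable, List, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams; characters suit unsegmented Chinese text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def overlap_score(candidate: str, reference: str, n: int = 2) -> float:
    """Character n-gram F1 between two strings (stand-in for an MT metric)."""
    cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
    matched = sum((cand & ref).values())  # clipped n-gram matches
    if matched == 0:
        return 0.0
    precision = matched / sum(cand.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def build_pivot_corpus(
    ja_en_pairs: List[Tuple[str, str]],
    en_to_zh: Callable[[str], str],  # hypothetical English-to-Chinese MT system
    ja_to_zh: Callable[[str], str],  # hypothetical Japanese-to-Chinese MT system
    threshold: float = 0.5,          # assumed cut-off, not taken from the paper
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split Japanese-English pairs into accepted Japanese-Chinese pairs
    and rejected source pairs, based on agreement of the two translations."""
    accepted, rejected = [], []
    for ja, en in ja_en_pairs:
        zh_pivot = en_to_zh(en)    # Chinese obtained through the English pivot
        zh_direct = ja_to_zh(ja)   # Chinese from the direct system
        if overlap_score(zh_pivot, zh_direct) >= threshold:
            accepted.append((ja, zh_pivot))
        else:
            rejected.append((ja, en))
    return accepted, rejected
```

In an iterative setup of the kind the abstract describes, the accepted pairs would be added to the training data before the Japanese-Chinese system is retrained. Treating the rejected pairs as candidates for an active-learning selection step is likewise an assumption of this sketch.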