Abstract
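This paper studies and analyzes the word deletion problem in phrase-based statistical machine translation models and, through human evaluation, classifies word deletion errors into three categories. Key words in source-language sentences are identified with two methods: a frequency-based method and a part-of-speech-tagging-based method. Word deletion errors are mitigated by adding, to the conventional phrase-pair extraction algorithm, a constraint on source-language key words that are aligned to null. Both automatic and human evaluations show that the method significantly alleviates word deletion errors on Chinese-to-English and English-to-Chinese translation tasks, while also yielding a more compact phrase table.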
This paper addresses the word deletion issue in phrase-based statistical machine translation. After attributing word deletion errors to three causes from the perspective of human reading, we propose to deal with this issue by introducing constraints on unaligned source-language words during phrase extraction. Two methods are presented for designing the constraints: a frequency-based method and a part-of-speech-based method. Automatic and human evaluations demonstrate promising improvements in translation quality on both the Chinese-to-English and the English-to-Chinese translation tasks, obtained with a more compact phrase table.
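The abstract does not spell out the extraction algorithm, so the following is only a minimal Python sketch of one plausible reading of "constraints on unaligned source words": standard consistent phrase-pair extraction over a word alignment that rejects any source span containing a key word with no alignment link, since any phrase pair covering such a word would translate it to nothing. The key-word identifiers are also assumptions: the frequency-based one treats the `top_k` most frequent corpus words as non-key, and the POS-based one uses a hypothetical set of content-word tags (`KEY_POS_TAGS`). All function and parameter names are hypothetical, not the authors' code.

```python
from collections import Counter

# Hypothetical content-word tags (Chinese Treebank style), ASSUMED to mark
# "key" source words for the POS-based method.
KEY_POS_TAGS = {"NN", "NR", "NT", "VV", "VA", "JJ", "CD"}


def key_mask_by_frequency(sentence, corpus_counts, top_k):
    """Frequency-based sketch (assumption): the top_k most frequent corpus
    words are treated as function-like, non-key words; all others are key."""
    frequent = {w for w, _ in corpus_counts.most_common(top_k)}
    return [w not in frequent for w in sentence]


def key_mask_by_pos(tags, key_tags=KEY_POS_TAGS):
    """POS-based sketch (assumption): a word is key if its tag is in key_tags."""
    return [t in key_tags for t in tags]


def extract_phrases(src, tgt, alignment, key_mask=None, max_len=7):
    """Consistent phrase-pair extraction. `alignment` is a set of
    (source_index, target_index) links. If `key_mask` is given, source spans
    containing an unaligned key word are rejected. Target-side extension over
    unaligned boundary words is omitted for brevity."""
    aligned_src = {i for i, _ in alignment}
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            if key_mask is not None and any(
                key_mask[i] and i not in aligned_src for i in range(s1, s2 + 1)
            ):
                # Every longer span from s1 also contains this word, so stop.
                break
            tps = [j for i, j in alignment if s1 <= i <= s2]
            if not tps:
                continue  # span has no aligned word at all
            t1, t2 = min(tps), max(tps)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may link outside [s1, s2].
            if any((t1 <= j <= t2) and not (s1 <= i <= s2) for i, j in alignment):
                continue
            pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


# Toy example: 昨天 ("yesterday") is tagged NT and left unaligned.
src = ["他", "昨天", "来", "了"]
tgt = ["he", "came"]
alignment = {(0, 0), (2, 1)}
tags = ["PN", "NT", "VV", "AS"]

unconstrained = extract_phrases(src, tgt, alignment)
constrained = extract_phrases(src, tgt, alignment, key_mask=key_mask_by_pos(tags))
```

In this toy example the unconstrained run yields 8 pairs, including ("他 昨天", "he"), which silently drops 昨天; the constrained run keeps only ("他", "he"), ("来", "came") and ("来 了", "came"). The unaligned function word 了 is still allowed inside a phrase because the POS mask does not mark it as key. This illustrates, on made-up data, how such a constraint could both block deletion-prone pairs and shrink the phrase table.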
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2014, No. 5, pp. 125-132 (8 pages)
Funding
National Natural Science Foundation of China (61272376, 61300097)
China Postdoctoral Science Foundation (2013M530131)
Keywords
statistical machine translation
word deletion issue
human evaluation