Research on Word Deletion Issue in Statistical Machine Translation (统计机器翻译删词问题研究)

Cited by: 2

Abstract: This paper studies and analyzes the word deletion problem in phrase-based statistical machine translation. Through human evaluation, word deletion errors are classified into three categories. Key words in source-language sentences are identified by two methods, a frequency-based method and a part-of-speech-based method. Constraints on unaligned source-language key words are then introduced into the traditional phrase-pair extraction algorithm to alleviate word deletion errors. Automatic and human evaluations show that the approach significantly alleviates word deletion errors on both the Chinese-to-English and the English-to-Chinese translation tasks, while also yielding a more compact phrase translation table.
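The abstract does not spell out the exact form of the constraint, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (a) the frequency-based method treats a word as a key word when its corpus count is at or below a threshold, (b) the part-of-speech-based method treats a word as a key word when its tag belongs to a chosen tag set, and (c) the constraint discards any phrase pair whose source span contains an unaligned key word. All function names, parameters, and the toy sentence pair are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Links = Set[Tuple[int, int]]  # word alignment as (source index, target index)


def keywords_by_frequency(corpus: Iterable[List[str]], max_count: int) -> Set[str]:
    """Assumed rule: words seen at most max_count times are key words."""
    counts = Counter(word for sentence in corpus for word in sentence)
    return {word for word, count in counts.items() if count <= max_count}


def keywords_by_pos(tagged: Iterable[Tuple[str, str]], key_tags: Set[str]) -> Set[str]:
    """Assumed rule: words whose POS tag is in key_tags are key words."""
    return {word for word, tag in tagged if tag in key_tags}


def extract_phrase_pairs(src: List[str], tgt: List[str], links: Links,
                         keywords: Set[str], max_len: int = 7) -> List[Tuple[str, str]]:
    """Alignment-consistent phrase extraction with a hypothetical constraint:
    a source span containing an unaligned key word is never extracted."""
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Hypothetical constraint on unaligned source key words.
            if any(src[i] in keywords and i not in aligned_src
                   for i in range(i1, i2 + 1)):
                continue
            tgt_points = [j for i, j in links if i1 <= i <= i2]
            if not tgt_points:
                continue
            j1, j2 = min(tgt_points), max(tgt_points)
            # Consistency: no target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for i, j in links):
                continue
            # Extend the target span over neighbouring unaligned target words.
            js = j1
            while js >= 0 and (js == j1 or js not in aligned_tgt):
                je = j2
                while je < len(tgt) and (je == j2 or je not in aligned_tgt):
                    if je - js + 1 <= max_len:
                        pairs.append((" ".join(src[i1:i2 + 1]),
                                      " ".join(tgt[js:je + 1])))
                    je += 1
                js -= 1
    return pairs


# Toy example: "昨天" (yesterday) and "了" are unaligned on the source side.
src = ["他", "昨天", "去", "了", "北京"]
tgt = ["he", "went", "to", "Beijing"]
links = {(0, 0), (2, 1), (4, 3)}

baseline = extract_phrase_pairs(src, tgt, links, keywords=set())
constrained = extract_phrase_pairs(src, tgt, links, keywords={"昨天"})
```

In this toy case the unconstrained extractor produces the pair ("他 昨天", "he"), which would let a decoder drop "昨天" silently. Under the assumed constraint that pair is skipped, while pairs that only absorb the non-key word "了" are kept. The constrained list is also a strict subset of the baseline, which fits the abstract's remark about a more compact phrase table.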

Authors: 李强 (Li Qiang), 何燕龙 (He Yanlong), 栾爽 (Luan Shuang), 肖桐 (Xiao Tong), 朱靖波 (Zhu Jingbo)

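Affiliations: College of Information Science and Engineering, Northeastern University (东北大学信息科学与工程学院); China Ethnic Languages Translation Center (中国民族语文翻译中心); School of Foreign Languages, Liaoning University (辽宁大学外国语学院); Hangzhou Yatuo Network Technology Co., Ltd. (杭州雅拓网络技术有限公司)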

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2014, Issue 5, pp. 125-132 (8 pages)

Funding: National Natural Science Foundation of China (61272376, 61300097); China Postdoctoral Science Foundation (2013M530131)

Keywords: statistical machine translation; word deletion issue; human evaluation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
