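Improving Statistical Machine Translation Performance Using Source-Language Paraphrase Knowledge (original title: 使用源语言复述知识改善统计机器翻译性能). Cited by: 4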

Improved Statistical Machine Translation with Source Language Paraphrase

Abstract: The performance of statistical machine translation (SMT) suffers when the bilingual parallel corpus is too small to supply sufficient translation knowledge. To alleviate this problem, the authors propose a paraphrase-based translation framework with three components: 1) acquiring a paraphrase table with probabilities by using a third language; 2) representing the multiple paraphrases of an input sentence as a lattice and extending the decoder so that it can decode lattice input; 3) integrating the paraphrase knowledge as features into the objective function of the log-linear model. With the original translation table left unchanged, the framework both increases the phrase translation table's coverage of source-language expressions and increases the diversity of the candidate translations. To verify the method, comparative experiments are conducted on three training sets of different sizes, measuring how much paraphrasing improves the SMT system. The results show that translation performance improves markedly (BLEU +1.4%) with the smallest training corpus (10 K sentence pairs) and still improves to some extent (BLEU +0.32%) with the largest training corpus (1 M sentence pairs).
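The three components named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption: the paraphrase probability uses the commonly used pivot estimate p(s2 | s1) = Σ_f p(f | s1) · p(s2 | f) over third-language phrases f, the lattice is simplified to phrase-labelled edges between word positions, and the function names (`pivot_paraphrase_table`, `build_lattice`, `loglinear_score`) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def pivot_paraphrase_table(src_to_pivot, pivot_to_src):
    """Assumed pivot estimate: p(s2 | s1) = sum_f p(f | s1) * p(s2 | f)."""
    table = defaultdict(lambda: defaultdict(float))
    for s1, pivots in src_to_pivot.items():
        for f, p_f_given_s1 in pivots.items():
            for s2, p_s2_given_f in pivot_to_src.get(f, {}).items():
                if s2 != s1:  # a phrase is not listed as its own paraphrase
                    table[s1][s2] += p_f_given_s1 * p_s2_given_f
    return {s1: dict(alts) for s1, alts in table.items()}


def build_lattice(tokens, para_table, max_len=3):
    """Return edges (start, end, phrase_tokens, paraphrase_prob) over word positions."""
    # Original words: one edge per token, with paraphrase probability 1.0.
    edges = [(i, i + 1, (tok,), 1.0) for i, tok in enumerate(tokens)]
    # Alternative edges: any span of up to max_len words that has paraphrases.
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            span = " ".join(tokens[i:j])
            for alt, prob in para_table.get(span, {}).items():
                edges.append((i, j, tuple(alt.split()), prob))
    return edges


def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of (log-domain) feature values."""
    return sum(weights[name] * value for name, value in features.items())
```

In a setup like this, the paraphrase probabilities along a decoded lattice path could be passed to `loglinear_score` as one more log-domain feature beside the usual translation-model and language-model features, with its weight tuned like the others. How the paper actually defines and tunes its paraphrase features is not stated in the abstract.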

Authors: Su Chen (苏晨), Zhang Yujie (张玉洁), Guo Zhen (郭振), Xu Jin'an (徐金安)

Affiliation: School of Computer Science, Beijing Jiaotong University (北京交通大学计算机学院)

Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报（自然科学版）》), 2015, No. 2, pp. 342-348 (7 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).

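Funding: Supported by the International Science and Technology Cooperation Program of China (2014DFA11350), the National Natural Science Foundation of China (61370130), and the Beijing Jiaotong University Talent Fund (2011RC034).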

Keywords: paraphrase; phrase translation table; features; decoder

Classification code: TP391.2 [Automation and Computer Technology: Computer Application Technology]


