A Study of Re-ranking and Combination for Multi-engine Machine Translation (多引擎机器翻译译文重排序与融合研究)

Abstract [Objective/Significance] Machine translation (MT) engines built with different models, methods, languages, and data tend to perform differently across translation scenarios. Many researchers have therefore combined the outputs of multiple engines or multiple translation methods to exploit the strengths of each. Previous work, however, has not considered how to use the data that users generate while working with a multi-engine MT system to capture the evaluations of the engines' translations that reside in the users' cognitive domain. [Methods/Processes] This paper presents a multi-engine translation platform built on six MT engines. Over long-term use the platform has accumulated translation results, user profiles, and human post-edited translations, and from this large-scale historical data we build a resource warehouse for translation model training. Combining the PageRank algorithm, Bayes' rule, and the UNQE method, we propose a re-ranking method for multi-engine MT output. We then use the re-ranking results together with the translation instances in the resource warehouse to train a translation combination model based on the Transformer architecture. [Limitations] The method has a cold-start problem: it needs real data from a large number of users, collected over a period of time, before it reaches the expected performance. [Results/Conclusions] Experimental results show that the proposed method combines the strengths of multiple engines and improves average translation quality across domains.
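The abstract names the ingredients of the re-ranking method (PageRank, Bayes' rule, and a quality-estimation score) but not how they are wired together. The following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes a hypothetical preference graph `wins`, where `wins[a][b]` counts how often users preferred engine `b`'s translation over engine `a`'s, uses PageRank over that graph as a per-engine prior, and multiplies the prior by a hypothetical per-sentence quality score (standing in for whatever UNQE provides) to obtain a normalized posterior. All names, the graph construction, and the numbers are assumptions for illustration.

```python
def pagerank(wins, damping=0.85, iters=100):
    """Power-iteration PageRank over a weighted preference graph.

    wins[a][b] is an assumed count of how often users preferred
    engine b over engine a; rank mass flows from a to b.
    Every engine must appear as a key of `wins`.
    """
    engines = sorted(wins)
    n = len(engines)
    rank = {e: 1.0 / n for e in engines}
    for _ in range(iters):
        new = {e: (1.0 - damping) / n for e in engines}
        for a in engines:
            total = sum(wins[a].values())
            if total == 0:
                # Dangling node: spread its mass evenly over all engines.
                for b in engines:
                    new[b] += damping * rank[a] / n
            else:
                for b, w in wins[a].items():
                    new[b] += damping * rank[a] * w / total
        rank = new
    return rank


def rerank(prior, qe_scores):
    """Bayes-style re-ranking: posterior is proportional to prior * score.

    prior: per-engine prior (here, the PageRank values).
    qe_scores: assumed per-sentence quality scores in (0, 1],
               treated as a likelihood for this sketch.
    Returns the engines ordered best-first and the posterior itself.
    """
    unnorm = {e: prior[e] * qe_scores[e] for e in qe_scores}
    z = sum(unnorm.values())
    posterior = {e: v / z for e, v in unnorm.items()}
    order = sorted(posterior, key=posterior.get, reverse=True)
    return order, posterior


# Hypothetical data for three engines (the platform in the paper has six).
wins = {"A": {"B": 3, "C": 1}, "B": {"C": 1}, "C": {"B": 2}}
prior = pagerank(wins)
order, posterior = rerank(prior, {"A": 0.9, "B": 0.5, "C": 0.6})
print(order)
```

In this toy setup the prior encodes long-run user preference while the quality score reacts to the sentence at hand, so an engine that users rarely prefer can still be outranked even when its score for one sentence is high. A new platform would have an almost empty `wins` graph, which is one way to picture the cold-start limitation the abstract mentions.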

Authors: LI Ming; ZHANG Keliang; TANG Liang; XIA Rongjing (Information Engineering University (Luoyang), Luoyang 471003, China)

Affiliation: Strategic Support Force Information Engineering University

Source: Technology Intelligence Engineering (《情报工程》), 2023, No. 2, pp. 96-107 (12 pages)

Keywords: Multi-engine machine translation; Translation re-ranking; Translation combination

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; G35 [Culture and Science: Information Science]

