Neural Machine Translation Based on Adaptive Training and Drop Mechanism
Abstract  In machine translation, improving the translation accuracy of phrases is a key way to raise overall translation quality. Statistical machine translation models substantially improved phrase translation accuracy by modeling phrases rather than individual words. Neural Machine Translation (NMT) models, however, face two problems. First, the traditional training objective minimizes per-word loss and imposes no explicit constraint that encourages the model to memorize phrases, so phrase translations are often imprecise. Second, under autoregressive decoding, a mistranslated phrase degrades the accuracy of the phrases translated after it. To address these problems, this study proposes phrase-aware adaptive training and a phrase drop mechanism. Phrase-aware adaptive training segments each sentence into phrase chunks and uses an adaptive training objective to assign each target word an appropriate weight based on its position within a phrase, encouraging the model to memorize phrases and improving its phrase translation accuracy. The phrase drop mechanism randomly drops target-side phrases during training, making the model more robust to mistranslated phrases and preventing them from harming the translation of subsequent phrases. Experiments on the Workshop on Machine Translation 2014 (WMT2014) English-German and National Institute of Standards and Technology (NIST) Chinese-English translation tasks show that, compared with the Transformer baseline, the proposed methods improve the BiLingual Evaluation Understudy (BLEU) score by 1.64 and 0.96 points, respectively. The experiments further show that phrase knowledge is a general, transferable form of knowledge: it can be transferred from a teacher model to a student model to further improve translation quality.
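The two mechanisms described in the abstract can be sketched in code. The abstract does not specify the exact weighting scheme or drop probability, so the function names (`phrase_aware_weights`, `phrase_drop`) and the constant weights below are illustrative assumptions, not the paper's implementation:

```python
import random


def phrase_aware_weights(phrase_spans, num_tokens,
                         phrase_weight=1.5, default_weight=1.0):
    """One loss weight per target token (assumed scheme: tokens inside a
    phrase span get a larger weight, so their loss counts more)."""
    weights = [default_weight] * num_tokens
    for start, end in phrase_spans:  # spans are [start, end) token indices
        for i in range(start, min(end, num_tokens)):
            weights[i] = phrase_weight
    return weights


def phrase_drop(tokens, phrase_spans, drop_prob=0.15, rng=None):
    """Randomly delete whole target-side phrase spans during training,
    simulating mistranslated phrases to make the model robust to them."""
    rng = rng or random.Random()
    dropped = [span for span in phrase_spans if rng.random() < drop_prob]
    dropped_idx = {i for start, end in dropped for i in range(start, end)}
    return [tok for i, tok in enumerate(tokens) if i not in dropped_idx]
```

During training, the per-token cross-entropy losses would be multiplied by these weights and normalized by their sum, and phrase drop would be applied to the target sequence before teacher forcing.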
Authors  DUAN Renchong; DUAN Xiangyu (School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, China)
Source  Computer Engineering (CAS; CSCD; Peking University Core), 2023, No. 10, pp. 120-126, 135 (8 pages)
Fund  Priority Academic Program Development of Jiangsu Higher Education Institutions.
Keywords  machine translation; knowledge transfer; adaptive training; phrase; drop mechanism