Abstract
Neural machine translation (NMT) is currently the mainstream approach in machine translation. Translation memory (TM) is a tool that helps professional translators avoid translating the same content repeatedly: it stores previously completed translation sentence pairs in a translation memory database, from which they can later be retrieved and reused. This paper proposes two data-augmentation-based methods for integrating translation memory into neural machine translation: (1) directly concatenating the translation memory after the source sentence; (2) concatenating the translation memory by means of tag embeddings. Experiments on Chinese-English and English-German datasets show that the proposed methods achieve significant improvements in translation performance.
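The abstract names the two input formats without giving their exact details. The Python sketch below illustrates one plausible reading and is not the authors' implementation. Assumptions: retrieval ranks TM entries by a token-level fuzzy-match score (1 minus Levenshtein distance normalized by the longer sentence), a common TM matching criterion that the abstract does not specify; the target side of the retrieved TM pair is what gets concatenated; the separator `<sep>`, the tag ids 0/1, and every function name are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical separator token; the abstract does not name one.
SEP = "<sep>"


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


def fuzzy_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer sentence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(src: List[str],
             memory: List[Tuple[List[str], List[str]]]) -> Tuple[List[str], List[str]]:
    """Assumed retrieval step: the TM pair whose source side best matches src."""
    return max(memory, key=lambda pair: fuzzy_score(src, pair[0]))


def concat_direct(src: List[str], tm_tgt: List[str]) -> List[str]:
    """Method (1) sketch: append the TM target after the source sentence."""
    return src + [SEP] + tm_tgt


def concat_with_tags(src: List[str],
                     tm_tgt: List[str]) -> Tuple[List[str], List[int]]:
    """Method (2) sketch: concatenate tokens and emit one tag id per token
    (0 = source, 1 = translation memory). An NMT encoder would look up a
    learned tag embedding for each id and combine it with the word embedding."""
    tokens = src + tm_tgt
    tags = [0] * len(src) + [1] * len(tm_tgt)
    return tokens, tags


if __name__ == "__main__":
    memory = [
        ("the cat sat on the mat".split(), "die Katze saß auf der Matte".split()),
        ("he reads a book".split(), "er liest ein Buch".split()),
    ]
    src = "the cat sat on a mat".split()
    _, tm_tgt = retrieve(src, memory)
    print(concat_direct(src, tm_tgt))
    print(concat_with_tags(src, tm_tgt))
```

Under this reading, method (1) relies on the separator alone to mark where the memory begins, whereas method (2) gives the model an explicit per-token signal of which segment each token came from.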
Authors
CAO Qian (曹骞)
XIONG Deyi (熊德意)
CAO Qian; XIONG Deyi (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 5, pp. 36-43 (8 pages)
Funding
National Key Research and Development Program of China (2019QY1802)
Keywords
neural machine translation
translation memory
data augmentation