Abstract
Although source-side and target-side monolingual data have been proven useful for enhancing neural machine translation (NMT) performance through forward translation and back translation, how to leverage both effectively at the same time has not been well studied. In this study, we investigate how to better utilize monolingual data on both sides when training an NMT model, and propose a strategy that combines forward translation using beam search with back translation using N-best sampling. Comparative experiments are conducted on the Chinese-English and English-Chinese news translation tasks of the 17th China Conference on Machine Translation (CCMT 2021). Experimental results show that the proposed strategy outperforms other commonly used monolingual data augmentation methods. Finally, we find that performing domain knowledge transfer before applying the combined strategy further improves translation quality.
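The combined augmentation strategy described above can be sketched as follows. This is a minimal illustration only: the `beam_search_translate` and `n_best_sample_translate` functions are hypothetical stand-ins for real NMT decoding (in practice these would be beam-search and sampled decoding with a trained Transformer), and the toy "models" are simple dictionaries mapping a sentence to a ranked list of hypotheses.

```python
import random

def beam_search_translate(model, sentence):
    """Deterministic decoding: return the single best hypothesis.

    `model` is a hypothetical stand-in: a dict mapping a sentence to a
    ranked list of candidate translations.
    """
    return model[sentence][0]

def n_best_sample_translate(model, sentence, n=3, rng=random):
    """Stochastic decoding: sample one hypothesis from the top-n candidates."""
    candidates = model[sentence][:n]
    return rng.choice(candidates)

def augment(fwd_model, bwd_model, src_mono, tgt_mono, n=3, seed=0):
    """Build synthetic parallel data from monolingual data on both sides.

    - Forward translation: source monolingual text is decoded with beam
      search to produce synthetic target sentences.
    - Back translation: target monolingual text is decoded with N-best
      sampling to produce (noisier, more diverse) synthetic source sentences.
    """
    rng = random.Random(seed)
    synthetic = []
    for s in src_mono:
        synthetic.append((s, beam_search_translate(fwd_model, s)))
    for t in tgt_mono:
        synthetic.append((n_best_sample_translate(bwd_model, t, n, rng), t))
    return synthetic
```

The design choice this illustrates is the asymmetry of the combination: forward translation keeps the single beam-search hypothesis (clean synthetic targets), while back translation samples from the N-best list (diverse synthetic sources), and the resulting synthetic pairs are mixed with genuine parallel data for training.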
Authors
WU Zhanglin; WEI Daimeng; LI Zongyao; YU Zhengzhe; SHANG Hengchao; CHEN Xiaoyu; GUO Jiaxin; WANG Minghan; LEI Lizhi; TAO Shimin; YANG Hao; QIN Ying (Huawei Text Machine Translation Lab, Beijing 100038, China)
Source
Journal of Xiamen University (Natural Science)
Indexed in: CAS, CSCD, PKU Core Journals
2022, No. 4, pp. 675-681 (7 pages)
Keywords
neural machine translation; monolingual data augmentation; beam search; N-best sampling; forward translation; back translation