Abstract
Insufficient semantic capture, caused by the scarcity of low-resource corpora, has become a major factor limiting machine translation quality. To address this, the paper builds on data preprocessing and improves the Transformer model with a CNN and a gating mechanism. It uses adversarial training to guide the optimization of model parameters, and it incorporates named entity recognition to improve the model's translation of named entities. In addition, multi-model fusion refines, recombines, and merges the outputs of multiple machine translation systems into a single improved translation. Three sets of comparative experiments on Mongolian-Chinese translation show that the proposed method outperforms the baseline methods.
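The abstract does not say how the CNN and the gating mechanism are wired into the Transformer. The following is only an illustrative sketch of one common pattern, not the authors' architecture. It assumes that a convolution over the encoder states supplies local (n-gram-like) features, and that an element-wise sigmoid gate then decides how much of the original state versus the convolutional feature to keep. The function names (`conv1d_same`, `gated_fusion`), the fixed smoothing kernel, and the scalar gate weights `w_h`, `w_c`, `b` are all hypothetical stand-ins for learned parameters.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function used as the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(seq: List[Vector], kernel: List[float]) -> List[Vector]:
    """1D convolution over time with zero padding, so output length == input length.

    The same scalar kernel is applied to every feature dimension (a toy
    simplification; a real CNN layer learns a weight matrix per offset).
    Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out: List[Vector] = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < len(seq):  # positions outside the sentence count as zeros
                for d in range(dim):
                    acc[d] += w * seq[src][d]
        out.append(acc)
    return out


def gated_fusion(h: List[Vector], c: List[Vector],
                 w_h: float, w_c: float, b: float) -> List[Vector]:
    """Mix Transformer states h with CNN features c through an element-wise gate.

    g = sigmoid(w_h * h + w_c * c + b);  out = g * h + (1 - g) * c
    The scalar weights are hypothetical stand-ins for learned gate parameters.
    """
    fused: List[Vector] = []
    for h_t, c_t in zip(h, c):
        row = []
        for h_d, c_d in zip(h_t, c_t):
            g = sigmoid(w_h * h_d + w_c * c_d + b)
            row.append(g * h_d + (1.0 - g) * c_d)
        fused.append(row)
    return fused


if __name__ == "__main__":
    # Toy "encoder states" for a 3-token sentence with 2 features per token.
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    c = conv1d_same(h, [0.25, 0.5, 0.25])  # smooths each token with its neighbours
    print(c)
    print(gated_fusion(h, c, w_h=1.0, w_c=-1.0, b=0.0))
```

Because the gate output lies in (0, 1), each fused value is a convex combination of the original state and the convolutional feature. A large positive bias keeps the Transformer state almost unchanged, and a large negative bias passes the local CNN feature through instead. In a trained model these weights would be learned matrices over the concatenated features rather than fixed scalars.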
Authors
武子玉
侯宏旭
白天罡
吉亚图
乌尼尔
郭紫月
王雪姣
孙硕
WU Ziyu; HOU Hongxu; BAI Tiangang; JI Yatu; WU Nier; GUO Ziyue; WANG Xuejiao; SUN Shuo (College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China; Inner Mongolia A.R. Key Laboratory of Mongolian Information Processing Technology, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China)
Source
Journal of Jiangxi Normal University (Natural Science Edition)
Indexed: CAS; Peking University Core Journal
2020, No. 2, pp. 153-159 (7 pages)
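Funding
Natural Science Foundation of Inner Mongolia (2018MS06005)
Inner Mongolia Autonomous Region Science and Technology Achievement Transformation Project "Construction and Promotion of a Cloud Platform for Mongolian Machine Translation and Assisted Translation" (2019CG028)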
Keywords
Mongolian-Chinese machine translation
data sparsity
system fusion
named entity