Abstract
To address the poor quality of Mongolian-Chinese translation caused by the scarcity of Mongolian-Chinese parallel corpus resources and the limited coverage of existing parallel corpora, this paper models machine translation by means of cross-lingual multi-task learning. In the data preprocessing stage, two new unsupervised pre-training methods and one supervised pre-training method are introduced for cross-lingual modeling to learn cross-lingual representations, and the effects of the three pre-training methods on Mongolian-Chinese translation are studied. Experimental results show that the three cross-lingual pre-training models significantly reduce the perplexity of the low-resource language and improve the quality of Mongolian-Chinese translation.
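The abstract names two unsupervised pre-training objectives and one supervised one for cross-lingual modeling but does not detail them here. That combination matches the XLM-style setup (causal and masked language modeling as the unsupervised objectives, translation language modeling as the supervised one), so the following is a minimal PyTorch sketch of a masked/translation LM pretraining step under that assumption. The toy model, vocabulary sizes, and all identifiers (TinyEncoderLM, mask_tokens, MASK_ID, etc.) are illustrative, not the paper's implementation.

```python
# Toy sketch of XLM-style masked / translation language-model pretraining.
# Every hyperparameter and the tiny model below are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed toy vocabulary size
MASK_ID = 1         # assumed [MASK] token id
PAD_ID = 0          # assumed padding token id

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masking: pick ~15% of positions as prediction targets."""
    labels = tokens.clone()
    probs = torch.full(tokens.shape, mask_prob)
    masked = torch.bernoulli(probs).bool() & (tokens != PAD_ID)
    labels[~masked] = -100            # ignore unmasked positions in the loss
    tokens = tokens.clone()
    tokens[masked] = MASK_ID          # (real XLM also keeps/randomizes some)
    return tokens, labels

class TinyEncoderLM(nn.Module):
    """A minimal Transformer encoder with a token-prediction head."""
    def __init__(self, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

# Unsupervised MLM uses a monolingual batch.  Supervised TLM concatenates
# the two sides of a parallel (Mongolian, Chinese) pair, so predicting a
# masked token can attend to its translation in the other language.
mn = torch.randint(2, VOCAB_SIZE, (8, 16))     # stand-in Mongolian token ids
zh = torch.randint(2, VOCAB_SIZE, (8, 16))     # stand-in Chinese token ids
tlm_batch = torch.cat([mn, zh], dim=1)         # TLM input: [mn ; zh]

model = TinyEncoderLM()
inputs, labels = mask_tokens(tlm_batch)
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100)
loss.backward()                                # one pretraining step
print(f"toy TLM loss: {loss.item():.3f}, perplexity: {loss.exp().item():.1f}")
```

The printed exp(loss) is the token-level perplexity the abstract refers to; in the paper's setting it would be measured on held-out Mongolian and Chinese text rather than on random toy ids.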
Authors
Zhang Zhen (张振), Su Yila (苏依拉), Ren Qingdaoerji (仁庆道尔吉), Gao Fen (高芬), Wang Yufei (王宇飞) (School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2021, No. 1, pp. 157-160 and 178 (5 pages in total)
Funding
National Natural Science Foundation of China (61363052)
Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605)
Fund Project of the Ethnic Affairs Commission of Inner Mongolia Autonomous Region (MW-2017-MGYWXXH-03)
Keywords
Mongolian-Chinese machine translation
Unsupervised pre-training
Supervised pre-training
Cross-lingual modeling
Multi-task learning