
Research on Tibetan-Chinese Bidirectional Neural Machine Translation Based on mRASP
Abstract: Research on Tibetan-Chinese machine translation is of great practical significance for promoting and passing on outstanding ethnic culture and for advancing economic, educational, and cultural development in Tibetan areas. This paper addresses the poor performance of Tibetan-Chinese neural machine translation caused by the scarcity of Tibetan-Chinese parallel corpora, and studies cross-lingual pre-training models. A Tibetan-Chinese bilingual cross-lingual pre-training model (mRASP) is built on the Tibetan-Chinese dataset from the 18th National Conference on Machine Translation (CCMT 2022), with Google's Transformer neural machine translation architecture as the baseline model. The work mainly uses data augmentation to expand the Tibetan-Chinese parallel corpus, optimizes the vocabulary used for Tibetan-Chinese machine translation, and explores how the joint vocabulary of the cross-lingual pre-training model affects translation performance. On this basis, it proposes a Tibetan-Chinese bidirectional neural machine translation method that combines the cross-lingual pre-training model (mRASP) with an improved green joint vocabulary. With these strategies, the BLEU score reaches 55.69 on the Tibetan-to-Chinese translation task and 29.57 on the Chinese-to-Tibetan translation task. Compared with conventional Tibetan-Chinese bidirectional neural machine translation based on pre-trained models, the method effectively improves bidirectional translation performance under low-resource conditions.
Authors: YANG Dan (杨丹); YONG Cuo (拥措); RENQING Zhuo-ma (仁青卓玛); TANG Chao-chao (唐超超) (School of Information Science and Technology, Tibet University, Lhasa 850000, China; State Key Laboratory of Artificial Intelligence for Tibetan Information Technology in Tibet Autonomous Region, Lhasa 850000, China; Ministry of Education Engineering Research Center for Tibetan Information Technology, Lhasa 850000, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 12, pp. 200-206 (7 pages)
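Funding: National Key Research and Development Program of China (2017YFB1402202); Independent Research Project of the Science and Technology Innovation Base of Tibet Autonomous Region (XZ2021HR002G); Tibet University Everest Discipline Construction Program (zf22002001).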
Keywords: cross-lingual pre-training model; Tibetan-Chinese bidirectional neural machine translation; mRASP; data augmentation; vocabulary
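
Note on the joint vocabulary: the abstract refers to a joint vocabulary shared by Tibetan and Chinese but does not describe how it is built. The sketch below illustrates only the general idea of a shared token-to-id map over both sides of a parallel corpus. It is not the authors' procedure: the function names `build_joint_vocab` and `encode`, the special tokens, the `min_freq` cutoff, and the assumption that sentences are already segmented into space-separated subword tokens are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; real toolkits define their own set.
SPECIALS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_joint_vocab(src_sentences, tgt_sentences, min_freq=1):
    """Build one token-to-id map shared by both languages.

    Sentences are assumed to be pre-segmented into space-separated tokens.
    """
    counts = Counter()
    for sentence in list(src_sentences) + list(tgt_sentences):
        counts.update(sentence.split())

    # Keep tokens that are frequent enough; order by descending frequency,
    # then alphabetically, so the resulting ids are deterministic.
    kept = sorted(
        (tok for tok, c in counts.items() if c >= min_freq),
        key=lambda tok: (-counts[tok], tok),
    )

    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok in kept:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Map a segmented sentence to ids, using <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]


if __name__ == "__main__":
    # Toy placeholder tokens standing in for Tibetan and Chinese subwords.
    tibetan_side = ["a b", "a c"]
    chinese_side = ["x a", "y"]
    vocab = build_joint_vocab(tibetan_side, chinese_side)
    print(len(vocab), encode("a z", vocab))
```

Because both directions read from the same map, a token that occurs in both languages receives a single id, which is the property a joint vocabulary is meant to provide.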