Chinese-Mongolian Neural Machine Translation Method Based on Character-Level Language Modeling (cited by: 1)
Abstract: With advances in machine learning, the efficiency and accuracy of text translation models have steadily improved, but good translation quality still depends on large amounts of high-quality parallel corpora. Since the outbreak of the pandemic, China has focused on expanding domestic demand and building a strong domestic market; ties among ethnic groups have grown closer than before, and translation between languages has become especially important. Mongolian is a relatively widely used minority language, yet its word forms vary greatly in meaning and it lacks enough parallel corpora to support training, so existing translation models perform poorly on it. To address these problems, this paper makes three contributions: (1) it proposes character-level sentence segmentation, which alleviates the out-of-vocabulary problem caused by scarce parallel corpora and reduces computational cost; (2) it uses denoising autoencoding to force the model to learn more robust representations of the input features, improving generalization; (3) it uses an unsupervised dual iterative translation model that trains Chinese-to-Mongolian and Mongolian-to-Chinese translation together as a dual pair through unsupervised iterative training, achieving both language modeling and bidirectional translation. A comparison of BLEU scores on the same dataset shows that this model performs better and translates more accurately than a conventional Transformer model.
Authors: HU Zelin (胡泽林); GAO Yi (高翊); LI Miao (李淼); CAO Yichao (曹宜超) (School of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi 341000, China; Yunnan Minority Language Working Committee Office, Kunming 650499, China; Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China)
Source: Journal of Kunming University of Science and Technology (Natural Science) (《昆明理工大学学报(自然科学版)》), PKU Core Journal, 2023, No. 3, pp. 85-92 (8 pages)
Funding: National Key Research and Development Program of China (2017YFD0701600); Gannan Normal University Doctoral Research Start-up Fund (13SJJ202130).
Keywords: Chinese-Mongolian neural machine translation; minority-language translation; translation model; denoising autoencoding; character-level language
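
The abstract names character-level segmentation and denoising autoencoding but gives no implementation details. The following is a minimal sketch of how those two preprocessing steps are commonly set up, not the authors' code. The function names `char_segment` and `add_noise`, the `▁` space marker, and the noise recipe (random token dropping plus a bounded local shuffle, a common choice in unsupervised NMT) are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def char_segment(sentence: str, space_token: str = "▁") -> List[str]:
    """Split a sentence into single-character tokens.

    Spaces are replaced by a visible marker (an assumed convention) so the
    original spacing can be restored after decoding. Note that this splits
    on Unicode code points, not on grapheme clusters.
    """
    return [space_token if ch == " " else ch for ch in sentence.strip()]


def add_noise(
    tokens: List[str],
    drop_prob: float = 0.1,
    max_shuffle_distance: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Corrupt a token sequence for denoising-autoencoder training.

    Hypothetical noise model: each token is dropped with probability
    `drop_prob`, then the survivors are shuffled locally so that no token
    moves more than `max_shuffle_distance` positions.
    """
    rng = rng or random.Random()
    # Step 1: random token dropping, keeping at least one token.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    # Step 2: bounded local shuffle. Each position i gets the sort key
    # i + u with u in [0, max_shuffle_distance + 1); sorting by that key
    # displaces a token by at most max_shuffle_distance positions.
    keys = [i + rng.random() * (max_shuffle_distance + 1) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda pair: pair[0])]


if __name__ == "__main__":
    clean = char_segment("我爱 你")
    noisy = add_noise(clean, rng=random.Random(0))
    # A denoising autoencoder would be trained to map `noisy` back to `clean`.
    print(clean, noisy)
```

In a denoising setup of this kind, the model receives the corrupted sequence as input and is trained to reconstruct the clean one, which is one way to obtain the more robust input representations the abstract describes.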