
自学习和回译的双向增强藏汉机器翻译方法

A Bidirectional Enhanced Tibetan-Chinese Machine Translation Method Based on Self-Learning and Back-Translation
Abstract  Although neural machine translation has achieved great academic and industrial success, challenges persist under low-resource conditions. In recent years, working mechanisms that exploit source- or target-side monolingual text data within the neural machine translation framework have attracted widespread attention. Among the array of supervised and unsupervised proposals, back-translation is regarded as one of the most promising methods for improving low-resource neural machine translation performance. Although the method is simple and inexpensive to apply, its effectiveness largely depends on the performance of the initial back-translation model trained on the existing bilingual parallel data. Because Tibetan-Chinese parallel data remain relatively limited in scale, this paper employs target-side (Chinese) monolingual data to improve both the backward and the forward models through a step-wise alternation of self-learning and back-translation (i.e., bidirectional boost). Using 20M sentences of Chinese monolingual data on a Tibetan-Chinese neural machine translation task, the forward and backward models trained with the proposed bidirectional boost method outperform vanilla Transformer models trained only on genuine parallel data by 3.1 and 8.2 BLEU points, respectively, on the test set, confirming the effectiveness of the method. In addition, the method is language-agnostic and can be applied to translation tasks in other low-resource settings.
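The step-wise self-learning plus back-translation loop described in the abstract can be sketched in toy form. Everything here is illustrative, not the authors' implementation: `train` is a hypothetical lookup-table stand-in for a real NMT training run (e.g., a Transformer), and the exact scheduling of the two augmentation steps is an assumption based on the abstract's description.

```python
def train(pairs):
    """Hypothetical stand-in for an NMT training run: a lookup-table 'model'
    that memorizes training pairs and tags unseen inputs as hypotheses."""
    table = dict(pairs)
    return lambda s: table.get(s, f"<hyp:{s}>")

def bidirectional_boost(parallel, zh_mono):
    """parallel: list of (Tibetan, Chinese) sentence pairs;
    zh_mono: target-side (Chinese) monolingual sentences.
    Returns (forward bo->zh model, backward zh->bo model)."""
    inv = [(zh, bo) for bo, zh in parallel]
    bwd = train(inv)  # initial backward (zh -> bo) model from parallel data only
    # Self-learning: the backward model labels the monolingual Chinese input,
    # and the resulting pseudo-pairs are folded back into its training data.
    bwd = train(inv + [(zh, bwd(zh)) for zh in zh_mono])
    # Back-translation: the improved backward model synthesizes Tibetan sources
    # for the Chinese monolingual data, augmenting the forward model's training set.
    fwd = train(parallel + [(bwd(zh), zh) for zh in zh_mono])
    return fwd, bwd
```

In a real setup each `train` call would be a full training run of an NMT toolkit, and the synthetic pairs would typically be mixed with genuine parallel data at a controlled ratio.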
Authors  桑杰端珠 (SANGJIE Duanzhu), 才让加 (CAIRANG Jia) (State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining, Qinghai 810000, China; School of Computer Science, Qinghai Normal University, Xining, Qinghai 810000, China; Tibetan Information Processing Engineering Technology Research Center of Qinghai Province, Qinghai Normal University, Xining, Qinghai 810000, China)
Source  Computer Simulation (《计算机仿真》), 2024, No. 8, pp. 544-548 (5 pages)
Funding  Qinghai Province Key R&D and Transformation Program (2022-GX-104); Qinghai Province Central Government Guided Local Science and Technology Development Fund (2022ZY006)
Keywords  machine translation; neural networks; Tibetan-Chinese; back-translation; self-learning