
自学习和回译的双向增强藏汉机器翻译方法

A Bidirectional Enhanced Tibetan-Chinese Machine Translation Method Based on Self-Learning and Back-Translation
Abstract  Although neural machine translation has achieved great academic and industrial success, challenges persist under low-resource conditions. In recent years, working mechanisms that exploit source- or target-side monolingual text data within the neural machine translation framework have attracted widespread attention. Among the array of supervised and unsupervised proposals, back-translation is regarded as one of the most promising methods for improving low-resource neural machine translation performance. Although the method is simple and inexpensive to apply, its effectiveness largely depends on the performance of the initial back-translation model trained on the existing bilingual parallel data. Because Tibetan-Chinese parallel data remain relatively limited in scale, this paper employs target-side (Chinese) monolingual data to improve both the backward and the forward models through a step-wise alternation of self-learning and back-translation (i.e., bidirectional boost). Using 20M sentences of Chinese monolingual data on a Tibetan-Chinese neural machine translation task, the forward and backward models trained with the proposed bidirectional boost method outperform vanilla Transformer models trained only on genuine parallel data by 3.1 and 8.2 BLEU points, respectively, on the test set, confirming the effectiveness of the method. In addition, the method is language-agnostic and can be applied to translation tasks in other low-resource settings.
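The step-wise self-learning plus back-translation loop described in the abstract can be sketched in toy form. Everything here is illustrative, not the authors' implementation: `train` is a hypothetical lookup-table stand-in for a real NMT training run (e.g., a Transformer), and the exact scheduling of the two augmentation steps is an assumption based on the abstract's description.

```python
def train(pairs):
    """Hypothetical stand-in for an NMT training run: a lookup-table 'model'
    that memorizes training pairs and tags unseen inputs as hypotheses."""
    table = dict(pairs)
    return lambda s: table.get(s, f"<hyp:{s}>")

def bidirectional_boost(parallel, zh_mono):
    """parallel: list of (Tibetan, Chinese) sentence pairs;
    zh_mono: target-side (Chinese) monolingual sentences.
    Returns (forward bo->zh model, backward zh->bo model)."""
    inv = [(zh, bo) for bo, zh in parallel]
    bwd = train(inv)  # initial backward (zh -> bo) model from parallel data only
    # Self-learning: the backward model labels the monolingual Chinese input,
    # and the resulting pseudo-pairs are folded back into its training data.
    bwd = train(inv + [(zh, bwd(zh)) for zh in zh_mono])
    # Back-translation: the improved backward model synthesizes Tibetan sources
    # for the Chinese monolingual data, augmenting the forward model's training set.
    fwd = train(parallel + [(bwd(zh), zh) for zh in zh_mono])
    return fwd, bwd
```

In a real setup each `train` call would be a full training run of an NMT toolkit, and the synthetic pairs would typically be mixed with genuine parallel data at a controlled ratio.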
Authors  桑杰端珠 (SANGJIE Duanzhu), 才让加 (CAIRANG Jia) (State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining, Qinghai 810000, China; School of Computer Science, Qinghai Normal University, Xining, Qinghai 810000, China; Tibetan Information Processing Engineering Technology Research Center of Qinghai Province, Qinghai Normal University, Xining, Qinghai 810000, China)
Source  Computer Simulation (《计算机仿真》), 2024, No. 8, pp. 544-548 (5 pages)
Funding  Qinghai Province Key R&D and Transformation Program (2022-GX-104); Qinghai Province Central Government Guided Local Science and Technology Development Fund (2022ZY006)
Keywords  machine translation; neural networks; Tibetan-Chinese; back-translation; self-learning