Abstract
Tibetan-Chinese machine translation helps strengthen national unity, advances the development of Tibetan information technology, and breaks down the barriers between languages. Tibetan-Chinese neural machine translation has achieved significant improvements on many translation tasks, but it depends on large-scale parallel corpora, and such corpora have long been scarce for low-resource languages. By studying two data augmentation strategies, synonym replacement and back translation, this paper aims to offer a research approach for Tibetan-Chinese machine translation under low-resource conditions and thereby contribute to the social development of Tibetan areas. In testing, Tibetan-Chinese machine translation improved by an average of 4.59 BLEU points.
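The abstract names the two augmentation strategies but does not describe how they are applied. Below is a minimal Python sketch of how each is commonly used to enlarge a parallel corpus; it is not the authors' implementation. Assumptions: sentences are pre-tokenized lists; `synonyms` is a hypothetical synonym table (the paper's synonym source is not given here); synonym replacement is applied to the source side only while the target is kept fixed; and `translate_tgt_to_src` is a hypothetical stand-in for a trained Chinese-to-Tibetan model used for back translation of target-side monolingual text.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A parallel pair: (source tokens, target tokens).
Pair = Tuple[List[str], List[str]]


def synonym_replace(tokens: Sequence[str],
                    synonyms: Dict[str, List[str]],
                    n: int,
                    rng: random.Random) -> List[str]:
    """Replace up to n randomly chosen tokens that have entries in the
    (hypothetical) synonym table with one of their synonyms."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if synonyms.get(tok)]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def augment_with_synonyms(corpus: Sequence[Pair],
                          synonyms: Dict[str, List[str]],
                          n: int = 1,
                          seed: int = 0) -> List[Pair]:
    """Keep every original pair and append a source-side synonym-replaced
    copy (assumption: target side left unchanged) when it differs."""
    rng = random.Random(seed)
    new_pairs: List[Pair] = []
    for src, tgt in corpus:
        new_src = synonym_replace(src, synonyms, n, rng)
        if new_src != list(src):
            new_pairs.append((new_src, list(tgt)))
    return [(list(s), list(t)) for s, t in corpus] + new_pairs


def augment_with_back_translation(
        corpus: Sequence[Pair],
        mono_target: Sequence[Sequence[str]],
        translate_tgt_to_src: Callable[[Sequence[str]], List[str]]) -> List[Pair]:
    """Back translation: translate target-language monolingual sentences
    into the source language with a reverse model, then pair each synthetic
    source with its genuine target sentence."""
    synthetic = [(translate_tgt_to_src(tgt), list(tgt)) for tgt in mono_target]
    return [(list(s), list(t)) for s, t in corpus] + synthetic
```

Both functions keep the original pairs and append synthetic ones rather than replacing them, so the genuine data is never lost; the augmented corpus would then be used to retrain the Tibetan-to-Chinese model.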
Authors
YANG Dan (杨丹)
SUN Yidong (孙义栋)
YONG Cuo (拥措)
Affiliations: School of Information Science and Technology, Tibet University, Lhasa 850000; State Key Laboratory of Artificial Intelligence for Tibetan Information Technology in Tibet Autonomous Region, Lhasa 850000; Ministry of Education Engineering Research Center for Tibetan Information Technology, Lhasa 850000
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 11, pp. 2473-2477 (5 pages)
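Funding
National Key Research and Development Program of China project "Integration and Application Demonstration of Digitization Technology for Tibetan Literature Resources" (No. 2017YFB1402200)
Independent Research Project of the Science and Technology Innovation Base of Tibet Autonomous Region (No. XZ2021JR002G)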
Keywords
Tibetan-Chinese neural machine translation
data augmentation
synonym replacement
back translation