期刊文献+

基于双语词典的远距离语对无监督神经机器翻译方法

Bilingual dictionary based unsupervised neural machine translation method for distant language pairs
下载PDF
导出
摘要 为了缓解大型平行语料库稀缺性对机器翻译质量的影响,无监督方法在神经机器翻译领域备受关注,但其在远距离语言对上的翻译表现仍有待提高。因此,文中引入了翻译语言模型(TLM)并提出了Dict-TLM方法。该方法的核心思想是结合单语语料和无监督双语词典训练语言模型。具体而言,模型首先接受源语言句子作为输入,然后,不同于传统TLM只接受平行语料,Dict-TLM模型还接受源语言句子通过无监督双语词典处理后的数据作为输入,在这种输入中,模型将源语言句子中在双语词典中出现的单词替换为相应的目标语言翻译词,重要的是,该方法中的双语词典是无监督获得的。实验表明,Dict-TLM相对于传统无监督机器翻译在中英语言对上提高了3个BLEU分数。 Unsupervised methods,which strives to alleviate the impact of the scarcity of large parallel corpora on the quality of machine translation,have attracted much attention in the field of neural machine translation.However,their translation performances in distant language pairs still need to be improved.Therefore,the translation language model(TLM)is introduced and the Dict-TLM method is proposed.The core idea of this method is to train language models by combining monolingual corpora and unsupervised bilingual dictionaries.Specifically,the model accepts source language sentences and takes them as the input first,and then,unlike the traditional TLM that only accepts parallel corpora,the Dict-TLM model even accepts data from source language sentences processed by unsupervised bilingual dictionaries and takes them as the input.In this input,the proposed model replaces the words that appear in the bilingual dictionary in the source language sentence with the corresponding target language translation words.Importantly,the bilingual dictionary is obtained in an unsupervised manner.The experiment shows that the Dict-TLM improves the BLEU score by 3%in comparison with the traditional unsupervised machine translation in Chinese English language pairs.
作者 黄孟钦 HUANG Mengqin(Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500,China)
出处 《现代电子技术》 北大核心 2024年第7期161-164,共4页 Modern Electronics Technique
关键词 无监督神经机器翻译 远距离语言对 预训练 TLM 双语词典 双语词嵌入 unsupervised neural machine translation distant language pairs pre-training TLM bilingual dictionary bilingual word embedding
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部