
Mongolian-Chinese machine translation based on LSTM (基于LSTM的蒙汉机器翻译的研究)

Cited by: 7
Abstract: Mongolian-Chinese machine translation in the Inner Mongolia region lags behind, and the available parallel bilingual corpus is small. Under these conditions, traditional statistical machine translation suffers from data sparsity and overfitting during training, which results in low translation quality. To address this, we propose a Mongolian-Chinese neural machine translation method based on LSTM: a long short-term memory (LSTM) model is used to build an end-to-end neural network framework that models the Mongolian-Chinese translation system. To capture Mongolian semantic information more effectively, Mongolian words are segmented into morphemes according to the characteristics of the language before being fed into the model. A local attention mechanism is also introduced to compute the weights of the source morphemes associated with each target word, yielding alignment probabilities between Mongolian and Chinese vocabulary items and thereby improving translation quality. Experimental results show that the proposed method achieves better translation quality than the traditional Mongolian-Chinese translation system.
Authors: LIU Wan-wan (刘婉婉); SU Yi-la (苏依拉); WU Ni-er (乌尼尔); RENQING Dao-er-ji (仁庆道尔吉) (College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 10, pp. 1890-1896 (7 pages)
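Funding: National Natural Science Foundation of China (61363052, 61502255); Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605); Inner Mongolia Ethnic Affairs Commission Fund (MW-2017-MGYWXXH-03)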
Keywords: attention; end-to-end model; machine translation; Mongolian-Chinese; LSTM neural network
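
The abstract describes a local attention mechanism that weights the source morphemes associated with each target word. The following is a minimal NumPy sketch of one common form of local attention (a predicted center position, a fixed window, and a Gaussian bias around the center); it is not taken from the paper. The function name `local_attention`, the parameters `W_a`, `W_p`, `v_p`, the window half-width `D`, and the final renormalization step are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def local_attention(h_t, src_states, W_a, W_p, v_p, D=2):
    """Illustrative local attention over source morpheme states.

    h_t        : (d,)   current decoder (target-side) hidden state
    src_states : (S, d) encoder states, one per source morpheme
    W_a, W_p   : (d, d) hypothetical learned matrices
    v_p        : (d,)   hypothetical learned vector
    D          : half-width of the attention window (assumed value)
    """
    S = src_states.shape[0]
    # Predict a real-valued center position p_t in [0, S - 1].
    p_t = (S - 1) * sigmoid(v_p @ np.tanh(W_p @ h_t))
    center = int(round(p_t))
    lo, hi = max(0, center - D), min(S, center + D + 1)

    # Score only the morphemes inside the window ("general" bilinear score).
    scores = src_states[lo:hi] @ (W_a @ h_t)
    align = softmax(scores)

    # Favor positions close to p_t with a Gaussian, sigma = D / 2.
    positions = np.arange(lo, hi)
    sigma = D / 2.0
    align = align * np.exp(-((positions - p_t) ** 2) / (2.0 * sigma ** 2))
    # Assumption of this sketch: renormalize so the weights sum to 1.
    align = align / align.sum()

    weights = np.zeros(S)
    weights[lo:hi] = align          # zero weight outside the window
    context = weights @ src_states  # weighted sum of source states
    return weights, context, p_t

# Usage with random placeholder values (no trained model is implied).
rng = np.random.default_rng(0)
d, S = 4, 7
weights, context, p_t = local_attention(
    rng.normal(size=d), rng.normal(size=(S, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d),
)
```

In this sketch, `weights` plays the role of the alignment distribution between one target word and the source morphemes, and `context` is the vector that a decoder would combine with its hidden state when predicting the next target word.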