
Research on the Influence of Uyghur Complex Morphology on Chinese-Uyghur Machine Translation (维吾尔语复杂形态对汉维机器翻译的影响研究) Cited by: 2
Abstract: Uyghur has a comparatively complex morphology in which form-building ("configuration") affixes play an important role, and its grammar differs considerably from that of Chinese. Focusing on these morphological characteristics, this paper analyzes the role of Uyghur affixes in Chinese-to-Uyghur statistical machine translation. A phrase-based Chinese-Uyghur statistical machine translation system is built, and comparative experiments are run on Chinese-Uyghur parallel corpora in which the Uyghur side is represented at five granularities: word level, stem level, maximum-stem level, stem-affix level, and stem-suffix level. The study examines how the granularity of the Uyghur text affects word alignment quality and language model quality in Chinese-Uyghur machine translation. Experimental results show that, among these five granularities, the stem-based and the stem-suffix-based Uyghur target-side corpora give markedly better translation quality.
Authors: MUNIRE·Muhetare (穆妮热·穆合塔尔); LI Xiao (李晓); YANG Yating (杨雅婷). Affiliations: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
Source: Computer Engineering (《计算机工程》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2020, No. 2, pp. 309-314 (6 pages)
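Funding: National Natural Science Foundation of China (U1703133); "West Light" Talent Training and Introduction Program of the Chinese Academy of Sciences (2017-XBQNXZ-A-005); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472); Major Science and Technology Project of Xinjiang Uygur Autonomous Region (2016A03007-3); High-level Talent Introduction Project of Xinjiang Uygur Autonomous Region (Y839031201)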
Keywords: Uyghur morphology; configuration affix; affix granularity; statistical machine translation; translation quality
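
The five corpus granularities named in the abstract can be illustrated with a small sketch. The abstract does not define the granularities precisely, so the definitions below are assumptions made for illustration only: "stem" is taken to be the bare root, "maximum stem" the root plus derivational affixes, "stem-affix" the root followed by each affix as a separate token, and "stem-suffix" the root followed by all affixes merged into one ending token. The `Analysis` class, the `render` function, the `+` affix marker, and the hand-segmented Latin-script word `ishchilarning` ("of the workers") are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Analysis:
    """Hand-supplied morphological analysis of one word (hypothetical format)."""
    root: str
    derivational: Tuple[str, ...] = ()
    inflectional: Tuple[str, ...] = ()


def render(a: Analysis, granularity: str) -> List[str]:
    """Return the tokens that represent one word at the given granularity.

    The granularity definitions are illustrative assumptions, not the
    paper's definitions.
    """
    affixes = a.derivational + a.inflectional
    if granularity == "word":
        # Full surface form as a single token.
        return [a.root + "".join(affixes)]
    if granularity == "stem":
        # Root only; every affix is discarded.
        return [a.root]
    if granularity == "max_stem":
        # Root plus derivational affixes; inflection is discarded.
        return [a.root + "".join(a.derivational)]
    if granularity == "stem_affix":
        # Root, then each affix as its own marked token.
        return [a.root] + ["+" + x for x in affixes]
    if granularity == "stem_suffix":
        # Root, then all affixes merged into one marked ending token.
        tail = "".join(affixes)
        return [a.root] + (["+" + tail] if tail else [])
    raise ValueError(f"unknown granularity: {granularity}")


if __name__ == "__main__":
    word = Analysis(root="ish", derivational=("chi",), inflectional=("lar", "ning"))
    for g in ("word", "stem", "max_stem", "stem_affix", "stem_suffix"):
        print(g, render(word, g))
```

Under these assumptions, coarser units such as the bare stem shrink the Uyghur vocabulary, while the split representations keep affix information available as separate tokens; the abstract reports only which granularities gave better translation quality, not the mechanism.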
