Abstract
Uyghur has a comparatively complex morphology in which configuration (inflectional) affixes play an important role, and its grammar differs considerably from that of Chinese. Based on these morphological characteristics, this paper analyzes the role of Uyghur affixes in Chinese-to-Uyghur statistical machine translation. A phrase-based Chinese-Uyghur statistical machine translation system is built, and comparative experiments are conducted on Chinese-Uyghur parallel corpora in which the Uyghur side is segmented at five levels of granularity: word, stem, maximum stem, stem-affix, and stem-suffix. The experiments examine how the granularity of the Uyghur text affects word alignment quality and language model quality in Chinese-Uyghur machine translation. The results show that, among these five granularities, the stem-based and stem-suffix-based Uyghur target-side corpora yield markedly improved translation quality.
Authors
穆妮热·穆合塔尔
李晓
杨雅婷
MUNIRE·Muhetare; LI Xiao; YANG Yating (Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China)
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, No. 2, pp. 309-314 (6 pages)
Computer Engineering
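Funding
National Natural Science Foundation of China (U1703133)
"West Light" Talent Training and Introduction Program of the Chinese Academy of Sciences (2017-XBQNXZ-A-005)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472)
Major Science and Technology Project of Xinjiang Uygur Autonomous Region (2016A03007-3)
High-Level Talent Introduction Project of Xinjiang Uygur Autonomous Region (Y839031201)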
Keywords
Uyghur morphology
configuration affix
affix granularity
statistical machine translation
translation quality