Abstract
As a typical agglutinative language, Uyghur has a rich and complex morphological structure that often adversely affects Uyghur-Chinese word alignment. Discarding all suffixes and keeping only the stems alleviates data sparsity, but it also throws away much of the meaningful information that the suffixes carry. One remedy is to normalize suffix variants to a unified form and retain them, but this in turn makes sentences too long. To address these problems, we analyze the characteristics of the grammatical categories of the two languages and propose a split-and-drop scheme that retains suffixes selectively, and we apply the scheme to Uyghur nouns. Experimental results show that the proposed scheme is feasible and that it improves both word alignment accuracy and machine translation quality.
Source
《新疆大学学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2015, Issue 4, pp. 469-474 (6 pages)
Journal of Xinjiang University (Natural Science Edition)
Funding
National Natural Science Foundation of China (61262061)
Science and Technology Program of Xinjiang Uyghur Autonomous Region (201423120)
Keywords
word alignment
machine translation
Uyghur nouns
Uyghur language