Abstract
Part-of-speech (POS) tagging plays an important role in research on Uyghur-Chinese machine translation and is a foundational task in natural language processing. This paper introduces a POS tagging algorithm based on the hidden Markov model (HMM) and the POS tagger Citar, and adapts Citar so that it can be applied to Uyghur. To make Uyghur tagging possible, a POS tag set suited to Uyghur is defined on the basis of the Brown tag set, and a tagging experiment is carried out on a portion of Uyghur text using the HMM-based method. The experimental results show that Citar tags Uyghur accurately, which indicates that the tagger is suitable for Uyghur.
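To make the HMM-based approach concrete, the following is a minimal sketch of Viterbi decoding for a bigram HMM tagger. It is not Citar's implementation and does not reproduce the paper's tag set or data: the function name `viterbi`, the two-tag set, the toy English vocabulary, and all probabilities are hypothetical values chosen only for illustration. Unseen events are given a small floor probability, which is a simplifying assumption in place of proper smoothing.

```python
import math


def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for tokens under a bigram HMM.

    start_p[t]      : P(first tag = t)
    trans_p[p][t]   : P(tag t | previous tag p)
    emit_p[t][word] : P(word | tag t)
    Missing entries fall back to `floor` (an illustrative assumption).
    """
    def lp(p):
        # Work in log space to avoid underflow on long sentences.
        return math.log(max(p, floor))

    if not tokens:
        return []

    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(tokens[0], 0.0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(tokens[i], 0.0))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model (not taken from the paper).
TAGS = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {
    "N": {"dogs": 0.6, "cats": 0.3, "bark": 0.1},
    "V": {"bark": 0.7, "chase": 0.3},
}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))
```

In a real tagger the three probability tables would be estimated from a hand-tagged corpus, and a trigram model with smoothing and an unknown-word model would typically replace the bigram tables and the fixed floor used here.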
Authors
李萍
杨勇
赛买提.艾力
任鸽
LI Ping, YANG Yong, SAI Mai Ti. Ai Li, REN Ge (College of Computer Science and Technology, Xinjiang Normal University, Urumqi 83005)
Source
《现代计算机》
2017, No. 5, pp. 11-14 (4 pages)
Modern Computer
Funding
Xinjiang Normal University Research Start-up Fund for Outstanding Young Teachers (No. XJNU201420)