Abstract
Uyghur is a typical agglutinative language that uses a rich inventory of functional suffixes to express grammatical relations and mood. This paper compares the strengths and weaknesses of two part-of-speech (POS) tagging approaches for Uyghur natural language processing: tagging based on stems and tagging based on suffixes. Using a large-scale corpus, we count the number, frequency, and coverage of common suffix strings to assess the feasibility of suffix-based POS tagging. Guided by Prof. Litip Tohti's theory of Uyghur generative grammar, we give grammatical definitions for the POS tagging of suffix strings and carry out a small-scale tagging experiment on real corpus data. The proposed suffix-string-based POS tagging method applies not only to Uyghur but also to other Turkic languages, which share many similar suffixes.
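The corpus statistics mentioned in the abstract (number, frequency, and coverage of suffix strings) could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the authors' procedure: it assumes the corpus has already been segmented into (stem, suffix string) pairs, it takes "coverage" to mean the share of suffixed tokens accounted for by the k most frequent suffix strings, and the function name `suffix_string_stats` and the toy tokens are hypothetical placeholders rather than real Uyghur data.

```python
from collections import Counter


def suffix_string_stats(tokens, top_k):
    """Return (distinct suffix strings, top-k list with frequencies, coverage).

    tokens: iterable of (stem, suffix_string) pairs; suffix_string is ''
            for a bare stem (assumed input format, not from the paper).
    coverage: fraction of suffixed tokens covered by the top_k most
              frequent suffix strings (assumed definition).
    """
    # Count only tokens that actually carry a suffix string.
    counts = Counter(suffix for _, suffix in tokens if suffix)
    total = sum(counts.values())
    top = counts.most_common(top_k)
    covered = sum(freq for _, freq in top)
    coverage = covered / total if total else 0.0
    return len(counts), top, coverage


# Toy, hypothetical segmented tokens: placeholder stems and suffix strings.
toy_tokens = [
    ("w1", "+A"),
    ("w1", "+A+B"),
    ("w2", "+A"),
    ("w2", "+C"),
    ("w3", "+A"),
    ("w3", ""),  # bare stem, ignored in the suffix statistics
]

distinct, top, coverage = suffix_string_stats(toy_tokens, top_k=1)
print(distinct, top, round(coverage, 2))
```

If a small set of suffix strings yields a high coverage value on a real corpus, that would support tagging suffix strings as units, which is the feasibility question the abstract raises.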
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2013, No. 5, pp. 179-183 (5 pages)
Journal of Chinese Information Processing