Abstract
The word-class status of temporal morphemes involving numbers has long been a point of contention in Chinese linguistics, and these morphemes are also among the elements most likely to cause confusion and inconsistency in the automatic lexical analysis of Chinese. From the perspective of automatic lexical analysis in Chinese information processing, this paper analyzes the word-class assignment of these morphemes, examines how they are segmented and POS-tagged in a 12-million-character corpus of authentic text, and proposes improved principles for their word segmentation and POS tagging.
Source
《语言教学与研究》
CSSCI
PKU Core Journal (北大核心)
2010, Issue 3, pp. 84-90 (7 pages)
Language Teaching and Linguistic Studies
Keywords
temporal morphemes involving numbers (涉数时间语素)
word class (词类)
automatic lexical analysis (词法自动分析)
POS tagging (词性标注)
automatic word segmentation (自动分词)