Abstract
This paper briefly reviews the major achievements of thirty years of Chinese information processing and surveys the state of computational linguistics research within the field over the past twenty years. It analyzes the main differences between Chinese and English and discusses what languages share and what is particular to each. The paper questions the value of part-of-speech (POS) tagging and treebank construction for large-scale Chinese corpora. It then outlines ideas for building future Chinese language resources and calls for new approaches: replacing the current syntactic annotation with semantic annotation, and the current shallow annotation with deep annotation. Concretely, this means targeted annotation goals, diversified content, a staged procedure, and annotation carried out by a broad community of non-professional annotators. Finally, the paper identifies two keys to future development: the convergence of technologies and human-centered computing.
Source
《中文信息学报》 (Journal of Chinese Information Processing)
CSCD
Peking University Core Journals (北大核心)
2011, Issue 6, pp. 3-11 (9 pages)
Keywords
Chinese information processing
linguistic data resources
corpus annotation
syntax
semantics