Abstract
Processing large-scale authentic text is a strategic goal for computational linguistics in the coming period. Corpus-based language research is an important area of computational linguistics, because a corpus is the most ideal resource of linguistic knowledge. To obtain linguistic knowledge from a Chinese corpus, the corpus must be processed at every level. This paper discusses processing techniques for Chinese corpora, that is, annotating a corpus at the lexical (part-of-speech), syntactic, and semantic levels. In particular, it describes in detail a system for automatic Chinese word segmentation and an approach to parsing the boundaries of Chinese phrases.
Source
《杭州电子工业学院学报》
1996, No. 1, pp. 32-37 (6 pages)
Journal of Hangzhou Institute of Electronic Engineering