Abstract
Based on an analysis of the syntactic and semantic characteristics of Chinese organization names, each name is divided into a forward word (the leading part) and a special word (the feature word that marks the organization type), and a statistical recognition method is proposed. Using training data from an established corpus, the method computes three scores for a candidate organization name: the credibility of the special word, the credibility of the first word of the forward part, and the credibility of the middle words of the forward part. These are combined into an overall word-formation credibility for the candidate, which is compared with a given threshold to decide whether the candidate is an organization name. In an open test, the method achieved 85.57% recall and 94.37% precision.
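The scoring pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: each credibility is estimated as the fraction of a token's corpus occurrences that fall in the given position of a known organization name, the scores are combined by a geometric mean, and the class name `OrgNameScorer`, its methods, and the default threshold are hypothetical.

```python
from collections import Counter


class OrgNameScorer:
    """Illustrative scorer; the formulas are assumptions, not the paper's."""

    def __init__(self, org_names, corpus_tokens):
        # org_names: segmented known organization names, e.g. [["四川", "大学"]]
        # corpus_tokens: every token in the training corpus, including
        # the tokens that occur inside organization names
        self.total = Counter(corpus_tokens)
        # Special (feature) word: assumed to be the last token of a name.
        self.special = Counter(n[-1] for n in org_names)
        # Forward part: first token, then any middle tokens.
        self.first = Counter(n[0] for n in org_names if len(n) > 1)
        self.middle = Counter(t for n in org_names for t in n[1:-1])

    def _ratio(self, counter, token):
        # Share of the token's corpus occurrences that play this role.
        seen = self.total[token]
        return counter[token] / seen if seen else 0.0

    def credibility(self, candidate):
        # A candidate needs at least a forward word and a special word.
        if len(candidate) < 2:
            return 0.0
        scores = [
            self._ratio(self.special, candidate[-1]),
            self._ratio(self.first, candidate[0]),
        ]
        scores += [self._ratio(self.middle, t) for t in candidate[1:-1]]
        # Assumed combination rule: geometric mean of the position scores.
        product = 1.0
        for s in scores:
            product *= s
        return product ** (1.0 / len(scores))

    def is_organization(self, candidate, threshold=0.5):
        # Compare the overall credibility with a given threshold.
        return self.credibility(candidate) >= threshold
```

A candidate such as `["四川", "大学"]` would be passed to `is_organization` after word segmentation. In practice the threshold would be tuned on held-out data, since it trades recall against precision.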
Source
Journal of Sichuan University (Natural Science Edition)
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, No. 3, pp. 613-617 (5 pages)
Keywords
natural language processing
Chinese organization name recognition
forward word
special word