Abstract
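Based on a surname-driven approach and contextual information, and using a large body of statistics obtained from a library of real name samples and a text corpus, this paper proposes a hierarchical weighted filtering model for Chinese name recognition. A recognition algorithm built on this model, together with a conflict-resolution strategy, achieves automatic recognition of Chinese person names. In a test on 500 sentences containing person names, randomly drawn from the People's Daily (《人民日报》), recall for Chinese names reached 89.2% and precision reached 93.15%.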
Chinese person name identification plays an important role in many fields, for example information retrieval, machine translation, and text proofreading. This paper presents a hierarchical weighting model for Chinese person name identification. The model is based on surname and context boundary information, and makes use of a large amount of statistical data extracted from a library of real names and a real text corpus. Using an algorithm based on this model, together with a strategy for resolving conflicts, it achieves automatic identification of Chinese person names. The test sample, 500 sentences containing Chinese person names, was randomly extracted from the People's Daily News Corpus. The experiment shows that the recall and precision of the algorithm reach 89.2% and 93.15%, respectively.
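For readers unfamiliar with the two reported measures, the following is a minimal sketch of how precision and recall are conventionally computed for a name recognizer. The function name `precision_recall` and the counts passed to it are illustrative assumptions; the abstract reports only the final percentages, not the underlying counts.

```python
def precision_recall(correct: int, proposed: int, actual: int) -> tuple[float, float]:
    """Return (precision, recall) for a name recognizer.

    correct  -- names output by the system that are truly person names
    proposed -- all names output by the system
    actual   -- all person names present in the test sentences
    """
    if proposed <= 0 or actual <= 0:
        raise ValueError("proposed and actual must be positive")
    # Precision: share of the system's output that is right.
    # Recall: share of the true names that the system found.
    return correct / proposed, correct / actual


# Hypothetical counts for illustration only; they are NOT from the paper.
p, r = precision_recall(correct=90, proposed=100, actual=120)
print(f"precision={p:.2%}, recall={r:.2%}")
```

A higher precision than recall, as in the reported results, would indicate that the names the system outputs are usually correct while some true names are missed.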
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2003, Issue 4, pp. 62-65 (4 pages)
Computer Engineering and Applications
Funding
Supported by the Shanxi Province Youth Science and Technology Research Fund