
Auto Recognition of Person Names from Chinese Texts Based on Support Vector Machines (基于支持向量机的中国人名的自动识别)

Cited by: 9
Abstract: Based on the characteristics of person names in Chinese texts, this paper proposes and implements a method for automatically recognizing Chinese person names with support vector machines (SVMs). The training text is automatically segmented, part-of-speech (POS) tagged, and class-labeled, and features are then extracted character by character: the character itself, its character-based POS tag, whether it appears in a surname table, the probability that it occurs in person names, and context information. Each sample is represented as a long binary vector, and these vectors form the training set. The person name recognition models are obtained by testing polynomial kernel functions. The results show that the models are effective at identifying person names in Chinese texts: in the open test, recall, precision, and F-measure reach 92.14%, 96.43%, and 94.24%, respectively.
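The abstract's pipeline (per-character binary features with context, a polynomial kernel, and an F-measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy surname table, the vocabularies, the window size, the kernel degree and offset, and every function name below are assumptions made for illustration only. The probability feature mentioned in the abstract is omitted, because the abstract does not say how it is binarized.

```python
from typing import Dict, List, Sequence

# Hypothetical toy surname table; the paper's actual table is not given.
SURNAMES = {"王", "李", "张"}


def build_vocab(items: Sequence[str]) -> Dict[str, int]:
    """Map each distinct item to a stable index (sorted for determinism)."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def char_features(
    chars: Sequence[str],
    pos_tags: Sequence[str],
    i: int,
    char_vocab: Dict[str, int],
    pos_vocab: Dict[str, int],
    window: int = 1,  # assumed context size, not taken from the paper
) -> List[int]:
    """Binary vector for character i and its neighbours within the window.

    Each position contributes a one-hot character slot, a one-hot POS slot,
    and a single surname-table bit. Positions outside the sentence stay zero.
    """
    vec: List[int] = []
    for j in range(i - window, i + window + 1):
        char_slot = [0] * len(char_vocab)
        pos_slot = [0] * len(pos_vocab)
        surname_bit = 0
        if 0 <= j < len(chars):
            if chars[j] in char_vocab:
                char_slot[char_vocab[chars[j]]] = 1
            if pos_tags[j] in pos_vocab:
                pos_slot[pos_vocab[pos_tags[j]]] = 1
            surname_bit = 1 if chars[j] in SURNAMES else 0
        vec.extend(char_slot + pos_slot + [surname_bit])
    return vec


def poly_kernel(
    x: Sequence[float], y: Sequence[float], degree: int = 2, coef0: float = 1.0
) -> float:
    """Polynomial kernel (x . y + coef0) ** degree; degree and coef0 are assumed."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    chars = list("王小明说")
    pos_tags = ["nr", "nr", "nr", "v"]  # hypothetical character-based POS tags
    char_vocab = build_vocab(chars)
    pos_vocab = build_vocab(pos_tags)
    x0 = char_features(chars, pos_tags, 0, char_vocab, pos_vocab)
    x1 = char_features(chars, pos_tags, 1, char_vocab, pos_vocab)
    print(len(x0), poly_kernel(x0, x1))
    # Consistency check on the reported open-test figures.
    print(round(f_measure(96.43, 92.14), 2))
```

The last line recomputes the F-measure from the reported precision and recall, which agrees with the 94.24% stated in the abstract. In practice the binary vectors would be passed to an SVM trainer with a polynomial kernel. The abstract does not name the toolkit the authors used.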
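Source: Computer Engineering (《计算机工程》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心), 2006, No. 19, pp. 188-190, 201 (4 pages)
Funding: National Natural Science Foundation of China (grants 60373095 and 60373096)
Keywords: support vector machines (SVM); Chinese texts; recognition of person names; machine learning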

