
An Expanded Pattern Set-Based Approach to Chinese Name Recognition (一种基于扩展模式集的中国人名识别方法)
Abstract Named entity recognition is a foundational task in Chinese information processing. It extracts proper nouns and numeric expressions from text and classifies them into categories such as person, organization, and location. Because personal names occur frequently in Chinese text, Chinese name recognition is an important subtask of named entity recognition, and improving it can significantly improve the quality of Chinese information processing. Chinese names take complex and varied forms, including short names, aliases, and other non-standard forms, and traditional recognition methods handle such non-complete forms poorly. To address this, we propose a recognition method based on an expanded pattern set, which improves recognition of non-complete Chinese names by expanding the set of name recognition patterns. The method recognizes names through role labeling. First, a model trained on a corpus labels roles automatically and produces the role sequence of the text; each word's role reflects the part it plays in forming a person's name, such as family name, given name, preceding context, or following context. Second, given the role sequence and the name recognition pattern set, a pattern matching algorithm finds the strings in the text that match a pattern in the set and identifies them as names. The pattern set is extended to cover non-complete forms, so it adapts to more complex names. Experimental results show that the method achieves good precision and recall and is especially effective on non-complete Chinese names, making name recognition more complete.
Authors LUAN Wei-feng (栾伟锋); ZHANG Huan-huan (张欢欢) (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Source Journal of East China University of Science and Technology (《华东理工大学学报(自然科学版)》), 2018, No. 3, pp. 425-430 (6 pages). Indexing: CAS, CSCD, 北大核心 (Peking University Core Journals).
Funding Scientific Research Program of the Shanghai Municipal Science and Technology Commission (上海市科委科研计划项目, 17DZ1101003)
Keywords Chinese name; non-complete Chinese name; role labeling; set of recognition patterns
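
The second stage described in the abstract, matching a role sequence against a name recognition pattern set, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the single-letter role tags, the two pattern lists, and the function `find_names` are all hypothetical, and the role sequence is supplied by hand rather than produced by a trained labeler.

```python
import re

# Hypothetical role tags (assumed for illustration, not taken from the paper):
#   B = family name, C = first character of given name,
#   D = second character of given name,
#   F = prefix before a family name (e.g. 老, 小),
#   G = suffix after a family name,
#   K = preceding context, L = following context, A = other
BASE_PATTERNS = ["BCD", "BC"]                          # complete names only
EXPANDED_PATTERNS = BASE_PATTERNS + ["FB", "BG", "CD"]  # plus non-complete forms


def find_names(tokens, roles, patterns):
    """Return (start, end, text) for each token span whose roles match a pattern."""
    if len(tokens) != len(roles):
        raise ValueError("tokens and roles must have the same length")
    # One role letter per token, so string offsets equal token indices.
    role_string = "".join(roles)
    # Try longer patterns first so that BCD is preferred over BC.
    ordered = sorted(patterns, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in ordered))
    return [
        (m.start(), m.end(), "".join(tokens[m.start():m.end()]))
        for m in regex.finditer(role_string)
    ]


# Hand-labeled example sentence: "记者老王说李明华来了"
tokens = ["记者", "老", "王", "说", "李", "明", "华", "来", "了"]
roles = ["K", "F", "B", "L", "B", "C", "D", "L", "A"]

base_hits = find_names(tokens, roles, BASE_PATTERNS)
expanded_hits = find_names(tokens, roles, EXPANDED_PATTERNS)
```

With the base patterns only the complete name 李明华 is found; with the expanded list the short form 老王 is found as well, which is the kind of gain the abstract attributes to extending the pattern set.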