
Chinese personal name recognition based on multilevel threshold
Cited by: 1
Abstract: Based on statistics over a large-scale personal name database, this paper studies how surname characters and given-name characters are used in various kinds of Chinese personal names. A statistical analysis of a large-scale corpus then yields, for each surname character, the probability that it is actually used as a surname in real text, together with its contextual patterns. For personal names of the Han nationality, the paper proposes a multilevel surname threshold. To recognize personal names of Chinese ethnic minorities and transliterated personal names, it also proposes a multilevel threshold on the first character of the name. Both sets of thresholds are chosen with the 3σ rule. Experimental results show that the multilevel-threshold model is effective for Chinese personal name recognition.
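The abstract does not give the exact estimation formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that a character's surname probability is estimated as (occurrences used as a surname) / (all occurrences) in an annotated corpus, and that the multilevel thresholds are placed at μ − kσ for k = 1, 2, 3 over those probabilities, clipped at 0. Both choices, and all function names (`surname_probability`, `three_sigma_levels`, `level_of`), are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: surname-usage probabilities and 3-sigma multilevel thresholds.

Assumptions (not from the paper):
  * P(surname | character) = count used as surname / total count of the character;
  * threshold levels sit at mean - k * std for k = 1, 2, 3, clipped at 0.
"""
from statistics import mean, pstdev
from typing import Dict, List


def surname_probability(as_surname: Dict[str, int], total: Dict[str, int]) -> Dict[str, float]:
    """Estimate, per character, the fraction of its occurrences that are real surnames."""
    return {ch: as_surname.get(ch, 0) / n for ch, n in total.items() if n > 0}


def three_sigma_levels(probs: Dict[str, float]) -> List[float]:
    """Return decreasing thresholds mu - sigma, mu - 2 sigma, mu - 3 sigma (clipped at 0)."""
    values = list(probs.values())
    mu, sigma = mean(values), pstdev(values)  # population standard deviation
    # Probabilities cannot be negative, so negative thresholds are clipped to 0.
    return [max(0.0, mu - k * sigma) for k in (1, 2, 3)]


def level_of(p: float, levels: List[float]) -> int:
    """Map a probability to a level: 0 is the most surname-like band.

    Because `levels` is in decreasing order, the first threshold that p
    reaches gives its band; a p below every threshold gets len(levels).
    """
    for i, threshold in enumerate(levels):
        if p >= threshold:
            return i
    return len(levels)


if __name__ == "__main__":
    # Toy counts invented for illustration only; they are not data from the paper.
    total = {"王": 100, "张": 100, "李": 100, "高": 100, "白": 100}
    as_surname = {"王": 90, "张": 80, "李": 70, "高": 30, "白": 10}
    probs = surname_probability(as_surname, total)
    levels = three_sigma_levels(probs)
    for ch, p in probs.items():
        print(ch, round(p, 2), level_of(p, levels))
```

In a recognizer built along these lines, a candidate name starting with a character in a high level could be accepted on weak contextual evidence, while one in a low level would need stronger context; how the paper actually combines levels with context rules is not stated in the abstract.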
Source: Computer Engineering and Applications (《计算机工程与应用》; indexed in CSCD and the Peking University core journal list), 2007, no. 33, pp. 1-3, 18 (4 pages)
Funding: National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2006AA012140
Keywords: natural language processing; unknown word recognition; Chinese personal name recognition; multilevel threshold; 3σ rule