
Location Normalization in Information Extraction
Abstract Place-name ambiguity is very common in Chinese text. For example, nearly every county-level city in China has a town named Chengguan (城关镇). This ambiguity must be resolved before location names can be used in information extraction, information fusion, and knowledge graph construction. This paper proposes a hybrid approach to location normalization built on a maximum spanning tree algorithm. Its basic steps are: recognizing location named entities with a CRF; disambiguating place names with a maximum-spanning-tree graph search; and, when the graph search cannot resolve a name, falling back to a default sense computed by semi-automatic extraction. Tests on People's Daily data from the second half of 2013 show a precision of 93.7%.
Source Software Guide (《软件导刊》), 2015, No. 7, pp. 26-29 (4 pages)
Funding National Natural Science Foundation of China (61373116)
Keywords Information Extraction; Location Normalization; Maximum Spanning Tree; Named Entity (NE); Ambiguity
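
The disambiguation and fallback steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy gazetteer, the default-sense table, the edge weight (number of shared leading administrative levels), and the function names are all assumptions made for illustration, and the CRF recognition step is assumed to have already produced the list of place-name mentions. The sketch builds a graph over the mentions in one document, weights each edge by the best-matching pair of candidate senses, and runs Kruskal's algorithm in descending weight order to obtain a maximum spanning forest.

```python
from itertools import combinations

# Hypothetical toy gazetteer (illustrative entries, not taken from the paper):
# place name -> candidate senses, each written as an administrative path.
GAZETTEER = {
    "Chengguan Town": [
        ("Gansu", "Lanzhou", "Yuzhong County", "Chengguan Town"),
        ("Shaanxi", "Xianyang", "Qian County", "Chengguan Town"),
    ],
    "Yuzhong County": [("Gansu", "Lanzhou", "Yuzhong County")],
    "Lanzhou": [("Gansu", "Lanzhou")],
}

# Hypothetical default senses, used only when the graph gives no evidence.
DEFAULT_SENSE = {
    "Chengguan Town": ("Shaanxi", "Xianyang", "Qian County", "Chengguan Town"),
}


def shared_prefix(a, b):
    """Assumed edge weight: how many leading administrative levels two senses share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def disambiguate(mentions):
    """Pick one sense per mention via a maximum spanning forest (Kruskal)."""
    mentions = list(dict.fromkeys(mentions))  # drop duplicates, keep order
    edges = []
    for a, b in combinations(mentions, 2):
        # Edge weight = best compatibility over all sense pairs of a and b.
        w, sa, sb = max(
            ((shared_prefix(sa, sb), sa, sb)
             for sa in GAZETTEER[a] for sb in GAZETTEER[b]),
            key=lambda t: t[0],
        )
        edges.append((w, a, b, sa, sb))
    edges.sort(key=lambda e: e[0], reverse=True)  # heaviest edges first

    parent = {m: m for m in mentions}
    chosen = {}
    for w, a, b, sa, sb in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb or w == 0:
            continue  # skip cycles and edges that carry no evidence
        parent[ra] = rb
        # Simplification: the first (heaviest) edge touching a mention fixes its sense.
        chosen.setdefault(a, sa)
        chosen.setdefault(b, sb)

    # Fallback: mentions with no supporting edge take the default sense.
    for m in mentions:
        if m not in chosen:
            chosen[m] = DEFAULT_SENSE.get(m, GAZETTEER[m][0])
    return chosen
```

Under these assumptions, a document mentioning "Chengguan Town" together with "Yuzhong County" and "Lanzhou" resolves the town to its Gansu sense, while a document mentioning "Chengguan Town" alone falls back to the entry in the default table.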


