Location Normalization in Information Extraction

Abstract: Place-name ambiguity is very common in Chinese text. For example, nearly every county-level city in China has a town called Chengguan Town (城关镇). This ambiguity must be resolved before place names can be used in information extraction, information fusion, or knowledge graph construction. This paper proposes a hybrid approach to location normalization built around a maximum spanning tree algorithm. Its basic steps are: recognize location named entities with a CRF model; disambiguate the place names with a maximum spanning tree graph search; and, where disambiguation fails, fall back to a default sense derived by semi-automatic extraction. In tests on People's Daily data from the second half of 2013, the method reached a precision of 93.7%.
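The disambiguation and fallback steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no edge weights or gazetteer, so the gazetteer entries, the default-sense table, the weighting function `shared_levels`, and the function `normalize` are all assumptions made for illustration. The CRF recognition step is omitted, and the mentions are assumed to be already extracted. The sketch builds a graph whose nodes are candidate senses, weights each edge by the number of administrative levels two senses share, and grows a maximum spanning tree greedily (Kruskal-style) while keeping at most one sense per mention.

```python
from itertools import combinations

# Hypothetical gazetteer: place name -> candidate senses, each written as an
# administrative path. Entries are illustrative, not verified gazetteer data.
GAZETTEER = {
    "城关镇": [("甘肃省", "兰州市", "榆中县", "城关镇"),
               ("陕西省", "安康市", "石泉县", "城关镇")],
    "榆中县": [("甘肃省", "兰州市", "榆中县")],
    "兰州市": [("甘肃省", "兰州市")],
}

# Hypothetical default senses, standing in for the semi-automatically
# derived defaults mentioned in the abstract.
DEFAULTS = {"城关镇": ("陕西省", "安康市", "石泉县", "城关镇")}


def shared_levels(a, b):
    """Assumed edge weight: number of leading administrative levels in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def normalize(mentions, gazetteer, defaults):
    """Pick one sense per mention via a greedy maximum spanning tree."""
    mentions = list(dict.fromkeys(mentions))  # drop duplicate mentions
    nodes = [(m, s) for m in mentions for s in gazetteer.get(m, [])]

    # Edges connect senses of *different* mentions that share some context.
    edges = []
    for (m1, s1), (m2, s2) in combinations(nodes, 2):
        if m1 != m2:
            w = shared_levels(s1, s2)
            if w > 0:
                edges.append((w, (m1, s1), (m2, s2)))
    edges.sort(key=lambda e: e[0], reverse=True)  # heaviest edges first

    parent = {m: m for m in mentions}  # union-find over mentions

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = {}
    for _, (m1, s1), (m2, s2) in edges:
        # Skip edges that contradict a sense already fixed for a mention.
        if chosen.get(m1, s1) != s1 or chosen.get(m2, s2) != s2:
            continue
        r1, r2 = find(m1), find(m2)
        if r1 == r2:  # edge would close a cycle
            continue
        parent[r1] = r2
        chosen[m1], chosen[m2] = s1, s2

    # Mentions left without a sense fall back to the default, if any.
    for m in mentions:
        chosen.setdefault(m, defaults.get(m))
    return chosen


result = normalize(["城关镇", "榆中县", "兰州市"], GAZETTEER, DEFAULTS)
alone = normalize(["城关镇"], GAZETTEER, DEFAULTS)
```

With the three mentions together, the co-occurring county and city pull 城关镇 toward the Gansu sense. When 城关镇 appears alone, no edge exists and the hypothetical default sense is used instead.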

Authors: Sun Xuemin (孙雪闵), Li Xiaoge (李晓戈), Zhou Xiaohui (周晓辉)

Affiliation: School of Computer Science, Xi'an University of Posts and Telecommunications

Source: Software Guide (《软件导刊》), 2015, No. 7, pp. 26-29 (4 pages)

Funding: National Natural Science Foundation of China (61373116)

Keywords: information extraction; location normalization; maximum spanning tree; named entity (NE); ambiguity

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
