Abstract
Location-name ambiguity is very common in Chinese text. For example, nearly every county-level city in China has a town named Chengguan Town (城关镇). This ambiguity must be resolved before location names can be used in information extraction, information fusion, or knowledge graph construction. This paper presents a hybrid approach to location normalization built on the maximum spanning tree algorithm. Its basic steps are: location named entity recognition based on CRF; disambiguation of location names by graph search for a maximum spanning tree; and, where disambiguation fails, fallback to a default sense derived by semi-automatic extraction. Tests on data from the People's Daily for the second half of 2013 show a precision of 93.7%.
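The disambiguation and fallback steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how edges are weighted or how default senses are stored, so the `relatedness` score (number of shared leading administrative levels), the tuple representation of a sense, the function names, and the toy place names below are all assumptions made for illustration. The sketch tries every combination of candidate senses, scores each combination by the weight of its maximum spanning tree (Kruskal's algorithm, heaviest edges first), and falls back to the default senses when no combination has any supporting evidence.

```python
from itertools import combinations, product


def relatedness(a, b):
    """Hypothetical edge weight: number of shared leading administrative levels."""
    score = 0
    for x, y in zip(a, b):
        if x != y:
            break
        score += 1
    return score


def max_spanning_tree_weight(senses):
    """Total weight of a maximum spanning tree over the chosen senses (Kruskal)."""
    parent = list(range(len(senses)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score every pair once, then visit the heaviest edges first.
    edges = sorted(
        ((relatedness(senses[i], senses[j]), i, j)
         for i, j in combinations(range(len(senses)), 2)),
        reverse=True,
    )
    total = 0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it belongs to the tree
            parent[ri] = rj
            total += w
    return total


def disambiguate(candidates, defaults):
    """Pick one sense per mention; use defaults when the context gives no evidence.

    candidates: dict mapping mention -> list of candidate senses (tuples)
    defaults:   dict mapping mention -> default sense (assumed precomputed)
    """
    mentions = list(candidates)
    best, best_score = None, 0
    for combo in product(*(candidates[m] for m in mentions)):
        score = max_spanning_tree_weight(combo)
        if score > best_score:
            best, best_score = combo, score
    if best is None:  # no combination had a positive tree weight
        return {m: defaults[m] for m in mentions}
    return dict(zip(mentions, best))


# Toy data with made-up place names.
CHENGGUAN_A = ("ProvinceA", "CountyX", "Chengguan")
CHENGGUAN_B = ("ProvinceB", "CountyY", "Chengguan")
COUNTY_X = ("ProvinceA", "CountyX")

resolved = disambiguate(
    {"Chengguan": [CHENGGUAN_B, CHENGGUAN_A], "CountyX": [COUNTY_X]},
    defaults={"Chengguan": CHENGGUAN_B, "CountyX": COUNTY_X},
)
fallback = disambiguate(
    {"Chengguan": [CHENGGUAN_A, CHENGGUAN_B]},
    defaults={"Chengguan": CHENGGUAN_B},
)
```

In the first call the co-occurring mention of CountyX pulls Chengguan toward the sense in the same county; in the second call there is no context, so the default sense is returned. Exhaustive enumeration is only practical for a handful of mentions per document, and this sketch falls back for all mentions at once rather than per mention.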
Source
《软件导刊》
2015, No. 7, pp. 26-29 (4 pages)
Software Guide
Funding
National Natural Science Foundation of China (Grant No. 61373116)
Keywords
Information Extraction
Location Normalization
Maximum Spanning Tree
Named Entity (NE)
Ambiguity