
Word Embedding-based Method for Entity Category Alignment of Geographic Knowledge Base (cited by: 2)
Abstract: A geographic knowledge base is a collection of geographic entities and the relationships between them, and it plays an important supporting role in knowledge services such as intelligent search, question answering, and recommendation. However, because existing geographic knowledge bases differ in data source, form, and builder, they exhibit semantic heterogeneity in place names, spatial locations, and feature types: the same meaning is expressed in different forms, and the same form carries different meanings. This heterogeneity hinders knowledge fusion and sharing between geographic knowledge bases. Semantic alignment is an effective way to resolve semantic heterogeneity, and feature type (entity category) alignment is its foundation, playing an important role in improving the alignment accuracy of place names and spatial locations. Existing feature type alignment methods mainly rely on traditional string similarity and structural similarity measures, which cannot capture deep semantic correlations between feature types and therefore limit alignment accuracy. This paper therefore proposes a word embedding-based method for aligning geographic feature types. The method uses a word embedding model to learn the semantic information of feature types from a corpus and represents it as word vectors, capturing deep semantic information that existing methods miss and thereby improving alignment accuracy. In addition, by combining a general corpus with a geographic information corpus, the paper strengthens the geographic semantics of the corpus used by the word embedding model, so that the correlation between geographic feature types can be measured more accurately. In experiments aligning the feature types of different geographic knowledge bases, the method effectively captures the deep semantic information of geographic feature types, with an F1 score (the harmonic mean of precision and recall) of up to 0.9568, effectively improving the alignment accuracy of entity categories.
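The alignment idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not name the embedding model, the similarity measure, or any threshold, so the cosine similarity, the 0.8 cutoff, the function names, and the three-dimensional toy vectors below are all hypothetical stand-ins for vectors that would in practice be learned from the combined general and geographic corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align_categories(source_vecs, target_vecs, threshold=0.8):
    """Map each source category to its most similar target category.

    A pair is kept only if its similarity reaches the (hypothetical) threshold.
    """
    alignment = {}
    for s_name, s_vec in source_vecs.items():
        best_name, best_sim = None, -1.0
        for t_name, t_vec in target_vecs.items():
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_name, best_sim = t_name, sim
        if best_sim >= threshold:
            alignment[s_name] = best_name
    return alignment

def f1_score(predicted, gold):
    """F1 as the harmonic mean of precision and recall over aligned pairs."""
    correct = sum(1 for k, v in predicted.items() if gold.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy category vectors for two hypothetical knowledge bases (made-up values).
kb_a = {"river": [0.9, 0.1, 0.0], "hospital": [0.0, 0.2, 0.9]}
kb_b = {
    "stream": [0.8, 0.2, 0.1],
    "clinic": [0.1, 0.1, 0.9],
    "road": [0.1, 0.9, 0.1],
}
gold = {"river": "stream", "hospital": "clinic"}

alignment = align_categories(kb_a, kb_b)
print(alignment)
print(f1_score(alignment, gold))
```

The point of the sketch is that "river" and "stream" share no characters, so a string similarity measure would score them low, whereas vectors learned from text in which both words appear in similar contexts can place them close together.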
Authors: XU Zhaohua (徐召华); ZHU Yunqiang (诸云强); SONG Jia (宋佳); SUN Kai (孙凯); WANG Shu (王曙). Affiliations: School of Architecture Engineering, Shandong University of Technology, Zibo 255000, China; State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Source: Journal of Geo-information Science (《地球信息科学学报》), indexed in CSCD and the Peking University Core Journals list, 2021, Issue 8, pp. 1372-1381 (10 pages).
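Funding: National Natural Science Foundation of China, General Program (41771430); National Natural Science Foundation of China, Key Program (41631177); Strategic Priority Research Program of the Chinese Academy of Sciences, Class A (XDA23100100).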
Keywords: geospatial knowledge base; semantic heterogeneity; geographical entity; feature type; type alignment; word embedding; word vector; geographical corpus; similarity