
Linked Data-Oriented Method of Entity Linking Discovery
Cited by: 7
Abstract The World Wide Web has developed into a global data space that links web data and database data, and linked data is one of the best tools for achieving this evolution. Linked data publishes data in a structured form so that resources can be interlinked. As linked data applications deepen, more and more data are published on the web as linked data, and already-published web information is also being converted into linked data automatically or semi-automatically. In practice, however, there are still only a few links between the released linked datasets, which makes data sharing inconvenient. Entity linking discovery addresses this problem: it uncovers the real relations between entities, builds entity links according to the publishing standard, discovers potential entity links, strengthens the interlinking between datasets, and thereby increases the accuracy of published linked data.

This study proposes a statistical learning method for recognizing entities and building links across different linked datasets. First, before any entities are compared, the method finds class correspondences and groups related entity attribute correspondences across datasets. It describes the matching relationship for highly correlated attributes, which reduces the number of attribute-matching computations needed during entity matching. Second, the method computes the similarity of entities from the similarity of the values of the matched attributes and builds entity links, completing link discovery across datasets.

To cluster the attribute correspondences, we use the K-medoids clustering algorithm to discover potential attribute correspondences. The algorithm mainly aims to group property concepts and corresponding attributes that express the same meaning across datasets, so that attributes can be compared and matched in groups. The EDOAL language is then used to define the clustered attributes and describe the correspondence relations between them. According to these matching relations, we compare and compute the similarity between entity attributes. Finally, the method constructs the links under the SILK framework: it maps the property relationships to SILK scripts, builds entity links between datasets according to a preset confidence value, assigns RDFs properties to the entities, and thus realizes entity link discovery between datasets.

The method is tested on several open linked datasets: IM@OAEI2014 (dataset Abox3), CKAN (dataset EUROSTAT) and GADM-RDF (dataset GADM). These data are used to cluster matched attributes and interlink entities. Experimental verification over two rounds of entity linking discovery shows that using K-medoids clustering to compute entity similarity across matched but dissimilar properties increases the number of entity links found, and that the method achieves high accuracy and F-measure values. The proposed method can therefore reduce the number of entity-matching computations across datasets and improve the accuracy of entity links, and it is feasible and practical. 12 figs. 4 tabs. 19 refs.
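The two steps described in the abstract (grouping attributes with K-medoids, then comparing entities only over matched attributes against a preset confidence value) can be illustrated with a minimal sketch. This is not the authors' implementation, which relies on EDOAL descriptions and SILK scripts. The string distance based on Python's `difflib`, the naive medoid initialisation, the attribute names, the sample entities and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """String distance in [0, 1]; 0 means identical (case-insensitive).

    Assumption: a generic character-based measure stands in for whatever
    attribute similarity the paper actually uses.
    """
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def k_medoids(items, k, max_iter=100):
    """Alternating k-medoids over distinct strings.

    Repeats two steps until the medoids stop changing: assign every item
    to its nearest medoid, then pick as new medoid of each cluster the
    member with the smallest total distance to the other members.
    """
    medoids = list(items[:k])  # naive, deterministic initialisation
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for item in items:
            nearest = min(medoids, key=lambda m: distance(item, m))
            clusters[nearest].append(item)
        new_medoids = [
            min(members, key=lambda c: sum(distance(c, o) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


def entity_similarity(e1, e2, matched_pairs):
    """Mean value similarity over the attribute pairs judged to correspond."""
    scores = [
        1.0 - distance(e1[a], e2[b])
        for a, b in matched_pairs
        if a in e1 and b in e2
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical attribute names drawn from two datasets.
attributes = ["population", "areaKm2", "populationTotal", "areaTotal"]
groups = k_medoids(attributes, k=2)

# Hypothetical entities and attribute correspondences.
matched_pairs = [("name", "label"), ("population", "populationTotal")]
berlin_a = {"name": "Berlin", "population": "3500000"}
berlin_b = {"label": "Berlin", "populationTotal": "3500000"}
paris_b = {"label": "Paris", "populationTotal": "2100000"}

CONFIDENCE = 0.9  # hypothetical preset confidence value
link_berlin = entity_similarity(berlin_a, berlin_b, matched_pairs) >= CONFIDENCE
link_paris = entity_similarity(berlin_a, paris_b, matched_pairs) >= CONFIDENCE
```

Only attribute pairs that fall in the same group are compared, so the number of value comparisons per entity pair drops from every attribute combination to the matched pairs alone, which is the saving the abstract refers to.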
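Authors 高劲松 (GAO Jinsong), 周习曼 (ZHOU Ximan), 梁艳琪 (LIANG Yanqi)
Source Journal of Library Science in China (《中国图书馆学报》), CSSCI, Peking University Core Journal, 2016, No. 6, pp. 85-101 (17 pages)
Funding A research outcome of the National Social Science Fund of China general project "Research on the Mechanisms of Knowledge Externalization and Combination in Linked Data-Based Knowledge Creation" (No. 12BTQ039).
Keywords Linked data. Entity linking. Data linking. Linking discovery.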