期刊文献+

高效的实体匹配结果消解算法

Efficient Algorithms for Resolving Conflicts in Entity Matching Results
下载PDF
导出
摘要 实体同一性检测问题,即实体识别问题,是数据质量领域一个比较热门的研究问题.利用运行在两个实体上的实体匹配算法求解实体识别问题是目前研究工作中最主要的一个思路.然而,实体匹配算法的输出结果中可能有"歧义",使得算法的输出很难直接转化为实体识别问题的结果.考虑如何利用额外的知识来消去这种"歧义",形式化定义了实体匹配结果消解问题.该问题被证明是NP-完全问题.一个基于线性规划的近似算法Round被给出,它的近似比是O(log n),针对特殊情况,一个随机近似算法KwikResolution被给出.考虑到两个算法各自的不足,4个直观的启发式算法被给出.实验结果验证了理论分析的结果,并且证明了给出的启发式算法是有效的. Entity identification problem is a popular topic in data quality research area.Utilizing entity matching algorithms to solve entity identification problem is a popular method.However,the result of entity matching algorithms has 'conflicts',such that it is hard to obtain solution for entity identification.Considering how to remove such conflicts using external information,the problem of resolving conflicts in entity matching results is formally defined and is proved to be NP-complete.Two approximation algorithms,Round and KwikResolution,are given,with approximation ratio O(log n) and 3L,respectively.Since those two algorithms have their drawbacks,four heuristic algorithms are given also.Experimental results verify the theoretical analysis of the algorithms,and show that the heuristic algorithms work well.
出处 《计算机研究与发展》 EI CSCD 北大核心 2013年第S1期239-247,共9页 Journal of Computer Research and Development
基金 国家"九七三"重点基础研究发展计划基金项目(2012CB316200)
关键词 实体匹配 实体同一性 消解 近似算法 启发式算法 entity matching entity consistency resolution approximation algorithms heuristic algorithms
  • 相关文献

参考文献4

  • 1Mauricio A. Hernández,Salvatore J. Stolfo.The merge/purge problem for large databases[J].ACM SIGMOD Record.1995(2)
  • 2Erik D. Demaine,Dotan Emanuel,Amos Fiat,Nicole Immorlica.Correlation clustering in general weighted graphs[J].Theoretical Computer Science.2006(2)
  • 3Moses Charikar,Venkatesan Guruswami,Anthony Wirth.Clustering with qualitative information[J].Journal of Computer and System Sciences.2004(3)
  • 4Omar Benjelloun,Hector Garcia-Molina,David Menestrina,Qi Su,Steven Euijong Whang,Jennifer Widom.Swoosh: a generic approach to entity resolution[J].The VLDB Journal.2009(1)

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部