
Deep Web entity matching method based on twice-merging
Abstract: To address the shortcomings of the Weighted Edge Pruning (WEP) method in accuracy and matching efficiency, a Deep Web entity matching method based on twice-merging was proposed, introducing the concepts of self-matching and merging. Firstly, the attribute values of each object were extracted and the objects were regrouped by attribute value, so that objects sharing the same attribute value were gathered together and all objects were divided into blocks efficiently. Secondly, the matching degree between each pair of objects within the same block was calculated and used for pruning, self-matching detection, and merging, producing preliminary clusters. Finally, starting from these preliminary clusters, further matching relationships were mined from the messages passed between objects within a cluster together with the objects' attribute similarity values, which triggered a new round of cluster merging and updating. Experimental results show that, compared with WEP, the proposed method improves matching accuracy by using self-matching detection to automatically distinguish matching relationships and apply an appropriate matching strategy, so that the merging process is gradually refined; it also improves system efficiency by using blocking and pruning to effectively reduce the matching space.
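The abstract names the pipeline stages (blocking by attribute value, in-block scoring, pruning, merging into preliminary clusters, then a second merge) but does not give the matching-degree formula, the self-matching test, or the message-passing rule. The Python sketch below therefore fills those gaps with plainly swapped-in stand-ins: Jaccard overlap of attribute-value pairs as the matching degree, a fixed similarity threshold for pruning, union-find for the first merge, and average cross-cluster similarity for the second merge. Self-matching detection and message passing are not modeled. The names `build_blocks`, `similarity`, `first_merge`, `second_merge`, the record format, and both thresholds are hypothetical; this illustrates the shape of a block, prune, merge, re-merge pipeline, not the author's algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data model: a record maps attribute names to string values.
Record = dict[str, str]


def build_blocks(records: list[Record]) -> dict[tuple[str, str], list[int]]:
    """Stage 1: group record ids that share a normalized (attribute, value) pair."""
    blocks: dict[tuple[str, str], list[int]] = defaultdict(list)
    for rid, rec in enumerate(records):
        for attr, value in rec.items():
            blocks[(attr, value.strip().lower())].append(rid)
    # Singleton blocks yield no candidate pairs, so drop them.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def similarity(a: Record, b: Record) -> float:
    """Stand-in matching degree (assumption): Jaccard overlap of attribute-value pairs."""
    sa = {(k, v.strip().lower()) for k, v in a.items()}
    sb = {(k, v.strip().lower()) for k, v in b.items()}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class UnionFind:
    """Disjoint-set structure used to merge matched records into clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        self.parent[self.find(x)] = self.find(y)


def first_merge(records: list[Record], threshold: float) -> list[list[int]]:
    """Stage 2: score pairs inside each block, prune pairs below threshold, merge the rest."""
    uf = UnionFind(len(records))
    seen: set[tuple[int, int]] = set()
    for ids in build_blocks(records).values():
        for i, j in combinations(ids, 2):  # ids are ascending, so i < j
            if (i, j) in seen:
                continue  # the same pair can co-occur in several blocks
            seen.add((i, j))
            if similarity(records[i], records[j]) >= threshold:
                uf.union(i, j)
    clusters: dict[int, list[int]] = defaultdict(list)
    for rid in range(len(records)):
        clusters[uf.find(rid)].append(rid)
    return list(clusters.values())


def second_merge(records: list[Record], clusters: list[list[int]],
                 threshold: float) -> list[list[int]]:
    """Stage 3 (stand-in for message passing): repeatedly merge the first cluster
    pair whose average cross-cluster similarity reaches threshold, until stable."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            avg = sum(similarity(records[i], records[j]) for i, j in pairs) / len(pairs)
            if avg >= threshold:
                clusters[a] += clusters[b]
                del clusters[b]  # safe because b > a
                merged = True
                break
    return clusters


if __name__ == "__main__":
    records = [
        {"title": "Deep Web Mining", "author": "Li", "year": "2010"},
        {"title": "deep web mining", "author": "Li", "year": "2011"},
        {"title": "Entity Matching", "author": "Wang", "year": "2012"},
        {"title": "Entity Matching", "author": "Wang", "year": "2012"},
    ]
    prelim = first_merge(records, threshold=0.6)
    print(prelim)  # [[0], [1], [2, 3]]
    print(second_merge(records, prelim, threshold=0.5))  # [[0, 1], [2, 3]]
```

In this toy run, records 0 and 1 share two of four distinct attribute-value pairs (similarity 0.5), so the stricter first-pass threshold prunes them apart and the looser second pass merges them, mirroring the abstract's idea that a second round can recover matches missed by the first.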
Author: 陈丽君 (Chen Lijun)
Source: Journal of Computer Applications (《计算机应用》), 2016, No. 8, pp. 2139-2143 (5 pages). Indexed in CSCD and the Peking University Core Journals list.
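Funding: National Educational Information Technology Research Project (136241401); Research Project of Zhejiang Yuexiu University of Foreign Languages (N201375).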
Keywords: twice-merging; Deep Web; entity matching; cluster; similarity value