Abstract
To address the limitations of the Weighted Edge Pruning (WEP) method in accuracy and matching efficiency, a Deep Web entity matching method based on twice-merging is proposed, which introduces the concepts of self-matching and merging. First, the attribute values of each object are extracted and the objects are regrouped by attribute value, so that objects sharing an attribute value are gathered together and the objects are effectively partitioned into blocks. Second, the matching degree between objects within each block is calculated and used for pruning, self-matching detection, and merging, producing preliminary clusters. Finally, starting from these preliminary clusters, further matching relationships are discovered using the messages passed between objects within a cluster together with the objects' attribute similarity values, which triggers a new round of cluster merging and updating. Experimental results show that, compared with the WEP method, the proposed method uses self-matching detection to automatically distinguish matching relationships and apply a suitable matching strategy, so that the merging process is gradually refined and matching accuracy improves; blocking and pruning effectively reduce the matching space, which improves system running efficiency.
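The first two steps of the abstract (blocking by attribute value, then pruning and merging within blocks) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Jaccard similarity, the `threshold` value, the union-find merge, and all function names are assumptions chosen for illustration, and self-matching detection and the message-passing second merge are omitted because the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(objects):
    """Group object ids by each attribute value they contain."""
    blocks = defaultdict(set)
    for obj_id, attrs in objects.items():
        for value in attrs.values():
            blocks[value].add(obj_id)
    # A block holding a single object yields no comparisons, so drop it.
    return {v: ids for v, ids in blocks.items() if len(ids) > 1}


def jaccard(a, b):
    """Jaccard similarity of two objects' attribute-value sets (assumed measure)."""
    sa, sb = set(a.values()), set(b.values())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_and_merge(objects, threshold=0.5):
    """Compare objects only inside shared blocks, prune weak pairs, merge the rest."""
    parent = {obj_id: obj_id for obj_id in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen = set()
    for ids in build_blocks(objects).values():
        for a, b in combinations(sorted(ids), 2):
            if (a, b) in seen:
                continue  # pair already compared in another block
            seen.add((a, b))
            # Pruning (assumed rule): pairs below the threshold are discarded.
            if jaccard(objects[a], objects[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for obj_id in objects:
        clusters[find(obj_id)].add(obj_id)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records, invented for this example.
records = {
    "a": {"title": "Deep Web", "author": "Li", "year": "2016"},
    "b": {"title": "Deep Web", "author": "Li", "year": "2015"},
    "c": {"title": "Graph Mining", "author": "Wang", "year": "2016"},
}
preliminary_clusters = match_and_merge(records)
```

In this sketch, objects that share no attribute value are never compared, which is how blocking reduces the matching space; the threshold then prunes weak pairs inside each block before merging.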
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2016, Issue 8, pp. 2139-2143 (5 pages)
Journal of Computer Applications
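Funding
National Educational Information Technology Research Project (136241401)
Zhejiang Yuexiu University of Foreign Languages Research Project (N201375)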