Abstract
Schema matching is a key problem in integrating heterogeneous information from the Deep Web. This paper introduces a holistic matching approach, which discovers a large number of schemas simultaneously and matches them all at once. By analyzing and comparing two existing large-scale schema matching prototype systems, MGS and DCM, and combining the strengths of their core algorithms, we propose a new data-mining-based algorithm named Correlated-clustering. The algorithm first uses positively correlated attributes to discover group matchings (attribute groups), then clusters synonymous attributes by computing the similarity between each pair of concepts, and finally performs matching selection on these results. Experimental results show that the algorithm is complete and efficient, and that it fully reflects the idea of holistic schema matching.
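The abstract names the three stages of Correlated-clustering but does not give the paper's formulas, so the Python sketch below only illustrates the general shape of such a pipeline under our own assumptions. Everything in it is hypothetical and not taken from the paper, MGS or DCM. This includes the toy query interfaces, the positive-correlation score f(a,b) / max(f(a), f(b)), the concept similarity (shared-context Jaccard times one minus the co-occurrence rate), the greedy conflict-free selection, the thresholds, and the function names `find_groups`, `concept_similarity` and `select_matchings`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each Deep Web query interface is a set of attribute names.
SCHEMAS = [
    {"author", "title", "isbn"},
    {"author", "title", "isbn", "price"},
    {"writer", "title", "isbn"},
    {"writer", "title", "price"},
    {"first_name", "last_name", "title"},
    {"first_name", "last_name", "title", "isbn", "price"},
]


def find_groups(schemas, threshold=0.9, min_support=2):
    """Step 1 (sketch): merge strongly positively correlated attributes into groups."""
    single = Counter(a for s in schemas for a in s)
    pair = Counter(frozenset(p) for s in schemas for p in combinations(sorted(s), 2))
    parent = {a: a for a in single}

    def root(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for p, f11 in pair.items():
        a, b = sorted(p)
        # Assumed score: min(f11/f(a), f11/f(b)), which equals f11 / max(f(a), f(b)).
        if f11 >= min_support and f11 / max(single[a], single[b]) >= threshold:
            parent[root(a)] = root(b)
    groups = {}
    for a in sorted(single):
        groups.setdefault(root(a), set()).add(a)
    return [frozenset(g) for g in groups.values()]


def concept_similarity(schemas, concepts):
    """Step 2 (sketch): score concept pairs by shared context and rare co-occurrence."""
    # Indices of the interfaces in which each concept (attribute group) appears.
    occ = {c: {i for i, s in enumerate(schemas) if c & s} for c in concepts}
    # Context of a concept: the other concepts it appears alongside.
    ctx = {c: {d for d in concepts if d != c and occ[c] & occ[d]} for c in concepts}
    scores = {}
    for c, d in combinations(concepts, 2):
        cooc = len(occ[c] & occ[d]) / min(len(occ[c]), len(occ[d]))
        left, right = ctx[c] - {d}, ctx[d] - {c}
        union = left | right
        jaccard = len(left & right) / len(union) if union else 0.0
        scores[(c, d)] = jaccard * (1.0 - cooc)
    return scores, occ


def select_matchings(scores, occ, threshold=0.6):
    """Step 3 (sketch): greedily merge high-scoring pairs, rejecting conflicts."""
    cluster = {c: frozenset([c]) for c in occ}
    for (c, d), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold or cluster[c] == cluster[d]:
            continue
        merged = cluster[c] | cluster[d]
        # Synonymous concepts should never appear together in one interface.
        if any(occ[x] & occ[y] for x, y in combinations(merged, 2)):
            continue
        for x in merged:
            cluster[x] = merged
    return {cl for cl in cluster.values() if len(cl) > 1}


if __name__ == "__main__":
    concepts = find_groups(SCHEMAS)
    scores, occ = concept_similarity(SCHEMAS, concepts)
    for matching in select_matchings(scores, occ):
        print(" = ".join(sorted("{" + ", ".join(sorted(c)) + "}" for c in matching)))
```

On the toy data this prints `{author} = {first_name, last_name} = {writer}`, a complex (1:m) matching. The grouping step supplies the multi-attribute side, and the similarity and selection steps find the synonyms. The conflict check in the last step reflects a common assumption in holistic matching that synonymous attributes rarely co-occur in one interface. The paper's own selection criteria may differ.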
Source
Computer Applications and Software (《计算机应用与软件》)
Indexed in: CSCD
2009, No. 5, pp. 46-49 (4 pages)
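Funding
National Natural Science Foundation of China (Grant No. 60673092)
Natural Science Foundation of Jiangsu Higher Education Institutions (Grant No. 07KJD520187)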