
Multiple schema matching method based on SimHash and mixed similarity (cited by: 3)
Abstract Multiple schema matching during the integration of multi-source heterogeneous civil aviation passenger service data suffers from low efficiency, insufficient accuracy, and difficulty in obtaining complete schema information. To address these problems, this paper proposes a multiple schema matching method based on SimHash and mixed similarity. The method first computes the weights of feature units using pointwise mutual information (PMI) and uses the SimHash algorithm to construct a signature for each attribute column, which represents the attribute's features while reducing feature dimensionality. It then applies the K-means++ algorithm to cluster the attributes and generate candidate matching sets. Finally, it builds an attribute mapping graph from the mixed similarity of the attributes, which displays the matching relationships between attributes intuitively and improves the efficiency of multiple schema matching. Experimental results verify the feasibility of the method, which offers a new solution for efficiently resolving schema conflicts when integrating multi-source heterogeneous civil aviation passenger service data.
Authors Cao Weidong; Hu Wei; Wang Jialiang; Wang Jing (College of Computer Science & Technology, Civil Aviation University of China, Tianjin 300300, China)
Source Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2020, No. 1, pp. 198-202 (5 pages)
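Funding Major Special Project of the Science and Technology Innovation Guidance Fund of the Civil Aviation Administration of China (MHRD20150107, MHRD20160109); Fundamental Research Funds for the Central Universities (3122014C017).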
Keywords multiple schema matching; signature; pointwise mutual information (PMI); mixed similarity; attribute mapping graph
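
The signature step described in the abstract can be illustrated with a minimal Python sketch of weighted SimHash over an attribute column. This is not the authors' implementation: the character-bigram feature units, the use of raw counts as weights (a stand-in for the PMI-based weights the paper describes), the MD5-based feature hash, and all function names are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def char_ngrams(value, n=2):
    """Split a cell value into character n-grams (assumed feature units)."""
    s = str(value)
    if len(s) < n:
        return [s]
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def simhash(weights, bits=64):
    """Weighted SimHash: map {feature: weight} to a `bits`-bit signature.

    `bits` must be a multiple of 8 and at most 128, because the feature
    hash is taken from the 16-byte MD5 digest.
    """
    totals = [0.0] * bits
    for feature, weight in weights.items():
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Add the weight where the feature hash has a 1 bit, subtract otherwise.
            if (h >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    signature = 0
    for i, total in enumerate(totals):
        if total > 0:
            signature |= 1 << i
    return signature


def column_signature(values, bits=64):
    """Signature of a column; raw n-gram counts stand in for PMI weights."""
    counts = Counter(g for v in values for g in char_ngrams(v))
    return simhash(counts, bits)


def hamming_similarity(a, b, bits=64):
    """Fraction of signature bits on which two signatures agree."""
    return 1.0 - bin(a ^ b).count("1") / bits


if __name__ == "__main__":
    # Hypothetical column values, for illustration only.
    col_a = ["CA1234", "MU5678", "CZ3101"]
    col_b = ["CA1234", "MU5678", "CZ3102"]
    sig_a = column_signature(col_a)
    sig_b = column_signature(col_b)
    print(f"{sig_a:016x} {sig_b:016x} {hamming_similarity(sig_a, sig_b):.3f}")
```

Because each column is reduced to a fixed-length bit string, signatures like these could be passed to a clustering step such as K-means++ to form candidate matching sets, which is the role the abstract assigns to them.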