Study on Schema Matching with Uncertain Column Names and Data Values


Abstract: Schema matching is an important research topic in data integration, and uncertain column names and data values are a common situation in schema matching. The prevailing approach is based on mutual information and Euclidean distance, but it does not resolve the mismatches that arise when attribute similarities are identical or nearly identical. To address this problem, the paper proposes a multiple iterative screening method: it first determines the attribute pairs in the two relational schemas that can be matched correctly in a single pass, and then selects the optimal attribute pair from among them. It then presents a matching method based on conditional mutual information, which uses the optimal attribute pair to compute the conditional mutual information of the unmatched attributes, then computes the Euclidean distances between attributes, and finally obtains the matching result, thereby resolving the mismatch problem. Experimental results show that the proposed algorithm is correct and effective.
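The abstract names mutual information and conditional mutual information as the quantities the method relies on. As a rough illustration only, the sketch below estimates both from empirical counts over discrete columns, using the standard entropy identities. The function names and the toy columns are assumptions made for this example; the paper's actual screening and matching procedure is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint Shannon entropy (in bits) of one or more equal-length columns."""
    rows = list(zip(*columns))  # each row is a tuple of co-occurring values
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


# Toy columns (hypothetical data): z is the XOR of x and y.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

mi = mutual_information(x, y)                   # x and y look independent
cmi = conditional_mutual_information(x, y, z)   # dependence appears once z is known
print(mi, cmi)
```

In this toy case the plain mutual information of `x` and `y` is zero, while conditioning on `z` exposes one bit of dependence. This hints at why conditioning on an already matched attribute might help separate attributes whose unconditional similarity scores are tied, though how the paper applies it is described only at the level of the abstract.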

Authors: Huang Dongmei, Feng Kai, Zhao Danfeng, Guo Yingxin

Affiliation: College of Information Technology, Shanghai Ocean University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 8, pp. 85-89 (5 pages)

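Funding: National Natural Science Foundation of China (61272098); Ministry of Science and Technology 973 Program (2012CB316200); Special Program for the Comprehensive Investigation and Assessment of the Arctic and Antarctic Environment (CHINARE2012-04-07)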

Keywords: Uncertainty; Schema matching; Conditional mutual information

Classification number: TP391.7 [Automation and Computer Technology: Computer Application Technology]


