Schema Matching Incorporating Attribute Distribution Features (结合属性分布特征的模式匹配算法)

Abstract: This paper proposes a Web schema matching algorithm that incorporates attribute distribution features, namely the mutually exclusive feature and the co-occurrence feature of attribute pairs. The mutually exclusive feature is computed from the mutual exclusivity of an attribute pair and its occurrence counts, and it implicitly reflects how semantically similar the two attributes are. To make full use of traditional similarity features based on attribute names and attribute values, the mutually exclusive feature is combined with these similarity features through machine learning to perform attribute matching. Starting from the resulting potential matching attribute pairs, a constrained attribute clustering method then carries out Web schema matching, with the clustering constraints derived from the co-occurrence feature. Experiments on a variety of domains show that, compared with methods using only similarity features, the algorithm raises the F-score by 0.13 to 0.55 across the different experimental settings.
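The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula for the mutually exclusive feature, so `exclusion_score` below is a hypothetical stand-in, the machine-learning step that combines it with name and value similarity is omitted, and the greedy merge is only one simple way to apply cannot-link constraints from co-occurrence. All function names and the toy schemas are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def distribution_features(schemas):
    """Count how often each attribute and each attribute pair occurs."""
    occ, cooc = Counter(), Counter()
    for schema in schemas:
        attrs = sorted(set(schema))
        occ.update(attrs)
        cooc.update(combinations(attrs, 2))  # pairs stored in sorted order
    return occ, cooc


def exclusion_score(a, b, occ, cooc, n_schemas):
    """Hypothetical score: 0 if the pair ever co-occurs in one schema,
    otherwise larger when both attributes appear in many schemas."""
    if cooc[tuple(sorted((a, b)))] > 0:
        return 0.0
    return min(occ[a], occ[b]) / n_schemas


def constrained_clusters(attrs, candidate_pairs, cooc):
    """Greedily merge candidate matches, refusing any merge that would put
    two co-occurring attributes in one cluster (a cannot-link constraint)."""
    clusters = {a: {a} for a in attrs}  # attribute -> its current cluster
    for a, b in candidate_pairs:
        ca, cb = clusters[a], clusters[b]
        if ca is cb:
            continue
        if any(cooc[tuple(sorted((x, y)))] > 0 for x in ca for y in cb):
            continue  # constraint violated, skip this merge
        ca |= cb
        for x in cb:
            clusters[x] = ca
    unique = {id(c): c for c in clusters.values()}
    return sorted(sorted(c) for c in unique.values())


# Toy query interfaces from a made-up book domain (illustrative data only).
schemas = [
    ["title", "author", "price"],
    ["book name", "writer", "price"],
    ["title", "writer"],
    ["book name", "author"],
]
occ, cooc = distribution_features(schemas)
attrs = sorted(occ)

# Candidate matches: pairs that never co-occur and so get a positive score.
candidates = [
    p for p in combinations(attrs, 2)
    if exclusion_score(*p, occ, cooc, len(schemas)) > 0
]
# Add a pair that some other matcher might wrongly propose; it co-occurs
# in the first schema, so the constraint should reject it.
candidates.append(("author", "title"))

clusters = constrained_clusters(attrs, candidates, cooc)
print(clusters)
```

In this toy run, "title" and "book name" never appear in the same interface, so they are treated as likely synonyms, while "author" and "title" appear together and are therefore kept apart even though they were proposed as a candidate match.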

Authors: Wang Yu (王宇), Fang Binxing (方滨兴), Wu Bo (吴博), Song Linhai (宋林海), Guo Yan (郭岩)

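Affiliations: Center for Intelligent Information and Intelligent Security, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences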

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2010, Issue 3, pp. 89-96 (8 pages)

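Funding: National High-Tech Research and Development Program of China (863 Program), grant 2007AA01Z438; National 242 Information Security Program, grants 2009A19 and 2009A91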

Keywords: computer application; Chinese information processing; mutually exclusive attributes; co-occurring attributes; Web schema matching; constrained clustering

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
