
Schema Matching Incorporating Attribute Distribution Features
Abstract: This paper presents a web schema matching algorithm that incorporates attribute distribution features, namely the mutually exclusive feature and the co-occurring feature of attribute pairs. The mutually exclusive feature is computed from the mutual exclusivity of an attribute pair and its occurrence counts, and it implicitly reflects the semantic similarity of the pair. To make full use of traditional features based on attribute-name and attribute-value similarity, the mutually exclusive feature is combined with these similarity features through machine learning for attribute matching. Starting from the potential matching attribute pairs, constrained attribute clustering is then used to perform web schema matching, with the clustering constraints derived from the co-occurring feature. Experiments on a variety of domains show that, compared with methods that use only similarity features, the algorithm improves the F-score by 0.13 to 0.55 across different experimental settings.
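The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the `exclusion_score` heuristic, the `threshold` parameter, average linkage, and all function names are hypothetical. The learned combination with name and value similarity features is omitted entirely. The sketch only shows the general idea that attributes which never appear together in one schema are candidate matches, while attributes that do co-occur are barred from sharing a cluster.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(schemas):
    """Count how often each attribute occurs and how often each pair co-occurs."""
    occ, cooc = Counter(), Counter()
    for schema in schemas:
        attrs = sorted(set(schema))
        occ.update(attrs)
        cooc.update(combinations(attrs, 2))  # keys are (smaller, larger)
    return occ, cooc


def exclusion_score(a, b, occ, cooc, n_schemas):
    """Hypothetical score (not the paper's formula): zero if the pair ever
    co-occurs, otherwise larger when both attributes are frequent."""
    if cooc[tuple(sorted((a, b)))] > 0:
        return 0.0
    return min(occ[a], occ[b]) / n_schemas


def constrained_clustering(schemas, threshold=0.2):
    """Greedy agglomerative clustering with cannot-link constraints taken
    from co-occurrence: co-occurring attributes never share a cluster."""
    occ, cooc = pair_statistics(schemas)
    n = len(schemas)
    clusters = [{a} for a in sorted(occ)]

    def linkable(c1, c2):
        return all(cooc[tuple(sorted((a, b)))] == 0 for a in c1 for b in c2)

    def score(c1, c2):
        # Average linkage over all cross-cluster pairs (an assumed choice).
        total = sum(exclusion_score(a, b, occ, cooc, n) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if not linkable(clusters[i], clusters[j]):
                continue
            s = score(clusters[i], clusters[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]


if __name__ == "__main__":
    # Toy query interfaces; the attribute names are invented for illustration.
    interfaces = [
        {"author", "title"},
        {"writer", "title"},
        {"author", "name"},
        {"writer", "name"},
    ]
    for cluster in constrained_clustering(interfaces):
        print(sorted(cluster))
```

On this toy input, `author` and `writer` never share an interface, so they end up grouped, and likewise `name` and `title`. A score based only on distribution is easy to fool on real data, which is presumably why the paper combines it with similarity features before clustering.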
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2010, No. 3, pp. 89-96 (8 pages)
Funding: National 863 High-Tech Research and Development Program of China (2007AA01Z438); National 242 Information Security Program (2009A19, 2009A91)
Keywords: computer application; Chinese information processing; mutually exclusive attributes; co-occurring attributes; web schema matching; constrained clustering