Abstract
This paper presents a web schema matching algorithm that incorporates attribute distribution features. These features comprise the mutually exclusive feature and the co-occurring feature of attribute pairs. The mutually exclusive feature is computed from whether an attribute pair is mutually exclusive and from the occurrence counts of the pair, and it implicitly captures the semantic similarity of the two attributes. To make full use of traditional features based on attribute-name and attribute-value similarity, the paper combines the mutually exclusive feature with these similarity features through machine learning techniques to perform attribute matching. Taking the potential matched attribute pairs as a basis, the paper then applies constrained attribute clustering to solve the web schema matching problem, with the clustering constraints derived from the co-occurring feature. Experiments across a variety of domains and experimental settings show that, relative to methods that use similarity features only, the algorithm incorporating attribute distribution features improves the F-score by 0.13 to 0.55.
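The abstract does not give the exact feature formula or clustering procedure, so the following Python sketch only illustrates the two ideas under stated assumptions. It treats an attribute pair as mutually exclusive when both attributes occur often enough but never in the same schema, and it uses co-occurrence as a cannot-link constraint in a simple greedy merge. The function names, the `min_count` threshold, and the greedy merging strategy are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(schemas):
    """Count attribute occurrences and same-schema co-occurrences."""
    occurrences = Counter()
    co_occurrences = Counter()
    for schema in schemas:
        attrs = sorted(set(schema))
        occurrences.update(attrs)
        # Keys are alphabetically ordered pairs, e.g. ("author", "title").
        co_occurrences.update(combinations(attrs, 2))
    return occurrences, co_occurrences


def mutually_exclusive_pairs(schemas, min_count=2):
    """Pairs of frequent attributes that never appear in the same schema.

    Assumption: min_count is an illustrative threshold, not the paper's.
    """
    occurrences, co = pair_statistics(schemas)
    frequent = sorted(a for a, n in occurrences.items() if n >= min_count)
    return [(a, b) for a, b in combinations(frequent, 2) if co[(a, b)] == 0]


def constrained_clusters(candidate_pairs, schemas):
    """Greedily merge candidate matches unless two members co-occur.

    Assumption: a greedy merge stands in for the paper's clustering method;
    co-occurrence acts as a cannot-link constraint.
    """
    _, co = pair_statistics(schemas)
    cluster_of = {}
    for a, b in candidate_pairs:
        ca = cluster_of.setdefault(a, frozenset([a]))
        cb = cluster_of.setdefault(b, frozenset([b]))
        if ca == cb:
            continue
        violates = any(co[tuple(sorted((x, y)))] > 0 for x in ca for y in cb)
        if violates:
            continue
        merged = ca | cb
        for x in merged:
            cluster_of[x] = merged
    return set(cluster_of.values())


# Toy query-interface schemas from a hypothetical book domain.
schemas = [
    ["author", "title", "price"],
    ["writer", "title", "price"],
    ["author", "title", "isbn"],
    ["writer", "name", "isbn"],
]
exclusive = mutually_exclusive_pairs(schemas)
clusters = constrained_clusters(
    [("author", "writer"), ("title", "name"), ("writer", "name")], schemas
)
```

On this toy data, `exclusive` contains both ("author", "writer") and ("isbn", "price"). The second pair is a false positive, which illustrates why mutual exclusion alone is insufficient and is combined with name and value similarity. In `clusters`, the candidate ("writer", "name") is rejected because merging would place "author" and "title", which co-occur, in the same cluster.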
Source
《中文信息学报》
CSCD
Peking University Core Journal
2010, No. 3, pp. 89-96 (8 pages)
Journal of Chinese Information Processing
Funding
National High Technology Research and Development Program of China (863 Program) (2007AA01Z438)
National 242 Information Security Program (2009A19, 2009A91)
Keywords
computer application
Chinese information processing
mutually exclusive attributes
co-occurring attributes
web schema matching
constrained clustering