
Multi-source relational data fusion (cited by: 10)
Abstract: To address the problem of relational data fusion across "information islands", we propose and implement a basic framework for multi-source relational data fusion (MSF). The framework consists of three main modules: schema matching, entity alignment, and entity fusion. Schema matching targets attribute alignment across multi-source relational data: drawing on multi-dimensional features of attribute values, we propose an alignment discovery mechanism based on the Hungarian algorithm, which enables fast schema matching. Entity alignment links tuple pairs from the different sources; introducing a diversity sampling strategy and an entity feature extraction method improves its performance. Finally, the aligned entities are fused to provide a unified data view for data analysis. To verify the effectiveness and efficiency of MSF, we implemented a data fusion system called DataPuzzle and evaluated the proposed methods on it using real, publicly available data from multiple domains. The results show that the proposed methods fuse multi-source relational data efficiently, with high recall and precision.
Authors: Yue DING (丁玥); Juan WANG (王涓); Wei LU (卢卫); Chuitian RONG (荣垂田); Xiaoyong DU (杜小勇) (MOE Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China)
Source: Scientia Sinica Informationis (《中国科学:信息科学》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 5, pp. 649-661 (13 pages)
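Funding: Supported by the Beijing Municipal Science and Technology Project (Grant No. Z171100005117002) and the National Natural Science Foundation of China (Grant Nos. 61972403, 61872315, 61661146001).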
Keywords: multi-source heterogeneous data; relational data; information isolated island; schema matching; entity alignment; data fusion
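
The schema-matching step summarized in the abstract (one-to-one attribute alignment found with the Hungarian algorithm over features of attribute values) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the features MSF uses, so the three features below (digit ratio, letter ratio, scaled mean length), the cost definition, and the names `column_features`, `hungarian`, and `match_attributes` are all assumptions made for illustration.

```python
def column_features(values):
    """Hypothetical value-level features: digit ratio, letter ratio, scaled mean length."""
    strs = [str(v) for v in values]
    total = sum(len(s) for s in strs) or 1  # avoid division by zero on empty columns
    digits = sum(c.isdigit() for s in strs for c in s)
    letters = sum(c.isalpha() for s in strs for c in s)
    mean_len = total / max(len(strs), 1)
    return [digits / total, letters / total, min(mean_len / 20.0, 1.0)]


def hungarian(cost):
    """Min-cost assignment of rows to columns (rows <= columns), O(n^2 * m).

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "expects at least as many columns as rows"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0 != 0:      # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_attributes(table_a, table_b):
    """Align each attribute of table_a to a distinct attribute of table_b."""
    names_a, names_b = list(table_a), list(table_b)
    feats_a = [column_features(table_a[a]) for a in names_a]
    feats_b = [column_features(table_b[b]) for b in names_b]
    # Cost = mean absolute feature difference (an assumed dissimilarity measure).
    cost = [[sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa) for fb in feats_b]
            for fa in feats_a]
    return {names_a[i]: names_b[j] for i, j in enumerate(hungarian(cost))}


# Toy tables with differently named but semantically matching attributes.
table_a = {"name": ["Alice Smith", "Bob Lee", "Carol Jones"],
           "age": [34, 27, 45],
           "phone": ["13800001111", "13900002222", "13700003333"]}
table_b = {"tel": ["13600004444", "13500005555"],
           "full_name": ["Dave Brown", "Eve Adams"],
           "years": [52, 19]}
print(match_attributes(table_a, table_b))
```

The assignment formulation forces a one-to-one mapping, so two source attributes cannot both claim the same target attribute even when their feature vectors look alike. A real system would likely use richer features and a threshold to reject poor matches, neither of which is shown here.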