
Method on Multi-granularity Data Provenance for Data Fusion
Times cited: 4
Abstract: As data volumes grow and data become increasingly correlated and overlapping, data fusion is needed to maximize the value of data. Because the fusion process is complex, a provenance (backtracking) mechanism for data fusion is needed to explain that process clearly. Although data provenance has been widely studied, most existing work addresses query provenance and workflow provenance, and little addresses data fusion. This paper studies provenance for data fusion and proposes a method that supports multi-granularity provenance. First, the data fusion process is abstracted: an entity-centric semantic graph of schemas, entities, and attributes is constructed to give the fusion process explicit semantics, and an optimized storage model for provenance information is proposed. Second, based on the semantic graph, provenance query algorithms are proposed at the entity level and at the attribute level, together with corresponding query optimization strategies. Finally, experiments demonstrate the effectiveness of the proposed data provenance method.
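The abstract names two query granularities but not the data structures behind them. As a minimal sketch only, assuming a toy in-memory graph, the Python below illustrates the general idea of entity-level versus attribute-level lineage over fusion derivation edges. The class `FusionProvenanceGraph`, its methods, the node-naming scheme (`A:e1.name`), and the breadth-first backward walk are hypothetical illustrations, not the authors' semantic graph, storage model, or query algorithms; the paper's query optimization strategies are not modelled.

```python
from collections import deque


class FusionProvenanceGraph:
    """Hypothetical toy graph (not the paper's model): schemas own entities,
    entities own attributes, and derivation edges link each fused node to its inputs."""

    def __init__(self):
        self.entity_schema = {}  # entity id -> schema (source) it belongs to
        self.attr_entity = {}    # attribute id -> entity id that owns it
        self.derived_from = {}   # fused node id -> ids of its direct inputs

    def add_entity(self, entity, schema):
        self.entity_schema[entity] = schema

    def add_attribute(self, attr, entity):
        self.attr_entity[attr] = entity

    def record_fusion(self, output, inputs):
        self.derived_from.setdefault(output, []).extend(inputs)

    def _sources(self, node):
        """Walk derivation edges backwards (BFS) and return the original leaf nodes."""
        seen, queue, leaves = {node}, deque([node]), []
        while queue:
            cur = queue.popleft()
            parents = self.derived_from.get(cur, [])
            if not parents:
                leaves.append(cur)  # nothing upstream: an original source node
            for p in parents:
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
        return sorted(leaves)

    def entity_provenance(self, entity):
        """Coarse-grained query: source entities (with schemas) behind a fused entity."""
        return [(e, self.entity_schema[e]) for e in self._sources(entity)]

    def attribute_provenance(self, attr):
        """Fine-grained query: source attributes (with owner and schema) behind a fused attribute."""
        result = []
        for a in self._sources(attr):
            owner = self.attr_entity[a]
            result.append((a, owner, self.entity_schema[owner]))
        return result


# Usage: two source records are merged into F:e1, which is later merged again into G:e1.
g = FusionProvenanceGraph()
for ent, schema in [("A:e1", "source_A"), ("B:e7", "source_B"), ("C:e3", "source_C"),
                    ("F:e1", "fused"), ("G:e1", "fused")]:
    g.add_entity(ent, schema)
for ent in ("A:e1", "B:e7", "F:e1"):
    g.add_attribute(f"{ent}.name", ent)
    g.add_attribute(f"{ent}.phone", ent)

g.record_fusion("F:e1", ["A:e1", "B:e7"])
g.record_fusion("F:e1.name", ["A:e1.name"])    # conflict resolved in favor of source A
g.record_fusion("F:e1.phone", ["B:e7.phone"])  # conflict resolved in favor of source B
g.record_fusion("G:e1", ["F:e1", "C:e3"])

print(g.entity_provenance("F:e1"))          # [('A:e1', 'source_A'), ('B:e7', 'source_B')]
print(g.attribute_provenance("F:e1.name"))  # [('A:e1.name', 'A:e1', 'source_A')]
print(g.entity_provenance("G:e1"))          # [('A:e1', 'source_A'), ('B:e7', 'source_B'), ('C:e3', 'source_C')]
```

In this sketch, entity-level lineage reports every source entity merged into a fused entity, while attribute-level lineage can narrow that to the single source whose value survived conflict resolution. That contrast is the kind of difference a multi-granularity provenance method is meant to expose.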
Authors: 杨斐斐 (YANG Fei-fei), 沈思妤 (SHEN Si-yu), 申德荣 (SHEN De-rong), 聂铁铮 (NIE Tie-zheng), 寇月 (KOU Yue) (College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China)
Source: Computer Science (《计算机科学》; indexed in CSCD and the Peking University core journal list), 2022, No. 5, pp. 120-128 (9 pages)
Funding: National Natural Science Foundation of China (62072084, 62072086); National Key Research and Development Program of China (2018YFB1003404).
Keywords: data provenance; data fusion; multi-granularity