A Survey on Management of Data Provenance (数据世系管理技术研究综述)

Cited by: 66

Abstract: Data provenance describes how data is generated and how it evolves over time. It has many applications, including data quality evaluation, auditing, data recovery (replication recipes), and data citation. Provenance can broadly be recorded either across multiple data sources or within a single data source; in other words, data derivation can take place at the schema level or at the instance level. This paper surveys research on the representation and querying of data provenance at both levels. For schema-level provenance, the focus is on provenance tracing through query rewriting and schema mappings. For instance-level provenance, the paper summarizes recent progress on relational data, XML data, and streaming data. It also reviews work on uncertain data provenance, which tracks both the derivation of data and its uncertainty. Finally, the paper lists applications of data provenance management and discusses the main challenges and future research directions in provenance analysis.
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2010, Issue 3, pp. 373-389 (17 pages)
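Funding: Supported by the Key Program of the National Natural Science Foundation of China (60933001), the National Science Fund for Distinguished Young Scholars (60925008), the National Natural Science Foundation of China (60803020), and the Shanghai Key Discipline Construction Project (B412)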
Keywords: data provenance; provenance semiring; data integration; data space; uncertain data