
Theoretical Feasibility Proof of the Multi-Log Analysis-Based Big Data Provenance Generation Method

Original title: 基于多日志分析的大数据世系生成可行性证明
Abstract Data provenance is an effective approach to security supervision of big data. In big data systems, only provenance generation based on multi-log analysis can present a complete provenance view of the data's whole life cycle. However, because provenance types are diverse, it remains to be proved whether this method can obtain all the data that supervision requires, that is, whether the method is theoretically feasible. To this end, the paper proposes a formal proof method, which the authors describe as the first of its kind. First, provenance completeness is formally defined. Then, by reducing and associating the record types contained in each log, the paper proves whether the specified provenance information can be obtained completely from existing logs. For provenance types whose attribute elements are scattered across multiple logs, two log association methods are proposed, one based on common log elements and one based on program operation principles. Finally, the proposed method is used to prove that multi-log analysis is feasible for Hadoop provenance generation.
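The idea of associating logs through a common element and then checking completeness can be illustrated with a minimal sketch. This is not the paper's formal proof method. The log names, the field names, the `job_id` join key, and the required attribute set are all hypothetical, chosen only to show how attributes scattered across two logs might be merged and checked against a provenance type.

```python
from collections import defaultdict

# Hypothetical provenance type: attributes a complete record must carry.
REQUIRED_ATTRS = {"user", "operation", "path", "timestamp"}

# Hypothetical records from two logs; "job_id" is the assumed common element.
audit_log = [
    {"job_id": "j1", "user": "alice", "operation": "read"},
    {"job_id": "j2", "user": "bob", "operation": "write"},
]
task_log = [
    {"job_id": "j1", "path": "/data/a", "timestamp": 100},
]


def associate(logs, key):
    """Merge records from several logs that share the same value of `key`."""
    merged = defaultdict(dict)
    for log in logs:
        for record in log:
            merged[record[key]].update(record)
    return dict(merged)


def missing_attrs(record, required=REQUIRED_ATTRS):
    """Return the required attributes that the associated record lacks."""
    return required - set(record)


records = associate([audit_log, task_log], key="job_id")
# Keys whose associated record carries every required attribute.
complete = {k for k, r in records.items() if not missing_attrs(r)}
```

In this toy data, `j1` collects all four attributes from the two logs, while `j2` appears only in the first log and so lacks `path` and `timestamp`.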
Authors GAO Yuanzhao (高元照); CHEN Xingyuan (陈性元); DU Xuehui (杜学绘); LI Binglong (李炳龙) (Third School, Information Engineering University, Zhengzhou 450001, Henan, China; State Key Laboratory of Cryptology, Beijing 100878, China)
Source Journal of Wuhan University: Natural Science Edition (《武汉大学学报(理学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, Issue 6, pp. 729-738 (10 pages)
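Funding National Key Research and Development Program of China (2018YFB0803603); Science and Technology Innovation Special Zone funded project (18-H863-01-ZT-005-017-01).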
Keywords big data; provenance data generation; multi-log analysis; provenance completeness proof; Hadoop; Progger
