Abstract
Data provenance is an effective approach to security supervision in big data systems. In such systems, provenance generation based solely on multi-log analysis can present a complete provenance view of data across its whole life cycle. However, because provenance types are diverse, it remains to be proven whether this method can obtain all of the provenance information required for supervision, that is, whether the method is theoretically feasible. To this end, this paper proposes, for the first time, a formal proof method. First, provenance completeness is formally defined. Then, by reducing and associating the record types contained in each log, the method proves whether the specified provenance information can be completely obtained from the existing logs. For provenance types whose attribute elements are scattered across multiple logs, two log association methods are proposed: one based on elements common to the logs, and one based on program operating principles. Finally, the proposed method is applied to prove that multi-log analysis can be used for Hadoop provenance generation.
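The abstract leaves the formal definitions to the body of the paper. As a rough illustration only, the sketch below treats a provenance type as a set of required attribute elements and each log record type as the set of elements it records. A provenance type counts as completely obtainable if some group of record types covers every required element, where record types are grouped when they share a designated identifier element (standing in for association by common elements) or when the analyst supplies an explicit link (standing in for association by program operating principles). The functions `associated_groups` and `is_complete`, the `join_keys` and `extra_links` parameters, and all record-type and element names (`audit.open`, `block_id`, and so on) are hypothetical; they do not reproduce the paper's formalism or real Hadoop log formats.

```python
def associated_groups(record_types, join_keys, extra_links=()):
    """Partition record types into groups that can be associated with each other.

    record_types: dict mapping a record-type name to the set of elements it records.
    join_keys: elements that may serve as common identifiers for association.
    extra_links: (a, b) pairs asserted by the analyst (e.g. from program logic).
    """
    keys = set(join_keys)
    names = list(record_types)
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Association by common elements: link record types sharing a join key.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if record_types[a] & record_types[b] & keys:
                union(a, b)

    # Association justified outside the logs themselves (hypothetical input).
    for a, b in extra_links:
        union(a, b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def is_complete(required, record_types, join_keys, extra_links=()):
    """Return the sorted names of a group covering `required`, or None if none does."""
    for group in associated_groups(record_types, join_keys, extra_links):
        covered = set().union(*(record_types[n] for n in group))
        if set(required) <= covered:
            return sorted(group)
    return None


if __name__ == "__main__":
    # Illustrative record types only; not actual Hadoop log schemas.
    record_types = {
        "audit.open": {"user", "cmd", "path", "time"},
        "namenode.allocate": {"path", "block_id"},
        "datanode.write": {"block_id", "host"},
    }
    print(is_complete({"user", "path", "host"}, record_types, {"path", "block_id"}))
    # prints ['audit.open', 'datanode.write', 'namenode.allocate']
```

Restricting association to designated join keys avoids linking records that only share a generic field such as a timestamp, which would otherwise merge every log into a single group and make the completeness check trivially succeed.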
Authors
GAO Yuanzhao
CHEN Xingyuan
DU Xuehui
LI Binglong
Affiliations: Third School, Information Engineering University, Zhengzhou 450001, Henan, China; State Key Laboratory of Cryptology, Beijing 100878, China
Source
Journal of Wuhan University: Natural Science Edition
Indexed in: CAS, CSCD, Peking University Core Journals
2023, No. 6, pp. 729-738 (10 pages)
Funding
National Key Research and Development Program of China (2018YFB0803603)
Science and Technology Innovation Special Zone Funded Project (18-H863-01-ZT-005-017-01)