Abstract
This paper proposes a data provenance (lineage) tracing system for the Extract-Transform-Load (ETL) process and discusses the key techniques in its implementation: transformation classification, metadata design, construction of transformation sequences, design of the tracing workflow, and tracing methods for the different types of transformation. The metadata required for tracing is embedded in the package file structure. During backward tracing, the system extracts this metadata and processes it to build a provenance graph of the transformations at each level, which makes it possible to trace the origin of the data.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2011, Issue 17, pp. 256-258, 261 (4 pages)
Keywords
data provenance
provenance management system
Extract-Transform-Load (ETL)
synchronous/asynchronous transformation