Abstract
ETL (extract, transform, load) is the key process by which data is extracted from different data sources into a data warehouse. The design, maintenance, and modification of the ETL process directly affect both the efficiency of data processing and the quality of the data in the warehouse. By analyzing the model characteristics of ETL activities and drawing on ideas from distributed computing, this paper proposes a new ETL system model. It further proposes an optimization scheme that is based on this system architecture and fits the topological characteristics of ETL tasks, and it describes in detail how data and scheduling information flow through the system.