This paper analyzes the main characteristics, benefits, and disadvantages of existing traditional ETL (extraction, transformation, loading) methods and summarizes several factors that affect the performance of ETL tools. A new ETL approach, E-LT (extraction, loading, and transformation), is then proposed. E-LT applies a database mapping technique so that the loading and transformation stages of the ETL process are performed simultaneously after the extraction stage. As a result, both loading and transformation can be completed with SQL commands, and the staging area that the traditional ETL process requires before loading is eliminated. The framework of an ETL engine based on the E-LT method is presented. The ETL process, including initial loading and incremental refreshment, is discussed in detail, and an SQL-based algorithm for initial loading is presented. Experimental results and theoretical analysis show that the E-LT method outperforms some commercial ETL approaches in loading throughput. Finally, a real-world application of the E-LT method in marine data warehousing is discussed to illustrate the validity of the proposed method.
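To make the E-LT idea concrete, the following is a minimal sketch using Python's standard-library `sqlite3` module. It assumes SQLite's `ATTACH DATABASE` as a stand-in for the paper's database mapping technique. The tables (`src.observations`, `fact_observation`, `etl_watermark`), their columns, and the timestamp-watermark change detection are hypothetical illustrations, not the authors' engine or algorithm. What the sketch shows is that once the source is mapped into the target database, loading and transformation collapse into one set-based `INSERT ... SELECT`, so no intermediate staging table is ever written.

```python
import sqlite3

# Hypothetical transformation, written once as plain SQL over the mapped source:
# normalize station codes and convert tenths of a degree Celsius to degrees.
TRANSFORM_SELECT = """
    SELECT obs_id, UPPER(TRIM(station)), temp_tenths / 10.0, updated_at
    FROM src.observations
"""
TARGET_COLS = "fact_observation (obs_id, station, temp_c, updated_at)"


def make_warehouse() -> sqlite3.Connection:
    """Create the target (warehouse) DB and map a source DB into its session.

    Assumption: ATTACH stands in for the paper's database mapping; here the
    source is an in-memory DB so the sketch is self-contained.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS src")
    conn.executescript("""
        CREATE TABLE src.observations (
            obs_id      INTEGER PRIMARY KEY,
            station     TEXT,
            temp_tenths INTEGER,
            updated_at  INTEGER
        );
        CREATE TABLE fact_observation (
            obs_id     INTEGER PRIMARY KEY,
            station    TEXT,
            temp_c     REAL,
            updated_at INTEGER
        );
        CREATE TABLE etl_watermark (last_ts INTEGER NOT NULL);
        INSERT INTO etl_watermark VALUES (0);
    """)
    return conn


def _advance_watermark(conn: sqlite3.Connection) -> None:
    # Record the newest timestamp that actually reached the target table.
    conn.execute(
        "UPDATE etl_watermark SET last_ts = "
        "(SELECT COALESCE(MAX(updated_at), 0) FROM fact_observation)"
    )


def initial_load(conn: sqlite3.Connection) -> None:
    """Load and transform in a single SQL statement, with no staging area."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"INSERT INTO {TARGET_COLS} {TRANSFORM_SELECT}")
        _advance_watermark(conn)


def incremental_refresh(conn: sqlite3.Connection) -> None:
    """Apply only source rows changed since the last watermark (an assumed
    change-detection scheme); changed rows overwrite their previous version."""
    with conn:
        conn.execute(
            f"INSERT OR REPLACE INTO {TARGET_COLS} {TRANSFORM_SELECT} "
            "WHERE updated_at > (SELECT last_ts FROM etl_watermark)"
        )
        _advance_watermark(conn)


if __name__ == "__main__":
    conn = make_warehouse()
    with conn:
        conn.executemany(
            "INSERT INTO src.observations VALUES (?, ?, ?, ?)",
            [(1, " st01", 152, 100), (2, "st02 ", 98, 105)],
        )
    initial_load(conn)
    with conn:  # simulate later changes in the operational source
        conn.execute(
            "UPDATE src.observations SET temp_tenths = 160, updated_at = 110 "
            "WHERE obs_id = 1"
        )
        conn.execute("INSERT INTO src.observations VALUES (3, 'st03', 121, 112)")
    incremental_refresh(conn)
    for row in conn.execute("SELECT * FROM fact_observation ORDER BY obs_id"):
        print(row)
```

A timestamp watermark is only one way to detect changes for incremental refreshment; trigger-based or log-based change capture are common alternatives, and the abstract does not say which approach the paper uses.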
Funding: Supported by the National Natural Science Foundation of China (60673139, 60573090)