Abstract
Scheduling methods for distributed extraction-transformation-loading (ETL) nodes are analyzed. A strategy is given for partitioning the data extracted by an ETL node according to its data type, and a MapReduce-based scheduling method for distributed ETL nodes is proposed. Experiments show that the method increases the data processing capability of the ETL nodes and improves the computing performance of the whole ETL process, including throughput and response time, thereby raising the efficiency of distributed ETL.
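The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of the general idea it describes: records are keyed by data type in a map step, grouped into one partition per type, and each partition is handled by a reduce step. The record layout (a `type` field), the function names, and the counting reducer are all illustrative assumptions, not the paper's method; a real deployment would run the map and reduce steps on separate nodes.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # hypothetical record layout with a "type" field


def map_by_type(record: Record) -> Tuple[str, Record]:
    """Map step: emit (data_type, record) so records of one type share a key."""
    return record["type"], record


def shuffle(pairs: List[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group mapped pairs by key, giving one partition per data type."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    for data_type, record in pairs:
        partitions[data_type].append(record)
    return dict(partitions)


def reduce_partition(data_type: str, records: List[Record]) -> Dict[str, Any]:
    """Reduce step: stand-in for a type-specific transform; here it only counts."""
    return {"type": data_type, "count": len(records)}


def run_etl(records: List[Record]) -> List[Dict[str, Any]]:
    """Run map, shuffle, and reduce in sequence over the extracted records."""
    pairs = [map_by_type(r) for r in records]
    partitions = shuffle(pairs)
    # Sort by data type so the output order is deterministic.
    return [reduce_partition(t, rs) for t, rs in sorted(partitions.items())]


if __name__ == "__main__":
    extracted = [
        {"type": "int", "value": 1},
        {"type": "str", "value": "a"},
        {"type": "int", "value": 2},
    ]
    print(run_etl(extracted))
```

Because each partition holds a single data type, the partitions are independent of one another and could in principle be assigned to different ETL nodes, which is the property a type-based scheduler would rely on.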
Source
《指挥信息系统与技术》
2013, No. 4, pp. 17-20 (4 pages)
Modern Electronic Engineering
Keywords
extraction-transformation-loading (ETL)
MapReduce (map, reduce)
scheduling