Abstract
This paper addresses three problems encountered when building a data analysis and fusion platform: low development efficiency in data integration, slow data integration, and data scattered across separate networks. It proposes a distributed ETL framework designed for cross-network data transmission. Based on an analysis of mainstream ETL tools, the paper summarizes how ETL works and the characteristics of the ETL process, and designs a distributed ETL framework for data integration built on message-oriented middleware. To run a data integration task with this framework, the user submits a file describing the integration process, and the framework processes the data accordingly. A data task execution engine and a control model are designed following metamodel-driven and aspect-oriented design principles. Tools developed on this framework free data developers from large amounts of repetitive data operations, so they can concentrate on the logical processing of data.
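The abstract only outlines the design, so the following is a minimal, illustrative sketch of how a description-file-driven ETL engine of this kind could fit together; it is not the paper's implementation. Everything named here is an assumption: the JSON job schema (`JOB_JSON`), the step types `filter_positive` and `rename`, the `TRANSFORMS` registry standing in for the metamodel, the `with_audit` wrapper standing in for an aspect (around-advice), and an in-process `queue.Queue` standing in for real message-oriented middleware between the extract side and the load side.

```python
import json
import queue
import threading
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row], Dict[str, Any]], List[Row]]

# Hypothetical job description file; the paper's real format is not specified.
JOB_JSON = """
{
  "name": "demo_job",
  "extract": {"type": "inline", "rows": [
      {"id": 1, "city": "Beijing", "amount": 120},
      {"id": 2, "city": "Shanghai", "amount": -5},
      {"id": 3, "city": "Wuhan", "amount": 40}
  ]},
  "transform": [
      {"type": "filter_positive", "field": "amount"},
      {"type": "rename", "from": "city", "to": "region"}
  ],
  "batch_size": 2
}
"""

# Metamodel stand-in: maps a step type named in the file to its handler.
TRANSFORMS: Dict[str, Step] = {}

def transform(name: str) -> Callable[[Step], Step]:
    def register(fn: Step) -> Step:
        TRANSFORMS[name] = fn
        return fn
    return register

@transform("filter_positive")
def filter_positive(rows: List[Row], spec: Dict[str, Any]) -> List[Row]:
    # Keep only rows whose configured field is strictly positive.
    return [r for r in rows if r[spec["field"]] > 0]

@transform("rename")
def rename(rows: List[Row], spec: Dict[str, Any]) -> List[Row]:
    # Rename one column, preserving column order.
    return [{(spec["to"] if k == spec["from"] else k): v for k, v in r.items()}
            for r in rows]

AUDIT_LOG: List[str] = []

def with_audit(step_type: str, fn: Step) -> Step:
    """Aspect-style around-advice: records row counts without touching step logic."""
    def wrapped(rows: List[Row], spec: Dict[str, Any]) -> List[Row]:
        result = fn(rows, spec)
        AUDIT_LOG.append(f"{step_type}: {len(rows)} -> {len(result)} rows")
        return result
    return wrapped

_DONE = object()  # end-of-stream marker sent through the channel

def run_job(job: Dict[str, Any]) -> List[Row]:
    channel: "queue.Queue[Any]" = queue.Queue()  # stands in for a message broker
    target: List[Row] = []                       # stands in for the load target

    def producer() -> None:
        rows, size = job["extract"]["rows"], job["batch_size"]
        for i in range(0, len(rows), size):
            batch = rows[i:i + size]
            for spec in job["transform"]:
                step = with_audit(spec["type"], TRANSFORMS[spec["type"]])
                batch = step(batch, spec)
            channel.put(batch)  # publish the transformed batch
        channel.put(_DONE)

    def consumer() -> None:
        while True:
            batch = channel.get()  # blocks until a message arrives
            if batch is _DONE:
                break
            target.extend(batch)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return target

if __name__ == "__main__":
    print(run_job(json.loads(JOB_JSON)))
    print(AUDIT_LOG)
```

In this sketch the developer only edits the description file; adding a new kind of step means registering one handler, while cross-cutting concerns such as auditing stay in the wrapper. Replacing the in-process queue with a networked broker is what would let the producer and consumer run on different networks.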
Source
Software Guide (《软件导刊》)
2017, No. 11, pp. 197-199 (3 pages)
Keywords
data integration
distributed
ETL
message-oriented middleware