
Implementation of ETL Scheme Based on Storm Platform

Cited by: 2
Abstract: As the Internet continues to develop across many fields, data are becoming increasingly diverse in structure and massive in volume. Faced with such massive data, improving the efficiency of ETL is crucial. To address the problems of "information islands", where data sources and formats are inconsistent and real-time data collection is poor, this paper proposes vertically partitioning the ETL workflow and horizontally partitioning the data set to be processed, and builds a stream-based ETL processing scheme on the Storm platform. In addition, because Storm does not take the CPU load of worker nodes into account when assigning tasks, a scheduled task records the CPU load of each worker node, and this information is used to optimize how the Storm scheduler allocates slots, so that load across the Storm cluster is more balanced. Experimental results show that the scheme effectively improves ETL processing efficiency, and that the slot-allocation optimization effectively improves system stability and processing efficiency.
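The abstract does not give the scheduler code, so the following is only a minimal Python sketch of the idea of CPU-load-aware slot allocation, not the author's implementation and not Storm's scheduler API. It assumes each worker node exposes a list of free slots and a periodically sampled CPU load in [0, 1]; the names `WorkerNode`, `record_cpu_load` and `assign_slots`, the per-worker cost of 0.1, and the port numbers are all hypothetical. Each new worker is placed on a free slot of the node with the lowest estimated CPU load.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical model of a worker node with free slots (ports)."""
    name: str
    free_slots: list[int]
    cpu_load: float = 0.0  # latest sampled CPU load, 0.0 to 1.0


def record_cpu_load(nodes, samples):
    """Stand-in for the periodic task: store the latest CPU sample per node."""
    for node in nodes:
        if node.name in samples:
            node.cpu_load = samples[node.name]


def assign_slots(nodes, num_workers):
    """Choose up to num_workers slots, always from the least-loaded node.

    The inputs are not mutated. After each pick, the chosen node's load
    estimate is raised by a hypothetical per-worker cost so that workers
    spread out instead of all landing on the single idlest node.
    """
    per_worker_cost = 0.1  # hypothetical load added by one worker
    estimates = {n.name: n.cpu_load for n in nodes}
    remaining = {n.name: list(n.free_slots) for n in nodes}
    chosen = []
    for _ in range(num_workers):
        candidates = [n.name for n in nodes if remaining[n.name]]
        if not candidates:
            break  # no free slots left anywhere in the cluster
        # Lowest estimated load first; ties broken by name for determinism.
        target = min(candidates, key=lambda name: (estimates[name], name))
        chosen.append((target, remaining[target].pop(0)))
        estimates[target] += per_worker_cost
    return chosen


nodes = [
    WorkerNode("node-a", [6700, 6701, 6702]),
    WorkerNode("node-b", [6700, 6701]),
    WorkerNode("node-c", [6700, 6701, 6702]),
]
record_cpu_load(nodes, {"node-a": 0.80, "node-b": 0.20, "node-c": 0.35})
print(assign_slots(nodes, 4))
# → [('node-b', 6700), ('node-b', 6701), ('node-c', 6700), ('node-c', 6701)]
```

Here node-b, the least loaded, fills first, then node-c; node-a, at 80% CPU, receives no workers. Raising the estimate after each pick is a design choice of this sketch to avoid overloading one idle node; how the paper actually weights CPU load may differ.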
Author: LIANG Kui-kui (梁奎奎), College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Source: Computer Science (《计算机科学》; indexed in CSCD and the Peking University Core Journals list), 2019, Issue S11, pp. 208-211, 240 (5 pages)
Keywords: ETL; vertical segmentation; horizontal segmentation; Storm; load optimization
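Related literature: references (13), secondary references (203), co-citing documents (211), co-cited documents (7), citing documents (2), secondary citing documents (1)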
