Application of Spark-Based Distributed ETL to Massive Logistics Data (cited by: 3)

Published English title: Application of massive logistics data based on Spark distributed ETL
Abstract: When processing the massive logistics data of a large enterprise, traditional MapReduce-based ETL suffers from low data processing efficiency during extraction and transformation because it reads from disk frequently. Spark is a compute engine built on in-memory operations and does not depend on disk operations, which helps improve the efficiency of data extraction and transformation. This paper therefore adopts Spark-based distributed ETL to process this data and compares processing efficiency through experiments.
Authors: ZHANG Ye (张野), YAO Wen-ming (姚文明) (North China Institute of Computing Technology, Beijing 100083, China)
Source: Information Technology (《信息技术》), 2019, No. 12, pp. 165-168 (4 pages)
Keywords: big data; Spark; ETL; distributed
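
The abstract's argument, that handing intermediate results from stage to stage through disk is what slows ETL down, can be illustrated with a toy sketch. It uses neither Spark nor MapReduce: the records, the stage functions, and the file names are all hypothetical, and the point is only to contrast a pipeline whose stages hand off through files with one whose stages are chained in memory.

```python
import csv
import os
import tempfile

# Hypothetical raw records: (item, quantity, unit_price), all as strings.
RAW = [("boots", "3", "20.5"), ("tents", "", "99.0"), ("boots", "2", "20.5")]

def extract(rows):
    """Extract: drop records with a missing quantity."""
    return [r for r in rows if r[1] != ""]

def transform(rows):
    """Transform: parse the fields and compute a line total."""
    return [(item, int(qty) * float(price)) for item, qty, price in rows]

def load(rows):
    """Load: aggregate the totals per item."""
    totals = {}
    for item, amount in rows:
        totals[item] = totals.get(item, 0.0) + amount
    return totals

def spill(rows, workdir, name):
    """Write a stage's output to disk and read it back (disk-bound hand-off)."""
    path = os.path.join(workdir, name)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, newline="") as f:
        return [tuple(r) for r in csv.reader(f)]

def etl_disk(rows):
    """Each stage hands off through a file, loosely like chained disk-based jobs."""
    with tempfile.TemporaryDirectory() as workdir:
        extracted = spill(extract(rows), workdir, "extracted.csv")
        transformed = spill(transform(extracted), workdir, "transformed.csv")
        return load((item, float(amount)) for item, amount in transformed)

def etl_memory(rows):
    """Stages are chained in memory with no intermediate files."""
    return load(transform(extract(rows)))
```

Both functions produce the same aggregate; the disk variant simply pays for an extra write and read at every stage boundary, which is the overhead the abstract attributes to the MapReduce-based approach.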
