Analysis of Spark Performance Optimization Method
Abstract: With the rapid development of the Internet of Things and advances in technology, data volumes across industries are growing at an unprecedented speed and scale, and quickly extracting valuable information from massive data has become a key concern for enterprises. Spark is currently the most popular open-source big data processing framework, but because of its complex underlying mechanisms and limited cluster resources, Spark applications often run into problems such as insufficient memory and long task execution times. This paper therefore analyzes and summarizes Spark application performance from four aspects: development principles, partitioning and input data formats, cluster parallelism, and the structured APIs, with the aim of optimizing resource allocation and improving development efficiency.
Authors: WEI Tongbian; WU Jiangbo; SU De; ZHANG Liang; WEI Tongming (Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China)
Source: Information & Computer (《信息与电脑》), 2022, Issue 2, pp. 53-55 (3 pages)
Keywords: Internet of Things; value; calculation; Spark; parallelism
