Abstract
With the rapid development of the Internet of Things and advances in technology, data volumes across industries are growing at an unprecedented speed and scale, and quickly extracting valuable information from massive data has become a key concern for enterprises. Spark, currently the most popular open-source big-data processing framework, is constrained by the complexity of its underlying mechanisms and by cluster resources, so applications often suffer from problems such as insufficient memory and long task execution times. To address this, this paper analyzes and summarizes Spark application performance from four aspects: development principles, partitioning and input data formats, cluster parallelism, and the structured APIs, with the goals of optimizing resource allocation and improving development efficiency.
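To make the four tuning aspects concrete, the sketch below (not from the paper; the paths, column names, and parameter values are illustrative assumptions) shows how a Spark application might apply them in Scala: configuring cluster parallelism, reading a columnar input format, expressing the computation through the structured DataFrame API so the Catalyst optimizer can rewrite it, and repartitioning before output.

```scala
import org.apache.spark.sql.SparkSession

object TuningSketch {
  def main(args: Array[String]): Unit = {
    // Cluster parallelism: size shuffle partitions to the cluster's cores
    // (200 here is an illustrative assumption, not a recommendation).
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      .config("spark.sql.shuffle.partitions", "200")
      .config("spark.default.parallelism", "200")
      .getOrCreate()

    // Input format: Parquet is columnar, so a query reads only the columns
    // it needs. The HDFS path is hypothetical.
    val events = spark.read.parquet("hdfs:///data/events")

    // Structured API: relational operators the Catalyst optimizer can
    // reorder and push down, unlike opaque RDD lambdas.
    val counts = events
      .filter(events("value") > 0) // hypothetical column
      .groupBy("category")         // hypothetical column
      .count()

    // Partitioning: rebalance before the write so output files are evenly sized.
    counts.repartition(8).write.parquet("hdfs:///data/event_counts")

    spark.stop()
  }
}
```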
Authors
WEI Tongbian, WU Jiangbo, SU De, ZHANG Liang, WEI Tongming
(Guangxi Key Laboratory of Automobile Four New Features, SAIC GM Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China)
Source
Information & Computer (《信息与电脑》)
2022, No. 2, pp. 53-55 (3 pages)
Keywords
Internet of Things
value
calculation
Spark
parallelism