A Spark-based Configuration Optimization Technology (一种基于Spark的配置优化技术)

Abstract Spark is increasingly important in electric power applications that must process massive data quickly, but its configuration parameter space is large and the relationships among parameters are complex, so manually tuning parameters from experience to reach the best performance is extremely difficult. This paper therefore proposes a configuration optimization method for Spark. The configuration parameters that most actively affect Spark performance are selected, and a dataset is generated through MCMC sampling and a generative adversarial network (GAN). A performance model is then built through hierarchical modeling, and the particle swarm optimization (PSO) algorithm searches the parameter space for the best configuration of an application. Experimental results show that the proposed method improves Spark performance by 25% on average compared with experience-based tuning.
Authors SHEN Wuqiang (沈伍强); SHEN Guiquan (沈桂泉); XU Mingjie (许明杰); YANG Chunsong (杨春松); WANG Zhao (王召) (Information Center of Guangdong Power Grid Co., Ltd., Guangzhou 510000, China; Guodian NARI Technology Co., Ltd., Nanjing 210000, China)
Source Microcomputer Applications (《微型电脑应用》), 2024, No. 2, pp. 93-96, 105 (5 pages in total)
Funding Supported by a Science and Technology Project of China Southern Power Grid Co., Ltd. (037800KK52190012).
Keywords Spark; parameter configuration; MCMC algorithm; hierarchical modeling; PSO algorithm
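
The final step named in the abstract, a particle swarm search over the configuration space, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their performance model, parameter list, or PSO settings. The three Spark property names are real settings, but the value ranges, the toy `predicted_runtime` surrogate (which stands in for the hierarchical performance model), and the coefficients `w`, `c1`, `c2` are all assumptions made for illustration.

```python
import random

# Hypothetical search space: (property, low, high). The property names are real
# Spark settings; the ranges are illustrative assumptions, not from the paper.
SPACE = [
    ("spark.executor.cores", 1, 8),
    ("spark.executor.memory", 1, 16),           # treated here as a size in GiB
    ("spark.sql.shuffle.partitions", 50, 1000),
]


def predicted_runtime(cfg):
    """Toy stand-in for a trained performance model (assumption): lower is better."""
    optimum = (4, 8, 400)  # made-up best configuration for this toy surface
    penalty = sum(
        ((x - o) / (hi - lo)) ** 2
        for x, o, (_, lo, hi) in zip(cfg, optimum, SPACE)
    )
    return 100.0 + 50.0 * penalty


def pso(objective, space, n_particles=20, n_iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that minimizes `objective` inside box bounds."""
    rng = random.Random(seed)
    bounds = [(lo, hi) for _, lo, hi in space]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Clamp the new position to the allowed range of the parameter.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(predicted_runtime, SPACE)
# Round to integers because these Spark settings take whole-number values.
best_config = {name: round(x) for (name, _, _), x in zip(SPACE, best)}
print(best_config, round(best_val, 3))
```

In a real setting, `predicted_runtime` would be replaced by the trained performance model, so each PSO evaluation costs a model prediction rather than a full Spark job run.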