Abstract
Spark is becoming increasingly important in power-industry applications that must process massive data quickly, but its configuration parameter space is large and the relationships between parameters are complex, so manually tuning parameters from experience to reach the best performance is extremely difficult. This paper therefore proposes a Spark configuration optimization method. The configuration parameters that have a significant impact on Spark performance are selected, and a dataset is generated through MCMC sampling and a generative adversarial network (GAN). A performance model is then built through hierarchical modeling, and the particle swarm optimization (PSO) algorithm searches the parameter space for the application's best configuration. Experimental results show that the proposed method improves Spark performance by 25% on average compared with experience-based tuning.
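The final search step can be illustrated with a minimal particle swarm sketch. It is not the authors' implementation: the three parameter names, their bounds, the PSO coefficients, and the quadratic `predicted_runtime` function are all hypothetical stand-ins, and the last of these takes the place of the trained performance model described in the abstract.

```python
import random

# Hypothetical search space: (low, high) bounds for three illustrative
# parameters. These labels are placeholders, not real Spark property keys.
BOUNDS = {
    "executor_cores": (1.0, 8.0),
    "executor_memory_gb": (1.0, 16.0),
    "shuffle_partitions": (50.0, 500.0),
}


def predicted_runtime(cfg):
    """Toy stand-in for a trained performance model (lower is better)."""
    cores, mem, parts = cfg
    return (cores - 4) ** 2 + (mem - 8) ** 2 + ((parts - 200) / 50) ** 2


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    lows = [lo for lo, _ in bounds]
    highs = [hi for _, hi in bounds]
    dim = len(bounds)

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lows[d], highs[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the swarm-wide best seen so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into its bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lows[d]), highs[d])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_cfg, best_val = pso(predicted_runtime, list(BOUNDS.values()))
print(dict(zip(BOUNDS, best_cfg)), best_val)
```

In practice the objective would be the learned performance model evaluated on a candidate configuration, and integer-valued parameters would need rounding before being applied to a job.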
Authors
SHEN Wuqiang (沈伍强)
SHEN Guiquan (沈桂泉)
XU Mingjie (许明杰)
YANG Chunsong (杨春松)
WANG Zhao (王召)
SHEN Wuqiang; SHEN Guiquan; XU Mingjie; YANG Chunsong; WANG Zhao (Information Center of Guangdong Power Grid Co., Ltd., Guangzhou 510000, China; Guodian NARI Technology Co., Ltd., Nanjing 210000, China)
Source
Microcomputer Applications (《微型电脑应用》)
2024, No. 2, pp. 93-96, 105 (5 pages in total)
Funding
Supported by the Science and Technology Project of China Southern Power Grid Co., Ltd. (037800KK52190012).