Abstract
MapReduce is a widely used parallel programming model for cloud computing in data-intensive applications, and Hadoop, the open-source platform built on it, has been applied successfully to big data. For computation-intensive tasks, however, and for iterative computation in particular, repeatedly launching Map and Reduce processes incurs heavy overhead and reduces computational efficiency. The resilient distributed dataset (RDD), implemented in Spark, is an in-memory cluster computing model that supports iterative computation efficiently and overcomes this overhead. This paper therefore proposes SparkDE, a parallel differential evolution (DE) algorithm based on the RDD model. SparkDE first divides the whole population into several independent islands and maps each island to one partition of an RDD. Each island evolves independently within its partition for a predefined number of generations, after which a migration operator exchanges information between islands. Numerical experiments on standard benchmark problems compare SparkDE with MRDE, a MapReduce-based DE, and with classical DE. The results show that SparkDE achieves higher solution accuracy and faster computation, with a clear speedup, so it can serve as a next-generation optimizer for cloud computing platforms.
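The island scheme described in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: islands are emulated as in-memory lists rather than RDD partitions, and the DE variant (DE/rand/1/bin with in-place replacement), the parameter values `f` and `cr`, the ring migration rule (each island's best individual replaces the next island's worst), and all function names are assumptions that the abstract does not specify. In a Spark setting, the per-island step would presumably run inside a partition-level transformation.

```python
import random


def sphere(x):
    """Benchmark objective (assumed here for illustration): sum of squares."""
    return sum(v * v for v in x)


def evolve_island(island, fitness, generations, f=0.5, cr=0.9, rng=random):
    """Run DE/rand/1/bin on one island for a fixed number of generations."""
    pop = [list(ind) for ind in island]
    dim = len(pop[0])
    for _ in range(generations):
        for i in range(len(pop)):
            # Pick three distinct individuals other than the target.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [
                a[k] + f * (b[k] - c[k])
                if (rng.random() < cr or k == j_rand)
                else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: keep the trial only if it is no worse.
            if fitness(trial) <= fitness(pop[i]):
                pop[i] = trial
    return pop


def migrate(islands, fitness):
    """Ring migration (assumed): best of island i-1 replaces worst of island i."""
    bests = [min(isl, key=fitness) for isl in islands]
    for idx, isl in enumerate(islands):
        incoming = bests[(idx - 1) % len(islands)]
        worst = max(range(len(isl)), key=lambda j: fitness(isl[j]))
        isl[worst] = list(incoming)
    return islands


def island_de(fitness, dim=5, n_islands=4, island_size=10,
              epochs=10, gens_per_epoch=20, bounds=(-5.0, 5.0), seed=0):
    """Alternate independent island evolution with migration between islands."""
    rng = random.Random(seed)
    islands = [
        [[rng.uniform(*bounds) for _ in range(dim)] for _ in range(island_size)]
        for _ in range(n_islands)
    ]
    for _ in range(epochs):
        # Each island evolves on its own; this is the step a cluster would parallelize.
        islands = [evolve_island(isl, fitness, gens_per_epoch, rng=rng)
                   for isl in islands]
        islands = migrate(islands, fitness)
    return min((ind for isl in islands for ind in isl), key=fitness)


if __name__ == "__main__":
    best = island_de(sphere, seed=1)
    print(best, sphere(best))
```

The loop over islands inside `island_de` is the only part that is independent across islands, which is why it is the natural candidate for distribution; migration is the synchronization point where islands exchange individuals.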
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2016, No. 9, pp. 116-119, 139 (5 pages)
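Funding
National Natural Science Foundation of China (61364025)
Open Fund of the Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-09-39)
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ13729, GJJ14742)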
Keywords
Parallel differential evolution, Island model, Resilient distributed datasets, Transformation operation, Action operation