Abstract
To address the unbalanced distribution of intermediate data in the MapReduce parallel programming model on the open-source Hadoop cloud computing platform, a sampling-based improved MapReduce model, SMR (Sample MapReduce), is proposed. SMR runs a MapReduce job to sample each data block in parallel and, based on the sampling results, uses the LAB (leen and balance) balancing algorithm to distribute the intermediate data output by the Map side evenly, reducing load imbalance in the data processed on the Reduce side. Experimental results show that the improved MapReduce model effectively reduces job running time and that the input data of the Reduce side is load balanced.
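The two stages described in the abstract, sampling each block and then assigning keys to reducers so that loads are even, can be illustrated with a minimal Python sketch. This is not the LAB algorithm from the paper, whose details the abstract does not give. It substitutes a simple greedy rule (heaviest key first, to the least loaded reducer), and the function names `sample_key_counts`, `balanced_partition`, and `reducer_loads` are hypothetical.

```python
import heapq
import random
from collections import Counter


def sample_key_counts(blocks, rate, seed=0):
    """Sample every block at the given rate and count the sampled keys.

    Assumption: each block is just an iterable of keys; a real job would
    sample records inside Map tasks running in parallel.
    """
    rng = random.Random(seed)
    counts = Counter()
    for block in blocks:
        for key in block:
            if rng.random() < rate:
                counts[key] += 1
    return counts


def balanced_partition(key_counts, num_reducers):
    """Greedy stand-in for a balancing step (not the paper's LAB algorithm).

    Keys are visited from most to least frequent, and each one is given
    to the reducer with the smallest load so far.
    """
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment


def reducer_loads(key_counts, assignment, num_reducers):
    """Total estimated record count routed to each reducer."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return loads
```

Under these assumptions, the resulting key-to-reducer map would play the role of a custom partitioner in place of the default hash partitioning, which ignores key frequency.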
Source
Journal of Hubei Minzu University (Natural Science Edition)
CAS
2015, No. 2, pp. 205-209 (5 pages)