Abstract
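The MapReduce distributed computing model on the Hadoop platform executes under sequential constraints that waste computing resources. To address this, the paper optimizes the model by improving fine-grained parallel data processing on each execution node of the platform, using Java shared-memory multithreading. It proposes MapReduce+OpenMP, a distributed parallel computing model that combines coarse and fine granularity. The model's performance and efficiency were evaluated by analyzing taxi GPS trajectory data sets of different sizes on a Hadoop cluster of four nodes. The experimental results show that the MapReduce+OpenMP distributed parallel computing model does improve computational efficiency on large data sets, and that it is an effective refinement and optimization of the Hadoop platform's big-data analysis model.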
The sequential constraints in the execution mechanism of the MapReduce model on the Hadoop platform can waste computing resources. Starting from fine-grained parallel data processing on each node and combining it with Java shared-memory multithreading, this paper optimizes the MapReduce model and puts forward a MapReduce+OpenMP framework. The model is a distributed parallel computing architecture based on the Hadoop cloud platform that combines coarse- and fine-grained computing resources. It was implemented and applied to taxi GPS trajectory data in a Hadoop distributed cluster environment. The results show that this distributed parallel computing model does improve the efficiency of processing big data sets, and that it is an effective optimization and improvement of the MapReduce model for big data processing.
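The coarse-plus-fine idea in the abstract can be illustrated with a small, self-contained Java sketch. It is not the authors' implementation and uses no Hadoop APIs. The outer loop stands in for the coarse-grained MapReduce level: the input is split into per-node partitions, and the partial results are merged in a reduce step. Inside each partition, an `ExecutorService` thread pool stands in for the fine-grained, OpenMP-style shared-memory parallelism that the abstract says is implemented with Java multithreading. The class name `CoarseFineGpsCount`, the `taxiId,lon,lat` record format, the per-taxi record-count task, and the sample records are all hypothetical.

```java
import java.util.*;
import java.util.concurrent.*;

/**
 * Hypothetical sketch of coarse + fine granularity (not the paper's code).
 * Coarse level: input split into per-"node" partitions, then merged (reduce).
 * Fine level: each partition processed by a fixed-size thread pool.
 * Assumed task: count GPS records per taxi ID, record format "taxiId,lon,lat".
 */
public class CoarseFineGpsCount {

    // Fine-grained stage: cut one node's partition into chunks, count each chunk on a worker thread.
    static Map<String, Long> mapOnNode(List<String> partition, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (partition.size() + threads - 1) / threads);
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (int start = 0; start < partition.size(); start += chunk) {
                List<String> slice = partition.subList(start, Math.min(start + chunk, partition.size()));
                futures.add(pool.submit(() -> countByTaxi(slice)));
            }
            // Merge the threads' partial counts into one node-local result.
            Map<String, Long> local = new HashMap<>();
            for (Future<Map<String, Long>> f : futures) {
                f.get().forEach((k, v) -> local.merge(k, v, Long::sum));
            }
            return local;
        } finally {
            pool.shutdown();
        }
    }

    // Per-thread work: read the taxi ID (first field) and count records per ID.
    static Map<String, Long> countByTaxi(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String taxiId = line.split(",", 2)[0].trim();
            counts.merge(taxiId, 1L, Long::sum);
        }
        return counts;
    }

    // Coarse-grained stage: partition the input across "nodes", map each, then reduce.
    // Nodes are simulated one after another here; on a real cluster they run in parallel.
    public static Map<String, Long> run(List<String> records, int nodes, int threadsPerNode)
            throws InterruptedException, ExecutionException {
        int per = Math.max(1, (records.size() + nodes - 1) / nodes);
        Map<String, Long> result = new TreeMap<>();
        for (int start = 0; start < records.size(); start += per) {
            List<String> partition = records.subList(start, Math.min(start + per, records.size()));
            mapOnNode(partition, threadsPerNode).forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample records, for illustration only.
        List<String> gps = List.of(
            "T01,103.83,36.06", "T02,103.84,36.05", "T01,103.85,36.07",
            "T03,103.80,36.04", "T02,103.86,36.08", "T01,103.82,36.03");
        System.out.println(run(gps, 4, 2)); // prints {T01=3, T02=2, T03=1}
    }
}
```

In this sketch, the reduce step only has to merge small per-node maps, because most of the counting happens inside the thread pools. In a real Hadoop job, the partitions would be handled concurrently by map tasks on different machines. Here they are processed sequentially to keep the example self-contained.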
Authors
张红
王晓明
曹洁
马彦宏
郭义戎
王慜
ZHANG Hong; WANG Xiaoming; CAO Jie; MA Yanhong; GUO Yirong; WANG Min (College of Electrical & Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China; College of Computer & Communication, Lanzhou University of Technology, Lanzhou 730050, China; State Grid Gansu Electric Company, Lanzhou 730030, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2016, Issue 22, pp. 22-25 (4 pages)
Funding
Natural Science Foundation of Gansu Province (No. 148RJZA019)
Gansu Province Science and Technology Support Program (No. 1304GKCA023)