Communication optimization for large-scale clusters aimed at MapReduce computing (cited by: 4)
Abstract: To improve communication efficiency and reduce shuffle data transfer when large-scale clusters run MapReduce jobs, this paper first builds a distributed collaborative data mapping model that trades storage locality for communication locality. It then uses random sampling and machine learning to extract the locality features of job data, so that the input data for map computation can be deployed effectively. Finally, it exploits the global, flexible control offered by software-defined networking to select nodes with good communication links and map computing tasks onto those nodes. Experiments show that the approach works well for jobs with intensive intermediate-data shuffling, reducing communication delay by 4.3% to 5.8%. The scheme reduces shuffle traffic and data migration delay, and is suitable for a variety of scheduling strategies and network topologies.
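The pipeline outlined in the abstract (sample the job data, estimate where its intermediate data will go, then prefer nodes with good links) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the CRC32 hash partitioner, and the `link_cost` table (standing in for the global view a software-defined network controller might provide) are all assumptions made for illustration, and the paper's machine learning step is replaced here by a plain frequency count over the sample.

```python
import random
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    """Hypothetical partitioner: deterministic CRC32 hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def sample_partition_profile(records, num_partitions, sample_size, rng):
    """Estimate, from a random sample, the share of a split's records
    that would be shuffled to each reduce partition."""
    if not records:
        raise ValueError("cannot profile an empty split")
    sample = rng.sample(records, min(sample_size, len(records)))
    counts = Counter(partition_of(key, num_partitions) for key, _ in sample)
    return [counts.get(p, 0) / len(sample) for p in range(num_partitions)]


def estimated_shuffle_cost(profile, node, reducer_nodes, link_cost):
    """Weight each partition's share by the (assumed) cost of the link
    from the candidate map node to the node hosting that reducer."""
    return sum(
        share * link_cost[node][reducer_nodes[p]]
        for p, share in enumerate(profile)
    )


def place_map_task(profile, candidate_nodes, reducer_nodes, link_cost):
    """Pick the candidate node with the lowest estimated shuffle cost."""
    return min(
        candidate_nodes,
        key=lambda n: estimated_shuffle_cost(profile, n, reducer_nodes, link_cost),
    )
```

In this sketch, a split whose sampled keys mostly hash to one partition is placed on, or close to, the node hosting that partition's reducer, which is one simple way to read "trading storage locality for communication locality".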
Authors: Cao Yunpeng; Wang Haifeng; Liu Haitao; He Shuqing (School of Information Science & Engineering, Linyi University, Linyi, Shandong 276000, China; Shandong Provincial Key Laboratory of Network-based Intelligent Computing, Linda Institute, Linyi, Shandong 276000, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2020, No. 4, pp. 1174-1178 (5 pages)
Funding: Shandong Provincial Natural Science Foundation, General Program (ZR2017MF050); Shandong Province Higher Education Science and Technology Program (J17KA049); Shandong Province Key Research and Development Program (2019GGX1005, 2018GGX101005, 2017CXGC0701, 2016GGX109001).
Keywords: data communication optimization; MapReduce; software-defined network; collaborative data mapping
