An Energy-Efficient and Heterogeneous Environment Adaptive Data Layout Strategy for MapReduce
Abstract: The high energy consumption of big data processing is a pressing problem, especially as data volumes grow rapidly. This paper first analyzes the shortcomings of existing data layout strategies: they do not fit energy-saving modes based on storage-zone partitioning or heterogeneous HDFS clusters, their data block splitting algorithm is inflexible, and their storage node selection is random. It then proposes an energy-oriented data layout strategy for MapReduce. First, the new strategy accommodates the energy-saving mode that partitions the cluster into distinct storage zones (Active-Zone and Sleep-Zone). Second, it improves the traditional method of computing the number of data blocks, determining the block count from the minimum number of tasks required to meet a job's deadline. Finally, it adapts to heterogeneous cluster environments and selects storage nodes according to job type. Experimental results show that the new data layout strategy adapts to heterogeneous cluster environments and reduces the energy consumption of MapReduce jobs.
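The abstract does not give the block-count formula or the node-selection rule, so the Python sketch below only illustrates the general idea under a simplified cost model assumed here, not the paper's own: all tasks run in a single wave, each task pays a fixed startup overhead, and each node processes data at a constant rate that depends on the job type. The names `Node`, `min_tasks_homogeneous`, and `plan_heterogeneous`, and the CPU-bound/I/O-bound job classification with its `cpu_rate`/`io_rate` fields, are hypothetical. The homogeneous case assumes enough identical nodes are available to run every task at once. In the heterogeneous case, nodes left unchosen are treated as candidates for the Sleep-Zone.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node profile with assumed per-job-type throughputs (MB/s)."""
    name: str
    cpu_rate: float  # assumed throughput for CPU-bound jobs
    io_rate: float   # assumed throughput for I/O-bound jobs

    def rate(self, job_type: str) -> float:
        if job_type == "cpu":
            return self.cpu_rate
        if job_type == "io":
            return self.io_rate
        raise ValueError(f"unknown job type: {job_type}")


def _processing_budget(deadline_s: float, startup_s: float) -> float:
    """Time left for actual data processing once the fixed task startup is paid."""
    budget = deadline_s - startup_s
    if budget <= 0:
        raise ValueError("deadline leaves no time for processing")
    return budget


def min_tasks_homogeneous(data_mb: float, rate_mb_s: float,
                          deadline_s: float, startup_s: float) -> int:
    """Fewest equal-sized map tasks (one block each) that meet the deadline.

    Assumes all k tasks run in one wave on identical nodes, so the job ends at
    startup_s + data_mb / (k * rate_mb_s); requiring this to be <= deadline_s
    gives k >= data_mb / (rate_mb_s * budget).
    """
    budget = _processing_budget(deadline_s, startup_s)
    return max(1, math.ceil(data_mb / (rate_mb_s * budget)))


def plan_heterogeneous(data_mb: float, nodes: list[Node], job_type: str,
                       deadline_s: float, startup_s: float) -> dict[str, float]:
    """Choose the fewest nodes that meet the deadline and size their data shares (MB).

    Data is split in proportion to each chosen node's rate, so all chosen nodes
    finish together at startup_s + data_mb / total_rate.
    """
    needed_rate = data_mb / _processing_budget(deadline_s, startup_s)
    chosen: list[Node] = []
    total = 0.0
    # Fastest nodes first: this reaches the required total rate with the fewest nodes.
    for node in sorted(nodes, key=lambda n: n.rate(job_type), reverse=True):
        chosen.append(node)
        total += node.rate(job_type)
        if total >= needed_rate:
            break
    else:
        raise ValueError("deadline cannot be met even with every node active")
    return {n.name: data_mb * n.rate(job_type) / total for n in chosen}


nodes = [Node("n1", 40, 60), Node("n2", 20, 30), Node("n3", 10, 80)]
print(min_tasks_homogeneous(1024, 20, 30, 5))              # -> 3
print(plan_heterogeneous(1200, nodes, "cpu", 25, 5))       # -> {'n1': 800.0, 'n2': 400.0}
print(plan_heterogeneous(1200, nodes, "io", 25, 5))        # -> {'n3': 1200.0}
```

With 1024 MB of input, 20 MB/s per node, a 30 s deadline, and 5 s of startup overhead, three blocks of about 341 MB each are the fewest that finish in time, so only three nodes need to stay active. In the heterogeneous example, the same 1200 MB job needs two nodes when treated as CPU-bound but only one when treated as I/O-bound, which shows why node choice might depend on job type. Taking the fastest nodes first is what keeps the active set minimal: the fewest nodes whose rates sum to a given threshold are always the ones with the largest rates.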
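Source: Acta Scientiarum Naturalium Universitatis Sunyatseni (Journal of Sun Yat-sen University, Natural Science Edition), 2015, No. 6, pp. 55-66 (12 pages). Indexed in CAS, CSCD, and the Peking University core journal list.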
Funding: National Natural Science Foundation of China (61562078, 61262088, 71261025); Doctoral Start-up Fund of Xinjiang University of Finance and Economics (2015BS007).
Keywords: green computing; MapReduce; heterogeneous environment; data layout