
An Energy Efficient Algorithm for Big Data Processing in Heterogeneous Cluster

Cited by: 12
Abstract: The electricity cost of operating a cluster has been reported to exceed its hardware acquisition cost, and big data processing occupies large clusters for long periods. Energy-efficient big data processing is therefore a pressing problem for data owners and users, and a major challenge for energy use and environmental protection. Existing methods either power down some nodes to reduce energy consumption or design new data storage strategies to enable energy-efficient processing. Our analysis shows that substantial energy is still wasted even when the minimum number of nodes is used, and that new storage strategies force large-scale data migration on already-deployed clusters, which consumes additional energy. For I/O-intensive big data tasks in heterogeneous clusters, we propose a new energy-efficient algorithm, MinBalance, which divides the problem into two steps: node selection and workload balancing. In the node selection step, four different greedy policies account for node heterogeneity and pick the most suitable nodes for the task. In the workload balancing step, the load on the selected nodes is balanced to reduce the energy wasted while nodes wait for one another. The method is general and independent of the data storage strategy. Experiments show that, on large data sets, MinBalance reduces energy consumption by more than 60% compared with the traditional approach of powering down some nodes.
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, PKU Core), 2015, No. 2, pp. 377-390 (14 pages)
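Funding: National Natural Science Foundation of China (61373015, 61300052, 41301407, 61402014, 61402225); Doctoral Supervisor Fund of the Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20103218110017); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); Fundamental Research Funds for the Central Universities (NP2013307, NZ2013306)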
Keywords: big data; energy efficiency; heterogeneity; cloud computing; workload balance
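
The two-step structure described in the abstract (greedy node selection, then workload balancing) can be illustrated with a minimal sketch. This is not the paper's MinBalance implementation: the abstract does not state the four greedy policies, so the single policy used here (lowest power per unit of I/O throughput), the `deadline_s` parameter, the `Node` fields, and all function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node model; the field names are assumptions."""
    name: str
    throughput_mb_s: float  # sustained I/O throughput
    power_w: float          # power draw while the node is switched on


def select_nodes(nodes, data_mb, deadline_s):
    """Step 1 (assumed policy): greedily add the nodes with the lowest
    power per MB/s until their combined throughput meets the deadline."""
    needed = data_mb / deadline_s
    ranked = sorted(nodes, key=lambda n: n.power_w / n.throughput_mb_s)
    chosen, total = [], 0.0
    for node in ranked:
        if total >= needed:
            break
        chosen.append(node)
        total += node.throughput_mb_s
    if total < needed:
        raise ValueError("cluster cannot meet the deadline")
    return chosen


def balance(chosen, data_mb):
    """Step 2: split the data in proportion to throughput so that all
    selected nodes finish at the same time and none idles waiting."""
    total = sum(n.throughput_mb_s for n in chosen)
    return {n.name: data_mb * n.throughput_mb_s / total for n in chosen}


def energy_j(chosen, shares):
    """Energy if every selected node stays on until the slowest finishes."""
    finish_s = max(shares[n.name] / n.throughput_mb_s for n in chosen)
    return finish_s * sum(n.power_w for n in chosen)


cluster = [
    Node("a", throughput_mb_s=100.0, power_w=200.0),
    Node("b", throughput_mb_s=50.0, power_w=150.0),
    Node("c", throughput_mb_s=200.0, power_w=300.0),
]
chosen = select_nodes(cluster, data_mb=3000.0, deadline_s=12.0)
balanced = balance(chosen, data_mb=3000.0)
equal = {n.name: 3000.0 / len(chosen) for n in chosen}
print(balanced, energy_j(chosen, balanced), energy_j(chosen, equal))
```

In this toy cluster, the throughput-proportional split lets both selected nodes finish together, whereas an equal split leaves the faster node powered on and idle while the slower one completes. That idle time is the waiting-related waste that the abstract's workload balancing step is meant to reduce.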


