
An Improved Data Placement Strategy for Hadoop (一种改进的Hadoop数据放置策略)
Cited by: 35
Abstract: Under Hadoop's default data placement strategy, when the local replica becomes unavailable, restoring the data from a remote DataNode takes considerable transfer time, and the random selection of DataNodes for storage can disrupt the load balance of data placement. To address these problems, the paper proposes an improved data placement strategy. It computes a scheduling evaluation value for each DataNode from that node's network distance and data load, and uses the value to choose the most suitable DataNode for the remote replica, so that data placement stays load balanced and data transfer performs well. The proposed strategy was implemented on the Hadoop platform, and the results show that, compared with the default strategy, it both improves the load balance of data placement and reduces the time needed to place replicas.
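Author: Lin Weiwei (林伟伟)
Source: Journal of South China University of Technology (Natural Science Edition) 《华南理工大学学报(自然科学版)》, 2012, No. 1, pp. 152-158 (7 pages). Indexed in: EI, CAS, CSCD, PKU Core Journals.
Funding: National Natural Science Foundation of China (61070015); Natural Science Foundation of Guangdong Province (10451064101005155, S2011010001754, 9451063101002213); Guangdong Provincial Science and Technology Planning Project (2010B010600032)
Keywords: Hadoop; data placement; load balancing; strategy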
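
The abstract describes the selection step only in outline: score each candidate DataNode from its network distance and data load, then place the remote replica on the best-scoring node. The following is a minimal sketch of that idea and not the paper's actual formula, which this record does not give. The weighted sum, the weight `alpha`, the normalization, and all names (`DataNode`, `evaluation_value`, `choose_remote_node`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataNode:
    """Hypothetical view of a candidate node; field names are illustrative."""
    name: str
    network_distance: float  # distance from the writer, e.g. a hop count
    used_bytes: int          # data currently stored on the node
    capacity_bytes: int      # total storage capacity of the node


def evaluation_value(node: DataNode, max_distance: float, alpha: float = 0.5) -> float:
    """Assumed score: weighted sum of normalized distance and load (lower is better)."""
    distance_term = node.network_distance / max_distance if max_distance > 0 else 0.0
    load_term = node.used_bytes / node.capacity_bytes
    return alpha * distance_term + (1.0 - alpha) * load_term


def choose_remote_node(candidates: Sequence[DataNode], alpha: float = 0.5) -> DataNode:
    """Return the candidate with the lowest assumed evaluation value."""
    if not candidates:
        raise ValueError("no candidate DataNodes")
    max_distance = max(n.network_distance for n in candidates)
    return min(candidates, key=lambda n: evaluation_value(n, max_distance, alpha))


if __name__ == "__main__":
    nodes = [
        DataNode("a", network_distance=4, used_bytes=10, capacity_bytes=100),
        DataNode("b", network_distance=4, used_bytes=90, capacity_bytes=100),
        DataNode("c", network_distance=6, used_bytes=5, capacity_bytes=100),
    ]
    print(choose_remote_node(nodes).name)
```

In this sketch, `alpha` trades transfer cost against storage balance: a value of 1.0 considers distance only, and 0.0 considers load only. How the paper actually combines the two factors is not stated in the abstract.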
