Research on Data Repair Strategies in Large-Scale Distributed Storage Systems

Analysis of Data Recovery Strategy in Large Scale Distributed Storage System
Abstract  Large-scale distributed storage systems store data redundantly to ensure availability and reliability. When a node fails, the data it held must be repaired to maintain the availability guarantee. Because node failures are unpredictable, however, deciding when to repair the data is difficult. Many systems currently repair immediately, but this approach places a large amount of unnecessary load on the system. Based on an analysis of node failure behavior and replica counts, this paper proposes a two-phase data repair strategy based on average offset. Experiments show that, while preserving replica availability, the strategy effectively reduces the load that data repair places on the system and improves the stability of the cluster.
Source  China Internet (《互联网天地》), 2013, No. 2, pp. 7-12 (6 pages)
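Funding  National Science and Technology Major Project of China (Nos. 2010ZX03004-001-02, 2011ZX03002-003-02, 2012ZX03002-004-004); Sichuan Province Strategic Emerging Industries Development Promotion Project (No. SC2011510703006)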
Keywords  data recovery, replica redundancy, node failure, average offset, distributed storage