Analysis of Data Recovery Strategy in Large Scale Distributed Storage System


Abstract: Large-scale distributed storage systems store data redundantly to guarantee its availability and reliability. When a node fails, the data stored on it must be recovered so that the data remains available. Because node failures are unpredictable, however, deciding when to start recovery is difficult. Many current systems adopt a reactive strategy that recovers data immediately, which imposes a large amount of unnecessary load on the system. Based on an analysis of node failure behavior and of the number of replicas, this paper proposes a two-stage data recovery strategy based on average offset. Experiments show that, while preserving replica availability, the strategy effectively reduces the load that data recovery places on the system and improves the stability of the cluster.
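The abstract contrasts immediate (reactive) recovery with a staged approach but does not define the average-offset rule, so the sketch below does not reproduce the paper's method. It is a minimal, hypothetical Python illustration of the general idea of deferring repair, assuming a fixed critical replica threshold and a fixed grace period. The names `ObjectState`, `reactive_decision`, `staged_decision`, `critical_replicas`, and `grace_period` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectState:
    """Hypothetical replica bookkeeping for one stored object."""
    target_replicas: int                 # configured redundancy level
    live_replicas: int                   # replicas on currently reachable nodes
    offline_since: Optional[float] = None  # when a replica first went missing


def reactive_decision(obj: ObjectState) -> bool:
    """Reactive policy: start repair as soon as any replica is missing."""
    return obj.live_replicas < obj.target_replicas


def staged_decision(obj: ObjectState, now: float,
                    critical_replicas: int, grace_period: float) -> bool:
    """Generic two-stage policy (illustrative, not the paper's average-offset rule)."""
    if obj.live_replicas >= obj.target_replicas:
        return False  # full redundancy, nothing to repair
    if obj.live_replicas <= critical_replicas:
        return True   # stage 2: redundancy is dangerously low, repair now
    if obj.offline_since is None:
        return False  # no timestamp recorded yet, keep waiting
    # Stage 1: the failed node may come back, so wait out the grace period.
    return now - obj.offline_since >= grace_period


if __name__ == "__main__":
    obj = ObjectState(target_replicas=3, live_replicas=2, offline_since=100.0)
    print(reactive_decision(obj))                                    # True
    print(staged_decision(obj, 130.0, critical_replicas=1, grace_period=60.0))  # False
    print(staged_decision(obj, 170.0, critical_replicas=1, grace_period=60.0))  # True
```

Compared with the reactive rule, the staged rule accepts a short window of reduced redundancy in exchange for skipping repairs triggered by transient outages. In this sketch both parameters are fixed placeholders; the paper instead derives its decision from an analysis of node failure behavior and replica counts.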

Authors: Ren Fei, Wang Nianqiu, Duan Hancong

Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China

Source: China Internet (互联网天地), 2013, No. 2, pp. 7-12 (6 pages)

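Funding: National Science and Technology Major Project (No. 2010ZX03004-001-02, No. 2011ZX03002-003-02, No. 2012ZX03002-004-004); Sichuan Province Strategic Emerging Industries Development Promotion Project (No. SC2011510703006)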

Keywords: data recovery; replica redundancy; node failure; average offset; distributed storage

CLC number: TP309.3 [Automation and Computer Technology / Computer System Architecture]


References (13)

[1] Elvin S, Alexandru C, Valentin C. Fault Tolerance and Recovery in Grid Workflow Management Systems[M]. CISIS, 2010.
[2] Qin Xin, Ethan L M, Thomas J E. Evaluation of Distributed Recovery in Large-Scale Storage Systems[J]. High Performance Distributed Computing, 2004.
[3] Weatherspoon H, Kubiatowicz J. Erasure Coding vs. Replication: A Quantitative Comparison[A]. Berkeley, CA, USA, 2002.
[4] Steve C C, Alok N C, Mahmut T K. Fault Recovery Designs for Processor-Embedded Distributed Storage Architectures with I/O-Intensive DB Workloads[M]. MSST, 2005.
[5] Bhagwan R, Tati K, Chen Y. Total Recall: System Support for Automated Availability Management[A]. San Francisco, California, USA, 2004.
[6] Paul Stelling, Ian Foster, Carl Kesselman. A Fault Detection Service for Wide Area Distributed Computations[J]. Cluster Computing, 1999, (02): 117-128.
[7] Zhang Xianan, Dmitrii Zagorodnov, Matti Hiltunen. Fault-Tolerant Grid Services Using Primary-Backup: Feasibility and Performance[A]. 2004.
[8] Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. The Google File System[A]. 2003.
[9] Richard Golding, Elizabeth Borowsky. Fault-Tolerant Replication Management in Large-Scale Distributed Storage Systems[A]. 1999.
[10] Michael Treaster. A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems[M].



