
Concurrent Node Reconstruction for Erasure-Coded Storage Clusters

Times cited: 1
Abstract: A key design goal of erasure-coded storage clusters is to minimize the network traffic incurred by reconstruction I/Os, because reducing network traffic shortens reconstruction time, which in turn improves system reliability. To address the concurrent reconstruction of two or more failed nodes, an interleaved reconstruction scheme (IRS) is proposed in which all replacement nodes rebuild all failed blocks collaboratively and in parallel. An analysis of the I/O flows of the existing centralized reconstruction scheme (CRec) and decentralized reconstruction scheme (DRec) reveals that the reconstruction performance bottleneck lies in the storage manager for CRec and in the replacement nodes for DRec. IRS improves on CRec and DRec in two respects: 1) the replacement nodes act as rebuilding nodes and process reconstruction I/Os in parallel, bypassing the storage manager bottleneck of CRec; 2) exploiting the structural properties of erasure codes, all replacement nodes collaboratively rebuild all failed blocks so that each required surviving block is transferred only once during reconstruction, which also achieves high reconstruction I/O parallelism. The three reconstruction schemes (CRec, DRec, and IRS) are implemented on (k+r, k) Reed-Solomon-coded storage clusters and compared by replaying real-world I/O traces.
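The structural property mentioned above is that in a (k+r, k) MDS code such as Reed-Solomon, any k surviving blocks of a stripe determine all k+r blocks, so a single rebuilding node that fetches k survivors once can regenerate every failed block of that stripe. A minimal sketch, assuming a toy Reed-Solomon code in evaluation form over the prime field GF(257) with one symbol per block; production systems commonly use GF(2^8) arithmetic and code each byte position of a block independently. The helpers `interpolate_at`, `encode`, and `rebuild` are hypothetical names for illustration, not the paper's implementation.

```python
"""Toy systematic (k+r, k) Reed-Solomon code over the prime field GF(257).

Assumption: a prime field is used only because its arithmetic fits in a few
lines. Block i of a stripe stores Q(i) for a polynomial Q of degree < k, so
any k blocks determine Q and therefore every other block of the stripe.
"""
P = 257  # prime modulus; every byte value 0..255 is a field element


def interpolate_at(points, x):
    # Lagrange interpolation: evaluate the unique polynomial of degree
    # < len(points) passing through `points` [(xi, yi), ...] at x, mod P.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, r):
    # Systematic encoding: blocks 0..k-1 hold the data symbols unchanged,
    # blocks k..k+r-1 hold parity values of the same polynomial.
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate_at(points, x) for x in range(k, k + r)]


def rebuild(stripe, failed, k):
    # Any k surviving blocks suffice: one node that has fetched them once
    # can regenerate *all* failed blocks of the stripe.
    survivors = [(i, b) for i, b in enumerate(stripe) if i not in failed][:k]
    return {i: interpolate_at(survivors, i) for i in failed}


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # k = 9 data symbols
    stripe = encode(data, r=3)                   # 12 blocks in total
    lost = {0, 4, 10}                            # three concurrent node failures
    rebuilt = rebuild(stripe, lost, k=9)
    print(all(rebuilt[i] == stripe[i] for i in lost))
```

Because the rebuilding node decodes the whole stripe from one set of k survivors, rebuilding three lost blocks costs the same k fetches as rebuilding one; a DRec-style approach in which each replacement node fetches its own k survivors would move that data three times.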
Experimental results show that, for an erasure-coded storage cluster with coding parameters k=9 and r=3, IRS outperforms both CRec and DRec in reconstruction performance by a factor of at least 1.63 for double-node online reconstruction and 2.14 for triple-node online reconstruction.
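Why IRS can beat both schemes becomes clearer by counting where reconstruction traffic lands. A minimal sketch, assuming a hypothetical simplified model: each stripe loses exactly one block on each of the f failed nodes, any k surviving blocks decode the whole stripe, and every block transfer costs one unit of inbound traffic at its receiver. IRS is modeled here by assigning stripes round-robin to replacement nodes (stripe s is decoded by replacement node s mod f); this interleaving rule is an assumption for illustration, not the paper's stated algorithm. The functions `crec`, `drec`, and `irs` are hypothetical names.

```python
"""Toy per-node inbound traffic model for rebuilding f failed nodes.

Hypothetical simplification: one lost block per failed node per stripe,
any k survivors decode a stripe, each block transfer = 1 inbound unit.
"""
from collections import Counter


def crec(stripes, k, f):
    # Centralized: the storage manager reads k survivors per stripe, decodes
    # all f lost blocks, then ships one rebuilt block to each replacement.
    inbound = Counter()
    for _ in range(stripes):
        inbound["manager"] += k
        for j in range(f):
            inbound[f"replacement{j}"] += 1
    return inbound


def drec(stripes, k, f):
    # Decentralized: every replacement node independently fetches k
    # survivors per stripe and decodes only its own lost block.
    inbound = Counter()
    for _ in range(stripes):
        for j in range(f):
            inbound[f"replacement{j}"] += k
    return inbound


def irs(stripes, k, f):
    # Assumed interleaving: stripe s is decoded by replacement node s % f,
    # which fetches k survivors once, keeps its own rebuilt block and
    # forwards the other f - 1 rebuilt blocks to their replacement nodes.
    inbound = Counter()
    for s in range(stripes):
        owner = s % f
        inbound[f"replacement{owner}"] += k
        for j in range(f):
            if j != owner:
                inbound[f"replacement{j}"] += 1
    return inbound


def summarize(name, inbound):
    # Total blocks moved, and the load on the busiest (bottleneck) node.
    total = sum(inbound.values())
    bottleneck = max(inbound.values())
    print(f"{name}: total={total}, busiest node receives {bottleneck}")
    return total, bottleneck


if __name__ == "__main__":
    k, f, stripes = 9, 3, 300
    for name, scheme in [("CRec", crec), ("DRec", drec), ("IRS", irs)]:
        summarize(name, scheme(stripes, k, f))
```

Under these assumptions, with k=9, f=3, and 300 stripes, the busiest node receives 2700 blocks under both CRec (the manager) and DRec (each replacement node) but only 1100 under IRS, and IRS also moves the fewest blocks in total (3300, versus 3600 for CRec and 8100 for DRec). The model ignores disk bandwidth, decoding cost, and the foreground trace workload, so its ratios are not comparable to the measured factors of 1.63 and 2.14.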
Source: Journal of Computer Research and Development (indexed in EI, CSCD, and the Peking University Core Journals list), 2016, No. 9, pp. 1918-1929 (12 pages).
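Funding: National Natural Science Foundation of China (61572209); National High Technology Research and Development Program of China ("863" Program) (2013AA013203); National Basic Research Program of China ("973" Program) (2011CB302303)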
Keywords: erasure codes; clustered storage; storage reliability; node reconstruction; interleaved reconstruction