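Abstract: As erasure codes have been deployed in distributed storage systems, they have provided better storage efficiency. When a node is lost, however, repair consumes more network bandwidth than traditional replication does, and this overhead becomes a key cause of system performance bottlenecks. To mitigate the impact of the high bandwidth overhead of MDS codes on system performance, a new class of coding schemes, grouped codes, has been applied to distributed storage systems; compared with traditional MDS codes, they effectively reduce the amount of data transferred during node repair and thereby lower network bandwidth requirements. Building on Pyramid grouped codes with a hierarchical extension, this paper proposes an erasure code called HLRC (hierarchical local repair codes). Compared with LRC, HLRC introduces a hierarchical coding model: the original data blocks are arranged into an encoding matrix and encoded separately at each level, producing local parity blocks that cover different ranges of data blocks. Because each level covers a different number of data blocks, HLRC keeps node repair cost low while retaining high storage efficiency. Compared with Pyramid, HLRC has additional parity-block redundancy, which reduces the recovery overhead when parity blocks fail or when multiple nodes fail. Experiments on a Ceph-based distributed storage system show that, compared with Pyramid and other grouped codes, HLRC reduces single-node repair overhead by up to 48.56% and multi-node repair overhead by up to 25%.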
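The idea of local parity blocks that cover different ranges of data blocks can be illustrated with a small sketch. This is not the HLRC construction itself: the abstract does not give the encoding matrix or coefficients, so the sketch assumes plain XOR parities, 8 data blocks, and two hypothetical levels with group sizes 2 and 4. It only shows why a narrower local group means fewer blocks read during a single-block repair.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def local_parities(data, group_size):
    """One XOR parity per consecutive group of `group_size` data blocks."""
    return [xor_blocks(data[i:i + group_size])
            for i in range(0, len(data), group_size)]

def repair(data, parities, group_size, lost):
    """Rebuild data[lost] from the other blocks of its group plus the group parity.

    Returns (rebuilt_block, number_of_blocks_read).
    """
    group = lost // group_size
    start = group * group_size
    end = min(start + group_size, len(data))
    helpers = [data[i] for i in range(start, end) if i != lost]
    helpers.append(parities[group])
    return xor_blocks(helpers), len(helpers)

# Assumed toy layout: 8 data blocks of 4 bytes each.
data = [bytes([i] * 4) for i in range(1, 9)]
level1 = local_parities(data, 2)  # narrow groups (hypothetical level 1)
level2 = local_parities(data, 4)  # wide groups (hypothetical level 2)

block_a, reads_a = repair(data, level1, 2, lost=5)
block_b, reads_b = repair(data, level2, 4, lost=5)
print(reads_a, reads_b)
```

Under these assumptions, repairing one block reads 2 blocks through the narrow level and 4 through the wide level. The wide level stores fewer parity blocks in total, which is the repair-cost versus storage-efficiency trade-off the abstract refers to.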
Funding: Supported in part by the National Natural Science Foundation of China (61640006, 61572188), the Natural Science Foundation of Shaanxi Province, China (2015JM6307, 2016JQ6011), and the project of science and technology of Xi’an City (2017088CG/RC051(CADX002)).
Abstract: In distributed cloud storage systems, multiple nodes inevitably fail at the same time. Existing regenerating codes, including minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes, mainly target the repair of a single failed node or a few failed nodes, and cannot meet the repair needs of distributed cloud storage systems. In this paper, we present locally minimum storage regenerating (LMSR) codes, which recover multiple failed nodes at the same time. Specifically, the nodes in a distributed cloud storage system are divided into multiple local groups, and (4, 2) or (5, 3) MSR codes are constructed in each local group. We also study the grouping method for storage nodes and the repair process for failed nodes within local groups. Theoretical analysis shows that LMSR codes achieve the same storage overhead as MSR codes. Furthermore, simulations verify that, compared with MSR codes, LMSR codes effectively reduce the repair bandwidth and disk I/O overhead.
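To make the (4, 2) and (5, 3) local-group parameters concrete, the sketch below evaluates the standard MSR operating point: per-node storage α = M/k and single-node repair bandwidth γ = dM / (k(d − k + 1)), where M is the amount of data stored in one group and d is the number of helper nodes. The abstract does not state d, so the sketch assumes d = n − 1 (all surviving nodes in the group help). The function name `msr_point` is hypothetical, and the code does not model the paper's grouping method or its multi-node repair process.

```python
from fractions import Fraction

def msr_point(n, k, d=None, file_size=1):
    """Return (alpha, gamma) for an (n, k) MSR code with d helper nodes.

    alpha: data stored per node; gamma: total data downloaded to repair one node.
    Assumes d = n - 1 when d is not given.
    """
    if d is None:
        d = n - 1
    if not (k <= d <= n - 1):
        raise ValueError("MSR repair requires k <= d <= n - 1")
    m = Fraction(file_size)
    alpha = m / k
    gamma = m * d / (k * (d - k + 1))
    return alpha, gamma

for n, k in [(4, 2), (5, 3)]:
    alpha, gamma = msr_point(n, k)
    overhead = Fraction(n, k)  # total stored data divided by original data
    print(f"({n}, {k}): alpha={alpha}M, gamma={gamma}M, storage overhead={overhead}")
```

With d = n − 1, a (4, 2) group repairs one node by downloading 3/4 of the group's data and a (5, 3) group by downloading 2/3 of it, in both cases less than the full M that a naive decode-and-re-encode repair would read.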