Funding: the National Natural Science Foundation of China (No. 62172327).
Abstract: To ensure the reliability and availability of data, distributed storage systems always require redundancy strategies. Erasure coding, one of the representative redundancy strategies, has the advantage of low storage overhead, which facilitates its use in distributed storage systems. Among the various erasure coding schemes, XOR-based erasure codes are becoming popular because of their high computing speed. When a single-node failure occurs under such a coding scheme, a process called data recovery retrieves the failed node’s lost data from the surviving nodes. However, data transmission during the recovery process usually takes a considerable amount of time. Current research has focused mainly on reducing the amount of data needed for recovery, and thereby the time required for data transmission, but it has encountered problems such as significant complexity and local optima. In this paper, we propose a random search recovery algorithm, named SA-RSR, to speed up single-node failure recovery of XOR-based erasure codes. SA-RSR uses a simulated annealing technique to search for an optimal recovery solution that reads and transmits a minimum amount of data. In addition, this search can be completed in polynomial time. We evaluate SA-RSR with a variety of XOR-based erasure codes in simulations and in a real storage system, Ceph. Experimental results in Ceph show that SA-RSR reduces the amount of data required for recovery by up to 30.0% and improves data recovery performance by up to 20.36% compared with the conventional recovery method.
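The general idea of searching for a low-read recovery plan with simulated annealing can be illustrated with a minimal Python sketch. This is not the SA-RSR algorithm itself, whose details are not given in the abstract. It assumes an RDP-style XOR code (row parity plus diagonal parity over a prime parameter p) as the example code, and all function names, the cooling schedule, and the parameter values (`steps`, `t0`, `alpha`) are hypothetical choices made for illustration only.

```python
import math
import random


def rdp_equations(p):
    """Parity equations of one RDP(p) stripe as frozensets of (row, disk) symbols.

    Assumed layout: disks 0..p-2 hold data, disk p-1 holds row parity,
    disk p holds diagonal parity; each stripe has p-1 rows.
    """
    rows = [frozenset((r, c) for c in range(p)) for r in range(p - 1)]
    diags = []
    for d in range(p - 1):  # diagonal p-1 is not stored in RDP
        members = {(r, c) for r in range(p - 1) for c in range(p)
                   if (r + c) % p == d}
        members.add((d, p))  # the diagonal parity symbol itself
        diags.append(frozenset(members))
    return rows + diags  # row equations come first


def recovery_options(equations, failed_disk):
    """Map each lost symbol to the surviving-symbol sets that can rebuild it."""
    options = {}
    for eq in equations:
        lost = [s for s in eq if s[1] == failed_disk]
        if len(lost) == 1:  # usable only if exactly one symbol is missing
            options.setdefault(lost[0], []).append(eq - {lost[0]})
    return options


def read_cost(choice, options):
    """Number of distinct surviving symbols read by a recovery plan."""
    reads = set()
    for sym, idx in choice.items():
        reads |= options[sym][idx]
    return len(reads)


def anneal(options, steps=5000, t0=2.0, alpha=0.999, seed=0):
    """Simulated annealing over 'which equation repairs which lost symbol'."""
    rng = random.Random(seed)
    symbols = sorted(options)
    current = {s: 0 for s in symbols}  # start from the row-parity-only plan
    cur_cost = read_cost(current, options)
    best, best_cost = dict(current), cur_cost
    t = t0
    for _ in range(steps):
        sym = rng.choice(symbols)
        cand = dict(current)
        cand[sym] = rng.randrange(len(options[sym]))
        cost = read_cost(cand, options)
        # Always accept improvements; accept worse plans with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = dict(cand), cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost


options = recovery_options(rdp_equations(5), failed_disk=0)
conventional = read_cost({s: 0 for s in options}, options)
plan, cost = anneal(options)
print(f"row-parity-only reads: {conventional}, annealed reads: {cost}")
```

The cost function counts distinct symbols, so a plan benefits whenever two chosen equations share surviving symbols; that overlap is what lets a mixed row/diagonal plan read less than the conventional row-only plan. Because the sketch keeps the best plan seen so far and starts from the conventional plan, its result is never worse than that baseline.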