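Abstract: In the Ceph distributed storage system, the Controlled Replication Under Scalable Hashing (CRUSH) data distribution algorithm can leave devices differing by as much as 40% in stored data capacity, so that under large data volumes and high concurrency "hot spots" become a system performance bottleneck. To address this problem, this paper studies the CRUSH algorithm in depth and designs and implements the Writing_Balance algorithm to optimize data distribution, with the aim of eliminating the load imbalance and excessive disk utilization caused by hot spots. Experiments show that the Writing_Balance algorithm improves the optimization rate of the PG (placement group) count distribution on hot spots by 4.4% over the previous result and improves the stability of disk utilization by about 3%. It also noticeably improves overall data balance when the input key space is small.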
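The imbalance this abstract describes can be illustrated with a small sketch. It is not CRUSH and not Writing_Balance: it assumes a plain MD5-modulo placement, and the names `place`, `device_counts`, and `relative_spread` are hypothetical helpers introduced only to show how uneven per-device counts could be measured, especially for a small key space.

```python
import hashlib
from collections import Counter


def place(key: str, num_devices: int) -> int:
    """Map a key to a device index with a plain MD5 hash (an assumption, not CRUSH)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_devices


def device_counts(keys, num_devices: int) -> list[int]:
    """Count how many keys land on each device, including empty devices."""
    counts = Counter(place(k, num_devices) for k in keys)
    return [counts.get(d, 0) for d in range(num_devices)]


def relative_spread(counts) -> float:
    """(max - min) / mean of the per-device counts; 0.0 means perfectly even."""
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean if mean else 0.0


if __name__ == "__main__":
    # Compare a small and a large key space on 10 hypothetical devices.
    for n_keys in (100, 100_000):
        counts = device_counts((f"obj-{i}" for i in range(n_keys)), 10)
        print(n_keys, counts, round(relative_spread(counts), 3))
```

Running the script prints the per-device counts and their relative spread for each key-space size, which gives a rough feel for the kind of device-to-device gap a balancing step would try to narrow.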
Abstract: Based on the characteristics of clusters of workstations (COW) and the latest developments in parallel computers, this paper analyzes the data skew problem in data distribution for parallel databases on COW. On the basis of this analysis, we derive an adaptive algorithm for dynamically balanced data distribution.
Funding: The National Key Basic Research Program of China (973 Program)
Abstract: To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed for publish/subscribe mode. In the LBDD method, subscribers take part in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, with the publisher acting as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. The proposed LBDD guarantees that the average out-degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. Experimental results show that the LBDD method helps share the task load between the publisher and subscribers and outperforms the point-to-point approach.
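The idea of a publisher-rooted dissemination tree keyed by MD5 can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the bucket construction, target selection, or path-update rules, so the sketch substitutes a simple heap-ordered binary tree over subscribers sorted by MD5 digest. The names `md5_key` and `build_tree` are hypothetical, and node names are assumed to be unique.

```python
import hashlib


def md5_key(name: str) -> str:
    """Hex MD5 digest of a node name, used here only as a sort key."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def build_tree(publisher: str, subscribers: list[str]) -> dict[str, list[str]]:
    """Return {node: children}; the publisher is the root.

    Assumption: subscribers are ordered by MD5 digest and laid out like a
    binary heap, so every node forwards data to at most two children.
    """
    order = [publisher] + sorted(subscribers, key=md5_key)
    tree: dict[str, list[str]] = {node: [] for node in order}
    for i in range(1, len(order)):
        parent = order[(i - 1) // 2]  # heap-style parent index
        tree[parent].append(order[i])
    return tree


if __name__ == "__main__":
    tree = build_tree("publisher", [f"sub{i}" for i in range(6)])
    for node, children in tree.items():
        print(node, "->", children)
```

In this sketch the publisher sends to at most two subscribers, and the remaining transfers are carried by the subscribers themselves, whereas a point-to-point approach would have the publisher send to every subscriber directly.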