Abstract
The recovery strategy that existing distributed file systems use when handling node failures consumes considerable bandwidth and disk space and affects system stability. By studying the HDFS cluster structure, the data block storage mechanism, and the relationship between node states and block states, we defined the cluster node matrix, node status matrix, file partition matrix, block storage matrix, and block state matrix, establishing a basic data model for measuring data block availability. Building on this availability measurement, we designed a node failure recovery algorithm and analyzed its performance. Experimental results show that, while guaranteeing the availability of all data blocks in the system, the new algorithm reduces the bandwidth and disk space required for recovery, shortens the node recovery time, and improves system stability compared with the original strategy.
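To make the model concrete, the matrices named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the names `storage`, `status`, and the default replication target are assumptions, and the real algorithm also accounts for recovery scheduling and bandwidth cost.

```python
# Hypothetical sketch of the availability model described in the abstract.
# storage plays the role of the block storage matrix; status plays the
# role of the node status matrix. Shapes and names are assumptions.

def block_availability(storage, status):
    """storage[i][j] = 1 if block i has a replica on node j;
    status[j] = 1 if node j is alive. Returns, per block, the
    number of replicas still reachable on live nodes."""
    return [sum(r * s for r, s in zip(row, status)) for row in storage]

def blocks_to_recover(storage, status, target=3):
    """Indices of blocks whose live replica count fell below the
    target replication factor and therefore need re-replication."""
    live = block_availability(storage, status)
    return [i for i, n in enumerate(live) if n < target]

# Example: 3 blocks, 4 nodes; node 1 has failed.
storage = [[1, 1, 1, 0],   # block 0 replicated on nodes 0, 1, 2
           [0, 1, 0, 1],   # block 1 replicated on nodes 1, 3
           [1, 0, 1, 1]]   # block 2 replicated on nodes 0, 2, 3
status = [1, 0, 1, 1]

print(block_availability(storage, status))            # [2, 1, 3]
print(blocks_to_recover(storage, status, target=2))   # [1]
```

Under this toy model, only block 1 drops below a replication target of 2 after the failure, so an availability-aware strategy would re-replicate just that block instead of everything stored on the failed node.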
Source
《计算机科学》 (Computer Science)
CSCD
Peking University Core Journal (北大核心)
2013, No. 1, pp. 144-149 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60863003, 61063042) and the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2011211A011)
Keywords
Cloud computing
Distributed file system
Failure recovery
Measurement of data availability