Abstract
Ever-growing volumes of data must be stored reliably, yet the large number of nodes and the sheer amount of data in a distributed storage system greatly increase the probability of node failure. Fault tolerance has therefore become a key technology in big data storage that cannot be ignored. This paper introduces the two basic strategies for data fault tolerance, replication and erasure coding, and summarizes the problems that arise when each strategy is applied to big data storage, together with the related solutions. For replication-based fault tolerance, these cover setting the replication factor, replica placement, replica consistency, and replica repair; for erasure coding, they cover regenerating codes.
Source
《南京邮电大学学报(自然科学版)》
Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition
PKU Core Journal
2014, No. 4, pp. 20-25 (6 pages)
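Funding
National Natural Science Foundation of China (60973140, 61170276, 61373135)
Jiangsu Province Industry-University-Research Project (BY2013011)
Jiangsu Province Innovation Fund for Technology-Based Enterprises (BC2013027)
Major Natural Science Research Project of Jiangsu Higher Education Institutions (12KJA520003)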
Keywords
big data storage
distributed storage
fault tolerance
replication
erasure code
regenerating code