I/O optimization in Erasure-Code distributed file system for massive data storage
Abstract: With the rapid growth of massive data, cluster file systems are gradually moving from replication to Erasure Code storage, which provides higher reliability at lower storage overhead. However, because Erasure Code storage must generate coded data through encoding, storing the original data together with the coded data is more prone to disk contention and unbalanced load, which degrades the I/O performance of the whole storage system. In addition, when Erasure Code storage writes back the coded (parity) data, data consistency conflicts with data caching, and both of the traditional approaches, no caching and full caching, have serious limitations in a cluster file system. To address these two problems, the paper proposes an I/O optimization method for Erasure Code cluster file systems that combines a load-balancing data placement strategy with a consistency maintenance strategy for parity caching. Experiments on ECFS, an Erasure Code distributed file system developed by the authors, show that the method can improve the aggregate bandwidth of the cluster file system by 95.53%.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, PKU Core Journal, 2013, No. 5, pp. 20-27 (8 pages)
Funding: National 973 Program of China (2012CB316502)
Keywords: cluster file system; massive storage; erasure-code; data placement; parity caching; consistency
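
The abstract's two general claims, that erasure coding needs less storage than replication and that placement affects load balance, can be illustrated with a small sketch. This is not the ECFS placement strategy, which the abstract does not specify. The parameters k (data blocks) and m (parity blocks), the function names, and the simple rotation rule are all assumptions chosen for illustration.

```python
# Illustrative sketch only; not the method proposed in the paper.
# Assumes an MDS code with k data blocks and m parity blocks per stripe.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with a (k, m) erasure code."""
    return (k + m) / k


def place_stripe(stripe_id: int, k: int, m: int, num_nodes: int):
    """Assign one stripe's blocks to nodes, rotating the start node per stripe.

    Hypothetical rule: without rotation, parity would always land on the
    same m nodes, which would absorb every parity write.
    """
    if k + m > num_nodes:
        raise ValueError("need at least k + m nodes to place one stripe")
    start = stripe_id % num_nodes
    nodes = [(start + i) % num_nodes for i in range(k + m)]
    return nodes[:k], nodes[k:]  # (data nodes, parity nodes)


def parity_load(num_stripes: int, k: int, m: int, num_nodes: int):
    """Count how many parity blocks each node holds after placement."""
    counts = [0] * num_nodes
    for stripe_id in range(num_stripes):
        _, parity_nodes = place_stripe(stripe_id, k, m, num_nodes)
        for node in parity_nodes:
            counts[node] += 1
    return counts


if __name__ == "__main__":
    # 3-way replication survives 2 lost copies; a (6, 3) MDS code
    # survives 3 lost blocks while storing half as many raw bytes.
    print(replication_overhead(3), erasure_overhead(6, 3))
    print(parity_load(num_stripes=9, k=6, m=3, num_nodes=9))
```

With 9 nodes and 9 stripes, the rotation spreads parity blocks evenly across the nodes. A real system would also have to account for node capacity, failure domains, and the parity cache consistency issue the abstract raises, none of which this sketch models.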