Abstract
As data-intensive applications multiply, cluster file systems must manage storage at the PB or even EB scale. Constrained by the way data location information is maintained, object storage servers that face tens of billions or even trillions of objects run into scalability problems in data location, load balancing, and replica maintenance. To meet this growing demand, we present a scalable storage space management method. First, the method organizes object location information into a two-level index structure through extendible hashing, which supports scalable management of the location information for massive numbers of objects. Second, it places objects according to how the object location information is distributed across servers, so the file system can balance load at low cost by adjusting the index structure. Third, it maintains replicas at the granularity of the index structure that organizes the location information, which lowers the cost of maintaining replica locations. The evaluation shows that the method supports efficient management of massive data. With the load balancing mechanism, the aggregate I/O bandwidth of multiple storage servers improves by 10%. Compared with Lustre and DCFS3, the system scales better under concurrent access from multiple clients.
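The two-level index mentioned in the abstract can be illustrated with a minimal Python sketch of textbook extendible hashing: a directory (first level) whose slots point at buckets (second level) holding object location entries. This is not the paper's implementation. The class names, the bucket capacity, the SHA-1 based hash, and the `"server:offset"` location strings are all assumptions made for illustration.

```python
import hashlib


class Bucket:
    """Second level: object-location entries that share a hash suffix."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.entries = {}  # object id -> location string (hypothetical format)


class ExtendibleIndex:
    """First level: a directory of 2**global_depth slots pointing at buckets."""

    def __init__(self, bucket_capacity=4):
        self.global_depth = 1
        self.capacity = bucket_capacity
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    @staticmethod
    def _hash(object_id):
        # Assumed hash function: first 8 bytes of SHA-1 as an integer.
        digest = hashlib.sha1(object_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _slot(self, object_id):
        # The low global_depth bits of the hash select the directory slot.
        return self._hash(object_id) & ((1 << self.global_depth) - 1)

    def lookup(self, object_id):
        return self.directory[self._slot(object_id)].entries.get(object_id)

    def insert(self, object_id, location):
        while True:
            bucket = self.directory[self._slot(object_id)]
            if object_id in bucket.entries or len(bucket.entries) < bucket.capacity:
                bucket.entries[object_id] = location
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; new slot i aliases old slot i % old_size.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, self.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots whose newly significant bit is set now point at the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the entries between the old bucket and its sibling.
        old_entries, bucket.entries = bucket.entries, {}
        for oid, loc in old_entries.items():
            target = sibling if self._hash(oid) & high_bit else bucket
            target.entries[oid] = loc
```

A short usage example: `idx = ExtendibleIndex(); idx.insert("obj-1", "oss3:4096"); idx.lookup("obj-1")`. Only the bucket that overflows is split, so growth touches a small part of the index. In a design like the one the abstract outlines, a bucket could also serve as the unit assigned to a server for placement and replication, but that mapping is not modeled here.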
Source
Journal of Computer Research and Development
EI
CSCD
PKU Core Journal
2013, Issue 8, pp. 1573-1582 (10 pages)
Funding
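National Basic Research Program of China (973 Program) (2012CB316502)
National High Technology Research and Development Program of China (863 Program) (2013AA01A211)
National Natural Science Foundation of China, General Program (60970025)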
Keywords
cluster file system
massive storage
data placement
data location
replication
storage space management