Abstract
File placement has long been an active research topic in distributed storage. The distributed file storage system HDFS places files on randomly selected nodes, which leaves the access load unevenly distributed. Researchers have proposed many placement algorithms based on file popularity; however, file popularity changes dynamically and is difficult to predict accurately. This paper proposes a distributed file placement algorithm that does not depend on popularity information. The algorithm uses only file creation times, exploiting the correlation between the time elapsed since a file was created and its popularity. It first partitions time into intervals, then tallies the volume of data created on each node within each interval, and during placement keeps the data volume for any given interval roughly equal across nodes. Experimental results show that the algorithm not only balances the storage load across nodes but also improves the balance of the access load, eliminating the performance bottleneck caused by uneven file popularity.
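The placement rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `IntervalBalancedPlacer`, the fixed-width interval scheme, the `replicas` parameter, and the tie-breaking rule (total stored volume, then node name) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


class IntervalBalancedPlacer:
    """Hypothetical sketch: keep per-interval data volume even across nodes."""

    def __init__(self, nodes, interval_seconds):
        self.nodes = list(nodes)
        # Assumption: time is split into fixed-width intervals.
        self.interval_seconds = interval_seconds
        # volume[node][interval] = bytes created in that interval on that node
        self.volume = {n: defaultdict(int) for n in self.nodes}

    def interval_of(self, created_at):
        """Map a creation timestamp (seconds) to its interval index."""
        return int(created_at // self.interval_seconds)

    def place(self, size, created_at, replicas=1):
        """Pick the nodes holding the least data for the file's interval."""
        interval = self.interval_of(created_at)
        ranked = sorted(
            self.nodes,
            key=lambda n: (
                self.volume[n][interval],       # primary: same-interval volume
                sum(self.volume[n].values()),   # assumed tie-break: total volume
                n,                              # assumed tie-break: node name
            ),
        )
        chosen = ranked[:replicas]
        for n in chosen:
            self.volume[n][interval] += size
        return chosen


if __name__ == "__main__":
    placer = IntervalBalancedPlacer(["n1", "n2", "n3"], interval_seconds=3600)
    for t in range(0, 600, 100):  # six files created within the same hour
        print(t, placer.place(size=10, created_at=t))
```

Because files created in the same interval are expected to have similar popularity, spreading each interval's data evenly also spreads the expected access load, without ever measuring accesses directly.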
Source
Science Technology and Engineering (《科学技术与工程》)
PKU Core Journal (北大核心)
2018, No. 2, pp. 285-289 (5 pages)
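Funding
Supported by the Doctoral Start-up Fund of Xi'an University of Science and Technology (2015QDJ031)
and the Special Scientific Research Program of the Education Department of Shaanxi Province (15JK1468)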
Keywords
distributed file storage system
file popularity
file placement
load balance