
Optimization of massive small files storage and accessing on HDFS (cited 3 times)
Abstract: When HDFS (Hadoop distributed file system) stores massive numbers of small files, the NameNode's memory becomes a bottleneck. To address this and other problems and to make HDFS handle massive small files more efficiently, this paper proposes a storage and access optimization scheme based on small-file merging and prefetching. First, correlations between small files are derived by analyzing a large volume of historical access logs; correlated small files are then merged into large files before being stored in HDFS. When data is read from HDFS, the files the user is most likely to access next are prefetched according to these correlations, which reduces the number of client requests to the NameNode and improves both the file hit rate and the processing speed. Experimental results show that the method effectively improves Hadoop's efficiency in storing and accessing small files and reduces the NameNode's memory usage.
Source: Application Research of Computers (indexed in CSCD and the Peking University Core Journals list), 2017, No. 8, pp. 2319-2323 (5 pages)
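Funding: National Natural Science Foundation of China (11271057, 61640211); Jiangsu Province Graduate Research and Innovation Program for Regular Higher Education Institutions (SCZ1412800004)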
Keywords: massive small files; relationship between files; merge; prefetch
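The three steps in the abstract (mining file correlations from access logs, merging correlated small files into large files, and prefetching correlated files on read) can be illustrated with a minimal, in-memory Python sketch. It is not the authors' algorithm, since the abstract does not specify the correlation measure, merge policy, or prefetch rule. The following are assumptions: correlation is counted as co-occurrence within a log "session"; groups are formed by greedily joining the most strongly correlated pairs under a size cap (128 MB, chosen to match a typical HDFS block size); Python dicts stand in for HDFS and the NameNode; and a miss prefetches the whole merged group. All function and class names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_correlation(sessions):
    """Count how often two files are accessed in the same session.

    `sessions` is a hypothetical pre-parsed access log: a list of lists of file names.
    """
    counts = defaultdict(int)
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def group_files(sizes, counts, min_support=2, max_bytes=128 * 1024 * 1024):
    """Greedily union correlated files (strongest pairs first) under a size cap.

    Assumes every file named in `counts` also appears in `sizes`.
    """
    parent = {f: f for f in sizes}
    total = dict(sizes)  # total bytes held by each group root

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for (a, b), n in sorted(counts.items(), key=lambda kv: -kv[1]):
        if n < min_support:
            break  # remaining pairs are weaker; stop merging
        ra, rb = find(a), find(b)
        if ra != rb and total[ra] + total[rb] <= max_bytes:
            parent[rb] = ra
            total[ra] += total[rb]

    groups = defaultdict(list)
    for f in sorted(sizes):
        groups[find(f)].append(f)
    return list(groups.values())


def merge_groups(groups, contents):
    """Concatenate each group into one 'big file'; index name -> (group, offset, length)."""
    store, index = {}, {}
    for gid, members in enumerate(groups):
        blob = bytearray()
        for f in members:
            index[f] = (gid, len(blob), len(contents[f]))
            blob += contents[f]
        store[gid] = bytes(blob)
    return store, index


class PrefetchingClient:
    """On a miss, fetch the whole merged file and cache every correlated member."""

    def __init__(self, store, index):
        self.store, self.index = store, index
        self.members = defaultdict(list)
        for name, (gid, _, _) in index.items():
            self.members[gid].append(name)
        self.cache = {}        # unbounded here; a real client would evict entries
        self.remote_reads = 0  # stands in for NameNode/DataNode round trips

    def read(self, name):
        if name in self.cache:
            return self.cache[name]  # prefetch hit: no remote request
        gid = self.index[name][0]
        blob = self.store[gid]       # one remote read serves the whole group
        self.remote_reads += 1
        for f in self.members[gid]:
            _, off, length = self.index[f]
            self.cache[f] = blob[off:off + length]
        return self.cache[name]
```

With sessions `[["a", "b"], ["a", "b", "c"], ["d", "e"], ["d", "e"], ["a", "c"]]`, the pairs (a, b), (a, c) and (d, e) each reach the support threshold, giving groups `[["a", "b", "c"], ["d", "e"]]`. Reading a, b, c, d, e in order then costs two remote reads instead of five. Prefetching the whole group is a deliberate simplification; ranking only the next most likely files, as the abstract describes, would need the correlation counts at read time as well.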
