
An HDFS-Based Storage Optimization Scheme for Small Files (cited by: 3)

A Small Files Optimized Schema Based on HDFS
Abstract: The Hadoop Distributed File System (HDFS) performs well for big-data storage and is well suited to storing and processing large files, but its performance drops significantly when it handles massive numbers of small files, because too many small files consume an excessive amount of system memory. To improve the efficiency of small-file handling in HDFS, this paper modifies the HDFS storage scheme and proposes a storage optimization scheme for massive small files. Small files are first classified according to the correlation between them. Files in the same class are then merged into one large file and uploaded to HDFS, and an index file is generated. On reads, a client-side caching mechanism is used to improve access efficiency. The experimental results show that, under rapid data growth, the proposed scheme effectively improves small-file storage and access efficiency, reduces system memory overhead, and improves the performance of HDFS when processing massive numbers of small files.
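Source: Computing Technology and Automation (《计算技术与自动化》), 2017, No. 3, pp. 134-138 (5 pages)
Funding: Shaanxi Key Laboratory of Network Computing and Security Technology funded project (15JS078); Xi'an Science and Technology Plan funded project (CXY1518(1))
Keywords: Hadoop; HDFS; small files; cache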
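
The merge, index, and cache steps summarized in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: the abstract does not specify the correlation measure, the index layout, or the cache policy, so the sketch assumes file extension as a stand-in for "correlation", an index of `(category, offset, length)` tuples, and an LRU cache. The names `merge_by_category` and `CachedReader` are hypothetical, and no HDFS API is used.

```python
from collections import OrderedDict, defaultdict


def merge_by_category(files, category_of):
    """Group small files by category and pack each group into one blob.

    Returns (blobs, index): blobs maps category -> merged bytes, and index
    maps file name -> (category, offset, length) inside that blob.
    """
    groups = defaultdict(list)
    for name in sorted(files):
        groups[category_of(name)].append(name)

    blobs, index = {}, {}
    for category, names in groups.items():
        buffer = bytearray()
        for name in names:
            data = files[name]
            # Record where this file starts and how long it is.
            index[name] = (category, len(buffer), len(data))
            buffer += data
        blobs[category] = bytes(buffer)
    return blobs, index


class CachedReader:
    """Reads small files out of merged blobs, with an LRU cache (assumed policy)."""

    def __init__(self, blobs, index, capacity=2):
        self.blobs, self.index, self.capacity = blobs, index, capacity
        self.cache = OrderedDict()
        self.hits = 0

    def read(self, name):
        if name in self.cache:
            # Cache hit: mark as most recently used and skip the blob lookup.
            self.cache.move_to_end(name)
            self.hits += 1
            return self.cache[name]
        category, offset, length = self.index[name]
        data = self.blobs[category][offset:offset + length]
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


# Toy input: three small files, classified by extension (an assumption).
files = {"a.log": b"alpha", "b.log": b"beta", "c.csv": b"1,2,3"}
blobs, index = merge_by_category(files, lambda n: n.rsplit(".", 1)[-1])
reader = CachedReader(blobs, index)
```

In a real deployment each blob would be one large HDFS file, so the NameNode would track metadata for one object per class rather than one per small file, which is the memory saving the abstract refers to.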
