Methods of Dealing With Massive Small Files in Hadoop (cited by: 1)
Abstract: HDFS provides the underlying storage for Hadoop, but it handles massive numbers of small files inefficiently, which seriously degrades system performance. To address this problem, we designed a scheme for merging, indexing, and extracting small files, and compared it with the original HDFS and with the HAR file archiving scheme. A series of experiments shows that the proposed scheme effectively reduces the memory usage of the Namenode and improves the I/O performance of HDFS.
Source: Computer Systems & Applications, 2015, No. 11, pp. 157-161 (5 pages)
Funding: 2013 Science and Technology Support Program of the Ministry of Science and Technology (2013BAJ10B14-5)
Keywords: Hadoop; HDFS; small files; I/O performance of HDFS
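
The abstract names three operations (merging, indexing, extracting) without giving their design. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below concatenates small files into one in-memory blob and records an offset index so that any file can be read back. The function names `merge_small_files` and `extract_file` are hypothetical, and ordinary Python bytes stand in for HDFS storage.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and build an offset index.

    files: dict mapping file name -> bytes content (hypothetical input shape).
    Returns (blob, index), where index maps name -> (offset, length).
    """
    buf = io.BytesIO()
    index = {}
    for name, data in files.items():
        # Record where this file starts in the merged blob and how long it is.
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index


def extract_file(blob, index, name):
    """Read one original file back out of the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


if __name__ == "__main__":
    small = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
    blob, index = merge_small_files(small)
    # Every original file should be recoverable byte for byte.
    for name, data in small.items():
        assert extract_file(blob, index, name) == data
    print(len(blob), index)
```

The point of the design is that the storage layer then tracks one large object plus a compact index instead of one metadata entry per small file, which is the kind of Namenode memory saving the abstract reports.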