摘要
针对Hadoop中提供底层存储的HDFS对处理海量小文件效率低下、严重影响性能的问题.设计了一种小文件合并、索引和提取方案,并与原始的HDFS以及HAR文件归档方案进行对比,通过一系列实验表明,本文的方案能有效减少Namenode内存占用,提高HDFS的I/O性能.
HDFS provides the underlying storage for Hadoop, however, the HDFS deals with massive small files inefficiently and decreases system performance seriously. To solve this problem, we designed a file merging, indexing and retrieval solution. Then through a series of experiments compared to the original HDFS and HAR solution, it can be shown that our scheme can effectively reduce the memory usage of Namenode and improve the I / O performance of HDFS.
出处
《计算机系统应用》
2015年第11期157-161,共5页
Computer Systems & Applications
基金
2013年度科技部科技支撑计划(2013BAJ10B14-5)