Abstract
Storing massive numbers of images in the Hadoop Distributed File System (HDFS) is inefficient. Based on an analysis of the HDFS architecture and its native file read and write flows, this paper proposes a cache-based storage optimization scheme for massive image collections, called CHDFS (HDFS Based on Cache). CHDFS introduces caching, read-ahead, and file merging to improve image read and write performance and to compensate for the weaknesses of HDFS when storing massive numbers of images. Merging multiple images into one large file reduces the number of metadata entries held by the Namenode and improves the utilization of storage space on the Datanodes. Because caching, read-ahead, and image merging are all transparent to the user, the scheme does not make HDFS more complicated to use. Experimental results indicate that CHDFS effectively improves the efficiency of storing and accessing images in HDFS without affecting the normal operation of HDFS.
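The abstract names three mechanisms (image merging, caching, and read-ahead) without giving their design. The following is only a minimal in-memory sketch of how such mechanisms can fit together, not the authors' implementation. The class names `MergedImageStore` and `CachedReader`, the LRU eviction policy, and the "prefetch the next images in the merged file" rule are all assumptions made for illustration; no HDFS API is used.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


class MergedImageStore:
    """Hypothetical merged file: many small images packed into one blob.

    Only one large object plus a small index has to be tracked, which is the
    idea behind reducing per-file metadata on the Namenode.
    """

    def __init__(self) -> None:
        self._blob = bytearray()
        self._index: Dict[str, Tuple[int, int]] = {}  # name -> (offset, length)

    def put(self, name: str, data: bytes) -> None:
        # Append the image and remember where it lives inside the blob.
        self._index[name] = (len(self._blob), len(data))
        self._blob.extend(data)

    def get(self, name: str) -> bytes:
        offset, length = self._index[name]
        return bytes(self._blob[offset:offset + length])

    def names(self) -> List[str]:
        # Insertion order is used here as the (assumed) on-disk order.
        return list(self._index)


class CachedReader:
    """Hypothetical client-side LRU cache with simple read-ahead."""

    def __init__(self, store: MergedImageStore, capacity: int = 4,
                 read_ahead: int = 1) -> None:
        self._store = store
        self._capacity = capacity
        self._read_ahead = read_ahead
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, name: str, data: bytes) -> None:
        self._cache[name] = data
        self._cache.move_to_end(name)
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used

    def read(self, name: str) -> bytes:
        if name in self._cache:
            self.hits += 1
            self._cache.move_to_end(name)
            return self._cache[name]
        self.misses += 1
        data = self._store.get(name)
        self._insert(name, data)
        # Read-ahead: on a miss, also load the images stored right after it.
        order = self._store.names()
        pos = order.index(name)
        for nxt in order[pos + 1:pos + 1 + self._read_ahead]:
            if nxt not in self._cache:
                self._insert(nxt, self._store.get(nxt))
        return data
```

In this sketch the caller only invokes `read(name)`. The merging, caching, and prefetching stay hidden behind that call, which mirrors the transparency the abstract describes.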
Source
Computer Measurement & Control (《计算机测量与控制》)
PKU Core Journal (北大核心)
2014, Issue 8, pp. 2669-2672, 2676 (5 pages)
Funding
Scientific Research Project of the Education Department of Sichuan Province (13ZA0135)