
Optimization of Reception and Storage for Massive Small Files (cited by: 5)
Abstract: HDFS is currently the most typical cloud storage platform; its high fault tolerance, scalability, and low-cost storage let it support large-scale data sets. However, HDFS is inefficient at receiving and storing massive numbers of small files that arrive concurrently, continuously, and at high speed. To address this problem, an optimization scheme, RSMSF, is proposed. In this method, file cache servers continuously receive files from the front end, add identifying information to each file, and place it in the corresponding file queue. When a file queue reaches a window threshold, the files in that queue are sent, according to a consistent hashing algorithm, to the corresponding file processing server, which merges the small files into a large one and finally uploads it to HDFS. Experiments show that RSMSF reduces file processing time, lowers the file loss rate, reduces the memory overhead of HDFS, and saves storage space.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 8, pp. 1747-1751 (5 pages)
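Funding: Supported by the Beijing Municipal Education Commission Science and Technology Program General Project (KM201310009003), the Beijing Municipal Education Commission Science and Technology Program Key Project (KZ201310009009), the Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities (IDHT20130502), and the North China University of Technology Doctoral Start-up Fund.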
Keywords: HDFS; massive; small files; RSMSF; threshold; consistent hashing
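
The pipeline described in the abstract (queue, window threshold, consistent-hash routing, merge) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and function names, the use of MD5, the virtual-node count, keying the queue and the hash lookup by a source identifier, and the offset index format are all assumptions made for illustration, and the network send and HDFS upload are replaced by an in-memory list.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict


class HashRing:
    """Consistent-hash ring with virtual nodes (replica count is an assumption)."""

    def __init__(self, servers, replicas=100):
        # Each server is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


class CacheServer:
    """Hypothetical file cache server: one queue per source, flushed at a window threshold."""

    def __init__(self, ring, window=3):
        self.ring = ring
        self.window = window              # window threshold, counted in files (assumption)
        self.queues = defaultdict(list)
        self.dispatched = []              # stands in for sending to a processing server

    def receive(self, source_id, name, data):
        # Tag the file with identifying information and enqueue it.
        queue = self.queues[source_id]
        queue.append((f"{source_id}/{name}", data))
        if len(queue) >= self.window:
            self.flush(source_id)

    def flush(self, source_id):
        # Route the whole queue to the server chosen by consistent hashing.
        batch = self.queues.pop(source_id, [])
        if batch:
            self.dispatched.append((self.ring.lookup(source_id), batch))


def merge(batch):
    """Concatenate small files into one blob plus an (offset, length) index."""
    index, parts, offset = {}, [], 0
    for file_id, data in batch:
        index[file_id] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index
```

In this sketch, calling `receive` three times for the same source with `window=3` triggers one flush, and `merge` on the dispatched batch yields a single blob from which each original file can be sliced back out using its index entry. Storing one merged blob instead of many small files is the property the abstract credits with reducing HDFS memory overhead.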
