
The Approach for Optimizing Storing and Accessing Massive Recording Small Files on Hadoop (cited by: 1)
Abstract: As a distributed computing framework, Hadoop has significant advantages in processing large volumes of data. However, because of problems such as the memory bottleneck of its NameNode, it is poorly suited to storing and accessing massive numbers of small files. This paper proposes an optimization method for massive small recording files. It makes full use of the correlation among recording files: a preprocessing module classifies the files, the small recording files are merged into a series of SequenceFiles, and a global index is built. Finally, a cache mechanism and a cache optimization strategy provide further optimization. Experiments show that the method effectively improves the storage and access performance for large batches of small files.
Affiliation: Zhejiang Normal University
Source: Microcomputer Applications (《微型电脑应用》), 2015, No. 2, pp. 1-3 (3 pages)
Funding: National Natural Science Foundation of China (Grant No. 61272468)
Keywords: Hadoop; small files; optimization; storage performance
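
The pipeline described in the abstract (group related files, merge them into containers, build a global index, then read through a cache) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: plain in-memory byte strings stand in for Hadoop SequenceFiles, `group_key` is a hypothetical classification function, and LRU eviction is an assumed cache policy, since the abstract does not specify the cache optimization strategy.

```python
from collections import OrderedDict, defaultdict


def merge_small_files(files, group_key):
    """Pack related small files into per-group containers and index them.

    files: dict mapping file name -> bytes.
    group_key: hypothetical function mapping a file name to its group.
    Returns (containers, index), where index[name] = (group, offset, length).
    """
    buffers = defaultdict(bytearray)
    index = {}
    for name in sorted(files):
        group = group_key(name)
        data = files[name]
        # Record where this file starts inside its group's container.
        index[name] = (group, len(buffers[group]), len(data))
        buffers[group] += data
    containers = {g: bytes(b) for g, b in buffers.items()}
    return containers, index


class CachedReader:
    """Read small files through the global index with a small LRU cache."""

    def __init__(self, containers, index, capacity=2):
        self.containers = containers
        self.index = index
        self.capacity = capacity
        self.cache = OrderedDict()  # name -> bytes, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self.cache[name]
        self.misses += 1
        group, offset, length = self.index[name]
        data = self.containers[group][offset:offset + length]
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data
```

In this sketch, one container per group replaces many separate small files, so a NameNode-like metadata store would track a few containers rather than one entry per recording, and the index turns each read into a single offset lookup.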

References (6)

  • 1 Tom White. Hadoop: The Definitive Guide (2nd edition) [M]. Beijing: Tsinghua University Press, 2011.
  • 2 Shvachko K, Kuang H, Radia S, et al. The Hadoop Distributed File System [C]. // Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). May 03-07, 2010: 1-10.
  • 3 MapReduce Tutorial [OL]. http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html.
  • 4 Hadoop SequenceFile [OL]. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/SequenceFile.html.
  • 5 SequenceFile wiki [OL]. http://wiki.apache.org/hadoop/SequenceFile.
  • 6 Zhao Yuelong, Xie Xiaoling, Cai Yongcai, Wang Guohua, Liu Lin. Research on a performance-optimized storage and access strategy for small files [J]. Journal of Computer Research and Development, 2012, 49(7): 1579-1586. Cited by: 20


