Abstract
As a distributed computing framework, Hadoop has distinct advantages in processing large volumes of data. However, because of problems such as the memory bottleneck of its NameNode, it performs poorly when storing and accessing massive numbers of small files. This paper proposes an optimization method for massive numbers of small audio-recording files. It exploits the correlation among recording files: a preprocessing module classifies the files, merges the small recording files into a series of SequenceFiles, and builds a global index. Finally, a caching mechanism and a cache optimization strategy provide further optimization. Experiments show that the method effectively improves the storage and access performance for large batches of small files.
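The merge-and-index idea can be illustrated with a small in-memory model. This is a minimal sketch, not the paper's implementation. Plain byte strings stand in for Hadoop SequenceFiles. The grouping function (here, the speaker prefix of a file name) is a hypothetical stand-in for the paper's correlation-based preprocessing. The cache is a plain LRU over whole containers, which is an assumption, since the abstract does not specify the caching policy. The names `MergedStore`, `IndexEntry`, `group_key`, and `cache_capacity` are all hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside a merged container."""
    container_id: int
    offset: int
    length: int


class MergedStore:
    """Toy model: small files merged into containers, plus a global index and an LRU cache."""

    def __init__(self, group_key, cache_capacity=2):
        self.group_key = group_key          # hypothetical: maps file name -> correlation group
        self.containers = []                # byte blobs standing in for SequenceFiles
        self.index = {}                     # global index: file name -> IndexEntry
        self.cache = OrderedDict()          # container_id -> bytes, kept in LRU order
        self.cache_capacity = cache_capacity
        self.container_reads = 0            # counts simulated storage reads of containers

    def load(self, files):
        """Preprocess: classify files by group, then merge each group into one container."""
        groups = {}
        for name in sorted(files):
            groups.setdefault(self.group_key(name), []).append(name)
        for _, names in sorted(groups.items()):
            container_id = len(self.containers)
            blob = bytearray()
            for name in names:
                data = files[name]
                self.index[name] = IndexEntry(container_id, len(blob), len(data))
                blob.extend(data)
            # Only this one container would be a file tracked by the NameNode.
            self.containers.append(bytes(blob))

    def _container(self, container_id):
        if container_id in self.cache:
            self.cache.move_to_end(container_id)    # cache hit: mark as most recently used
            return self.cache[container_id]
        self.container_reads += 1                   # cache miss: simulate a storage read
        blob = self.containers[container_id]
        self.cache[container_id] = blob
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)          # evict the least recently used container
        return blob

    def read(self, name):
        entry = self.index[name]                    # a single lookup in the global index
        blob = self._container(entry.container_id)
        return blob[entry.offset:entry.offset + entry.length]


files = {
    "alice_0101.wav": b"AA",
    "alice_0102.wav": b"AAA",
    "bob_0101.wav": b"B",
}
store = MergedStore(group_key=lambda name: name.split("_")[0], cache_capacity=1)
store.load(files)
print(store.read("alice_0102.wav"))   # b'AAA'
print(store.read("alice_0101.wav"))   # b'AA', served from the cached container 0
print(store.container_reads)          # 1
```

This model caches whole containers rather than single files, so correlated recordings that were merged together are served from memory after the first access. The abstract does not say whether this matches the paper's cache optimization strategy.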
Source
Microcomputer Applications (《微型电脑应用》)
2015, No. 2, pp. 1-3 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61272468)