Abstract
HDFS is currently the most typical cloud storage platform; its high fault tolerance, scalability, and low-cost storage let it support the storage of large data sets. However, HDFS is inefficient at receiving and storing massive numbers of small files that arrive continuously, concurrently, and at high speed. To address this problem, this paper proposes an optimization approach, RSMSF. In this approach, file cache servers continuously receive files from front-end devices, add identifying information to each file, and place it in the corresponding file queue. When a file queue reaches a certain window threshold, the files in that queue are sent, according to a consistent hashing algorithm, to the corresponding file processing server, which merges the small files into one large file and finally uploads it to HDFS. Experimental results show that RSMSF reduces file processing time and the file loss rate, lowers the memory overhead in HDFS, and saves storage space.
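The pipeline described in the abstract (queue per file source, window threshold, consistent-hash dispatch, merge) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the window threshold is measured, how the ring is built, or the merged file layout, so the sketch assumes a file-count window, an MD5-based ring with virtual nodes, and a simple concatenation with an offset index. All names (`HashRing`, `CacheServer`, `merge`, `proc-1`, `cam-7`) are hypothetical, and the HDFS upload step is omitted.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (assumed design)."""

    def __init__(self, servers, replicas=100):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            for i in range(replicas):
                point = _hash(f"{server}#{i}")
                self._owner[point] = server
                bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the first server clockwise from the key's position."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def merge(files):
    """Concatenate small files into one blob plus a name -> (offset, length) index.

    Stands in for the work done on a file processing server.
    """
    index, parts, offset = {}, [], 0
    for name, data in files:
        index[name] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


class CacheServer:
    """Tags incoming files, queues them per source, and dispatches full queues."""

    def __init__(self, ring, window=3):
        self.ring = ring
        self.window = window             # assumed: threshold is a file count
        self.queues = defaultdict(list)  # source id -> queued (name, data)
        self.dispatched = []             # (target server, batch of files)

    def receive(self, source_id, name, data):
        queue = self.queues[source_id]
        # The "identifying information" is modelled as a source-id prefix.
        queue.append((f"{source_id}/{name}", data))
        if len(queue) >= self.window:
            server = self.ring.lookup(source_id)
            self.dispatched.append((server, list(queue)))
            queue.clear()


ring = HashRing(["proc-1", "proc-2", "proc-3"])
cache = CacheServer(ring, window=3)
for i, payload in enumerate([b"aa", b"bbb", b"c"]):
    cache.receive("cam-7", f"f{i}.dat", payload)

target, batch = cache.dispatched[0]
blob, index = merge(batch)
```

Because the ring lookup depends only on the queue's key, every batch from the same source lands on the same processing server, and removing a server remaps only the keys that server owned.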
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2015, Issue 8, pp. 1747-1751 (5 pages)
Journal of Chinese Computer Systems
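Funding
Supported by the General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201310009003)
Supported by the Key Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KZ201310009009)
Supported by the Innovative Team Construction and Teacher Career Development Program for Beijing Municipal Universities (IDHT20130502)
Supported by the Doctoral Start-up Fund of North China University of Technology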