
Similar index: two-level index used for deduplication
Cited by: 1
Abstract: Extreme binning (EB) uses the minimum chunk signature of a file as that file's feature, so it is poorly suited to workloads that consist mainly of small files and achieves poor deduplication efficiency on them. To improve on EB, this paper proposes the similar index, a two-level index that uses the simi hash as the feature of a file and is suited to deduplicating workloads dominated by small files. Experimental results show that the deduplication efficiency of the similar index is 24.8% higher than that of EB, and that its RAM usage is only 0.265% of EB's. Compared with EB, the similar index needs less storage and less RAM.
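The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the simi hash, so a Charikar-style SimHash over chunk IDs stands in for it here as an assumption, and fixed-size chunking, SHA-1 chunk IDs, and the names `chunk_ids`, `eb_feature`, `simhash_feature`, and `TwoLevelIndex` are all hypothetical choices made for illustration. Only the idea of EB using the minimum chunk ID as the file feature, and of a two-level index keyed by a file feature, comes from the abstract.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; the abstract does not specify chunking


def chunk_ids(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks and return the SHA-1 digest of each chunk."""
    ids = [hashlib.sha1(data[i:i + size]).digest() for i in range(0, len(data), size)]
    return ids or [hashlib.sha1(b"").digest()]  # an empty file still gets one ID


def eb_feature(ids: list[bytes]) -> bytes:
    """EB-style file feature: the minimum chunk ID of the file."""
    return min(ids)


def simhash_feature(ids: list[bytes], bits: int = 64) -> int:
    """Charikar-style SimHash over chunk IDs (assumed stand-in for the simi hash)."""
    counts = [0] * bits
    for cid in ids:
        h = int.from_bytes(cid[:bits // 8], "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # Bit b of the feature is set when most chunk IDs have bit b set.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


class TwoLevelIndex:
    """First level: file feature -> bin (kept in RAM).
    Second level: the bin's chunk IDs (on disk in a real system, a set here)."""

    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.bins = {}

    def add_file(self, data: bytes) -> int:
        """Deduplicate a file against its bin; return the number of new chunks stored."""
        ids = chunk_ids(data)
        bin_ = self.bins.setdefault(self.feature_fn(ids), set())
        new = [cid for cid in set(ids) if cid not in bin_]
        bin_.update(new)
        return len(new)
```

Constructing `TwoLevelIndex(eb_feature)` or `TwoLevelIndex(simhash_feature)` switches the file feature while leaving the index structure unchanged. A file is only deduplicated against the single bin its feature selects, which is why the choice of feature affects how many duplicate chunks are found.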
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2013, No. 12, pp. 3614-3617 (4 pages)
Funding: Natural Science Foundation of Shaanxi Province (2010JM8023); Aeronautical Science Foundation (2010ZD53042)
Keywords: deduplication; simi hash; similar index; chunk-lookup disk bottleneck problem; two-level index

References (14)

  • 1. ESHGHI K, LILLIBRIDGE M, WILCOCK L, et al. Jumbo store: providing efficient incremental upload and versioning for a utility rendering service[C]//Proc of the 5th USENIX Conference on File and Storage Technologies. Berkeley: USENIX, 2007: 123-138.
  • 2. ZHU B, LI Kai, PATTERSON H. Avoiding the disk bottleneck in the data domain deduplication file system[C]//Proc of the 6th USENIX Conference on File and Storage Technologies. Berkeley: USENIX, 2008: 269-282.
  • 3. LILLIBRIDGE M, ESHGHI K, BHAGWAT D, et al. Sparse indexing: large scale, inline deduplication using sampling and locality[C]//Proc of the 7th Conference on File and Storage Technologies. Berkeley: USENIX, 2009: 111-123.
  • 4. BHAGWAT D, ESHGHI K, LONG D, et al. Extreme binning: scalable, parallel deduplication for chunk-based file backup[C]//Proc of IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems. Washington DC: IEEE Computer Society, 2009: 1-9.
  • 5. ARONOVICH L, ASHER R, BACHMAT E, et al. The design of a similarity based deduplication system[C]//Proc of SYSTOR: The Israeli Experimental Systems Conference. New York: ACM Press, 2009: 6.
  • 6. ROMANSKI B, HELDT L, KILIAN W, et al. Anchor-driven subchunk deduplication[C]//Proc of SYSTOR 2011: The Israeli Experimental Systems Conference. New York: ACM Press, 2011: 16.
  • 7. ZHANG Zhi-ke, BHAGWAT D, LITWIN W, et al. Improved deduplication through parallel binning[C]//Proc of the 31st IEEE International Performance Computing and Communications Conference. Washington DC: IEEE Computer Society, 2012: 130-141.
  • 8. ZHANG Zhi-ke, JIANG Ze-jun, LIU Zhi-qiang, et al. LHs: a novel method of information retrieval avoiding an index using linear hashing with key groups in deduplication[C]//Proc of International Conference on Machine Learning and Cybernetics. Washington DC: IEEE Computer Society, 2012: 1312-1318.
  • 9. DUBNICKI C, GRYZ L, HELDT L, et al. Hydrastor: a scalable secondary storage[C]//Proc of the 7th Conference on File and Storage Technologies. Berkeley: USENIX, 2009: 97-210.
  • 10. UNGUREANU C, ATKIN B, ARANYA A, et al. Hydrafs: a high-throughput file system for the hydrastor content-addressable storage system[C]//Proc of the 8th USENIX Conference on File and Storage Technologies. Berkeley: USENIX, 2010: 225-238.

Co-cited documents: 1

Citing documents: 1

Second-level citing documents: 3
