Improving Metadata Caching Efficiency for Data Deduplication via In-RAM Metadata Utilization

Abstract: We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). First, in-RAM hash granularity adaptation and miniLZO-based data compression are proposed to reduce the in-RAM metadata size and thereby the space overhead of the in-RAM metadata caches. Second, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed to further reduce metadata-related disk I/O operations and improve deduplication throughput. During deduplication, the metadata write cache is managed following the LRU caching policy. For each manifest that is hit in the metadata write cache, an expensive manifest reloading operation from the disk is avoided. After deduplication, all the manifests in the metadata write cache are written to the disk and the cache is cleared. Our experimental results using a 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about a 95% size reduction for the deduplication metadata, with a small time overhead introduced; 2) when the metadata write cache was not utilized, with the same RAM space for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput than the classic Sparse Indexing deduplication system, which uses no metadata utilization approaches; and 3) when the metadata write cache was utilized and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio compared with Sparse Indexing and a 70% higher deduplication throughput compared with IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems for improving metadata caching efficiency.
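The write-cache behavior described in the abstract (LRU management during deduplication, no disk reload on a hit, write-out of all manifests at the end) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ManifestWriteCache`, the manifest IDs, the representation of a manifest as a list of chunk hashes, and the plain `dict` standing in for the on-disk manifest store are all assumptions made for illustration.

```python
from collections import OrderedDict


class ManifestWriteCache:
    """Illustrative LRU write cache for manifests (hypothetical, not from the paper)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity    # maximum number of manifests held in RAM
        self.disk = disk            # dict standing in for the on-disk manifest store
        self.cache = OrderedDict()  # manifest_id -> list of chunk hashes, LRU first
        self.hits = 0               # lookups served from RAM
        self.disk_reloads = 0       # manifests that had to be reloaded from disk

    def get(self, manifest_id):
        """Return a mutable manifest, reloading it from disk only on a miss."""
        if manifest_id in self.cache:
            self.cache.move_to_end(manifest_id)  # mark as most recently used
            self.hits += 1
            return self.cache[manifest_id]
        if manifest_id in self.disk:
            manifest = list(self.disk[manifest_id])  # the expensive reload path
            self.disk_reloads += 1
        else:
            manifest = []  # brand-new manifest, nothing to reload
        self.cache[manifest_id] = manifest
        if len(self.cache) > self.capacity:
            # Evict the least recently used manifest and persist it.
            victim_id, victim = self.cache.popitem(last=False)
            self.disk[victim_id] = victim
        return manifest

    def flush(self):
        """After deduplication: store every cached manifest on disk, then clear."""
        self.disk.update(self.cache)
        self.cache.clear()


# Small demonstration with a two-manifest cache.
disk = {}
cache = ManifestWriteCache(capacity=2, disk=disk)
cache.get("m1").append("h1")
cache.get("m2").append("h2")
cache.get("m1").append("h3")  # hit: no disk access needed
cache.get("m3").append("h4")  # miss: evicts m2, the least recently used
cache.get("m2")               # miss: m2 must be reloaded from disk, evicts m1
cache.flush()
```

In this sketch every hit updates a manifest in RAM without touching the store, which is the saving the abstract attributes to the write cache; how IR-MUD actually sizes the cache or lays out manifests is not stated here.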
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), SCIE / EI / CSCD, 2016, No. 4, pp. 805-819 (15 pages)
Funding: This work is supported by the National Science Fund for Distinguished Young Scholars of China under Grant No. 61125102 and the Key Program of the National Natural Science Foundation of China under Grant No. 61133008.
Keywords: data deduplication, cache, metadata utilization

References (30)

  • [1] Black J. Compare-by-hash: A reasoned analysis. In Proc. the USENIX Annual Technical Conference (ATC), May 2006, pp.85-90.
  • [2] Meister D, Kaiser J, Brinkmann A, Cortes T, Kuhn M, Kunkel J. A study on data deduplication in HPC storage systems. In Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2012, Article No. 7.
  • [3] Bloom B H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, July 1970, 13(7): 422-426.
  • [4] Lillibridge M, Eshghi K, Bhagwat D, Deolalikar V, Trezise G, Camble P. Sparse Indexing: Large scale, inline deduplication using sampling and locality. In Proc. the 7th USENIX Conference on File and Storage Technologies (FAST), February 2009, pp.111-123.
  • [5] Tanenbaum A S. Modern Operating Systems (2nd edition). Prentice Hall PTR, 2001.
  • [6] Zhou B, Wen J. Hysteresis re-chunking based metadata harnessing deduplication of disk images. In Proc. the 42nd IEEE International Conference on Parallel Processing (ICPP), October 2013, pp.389-398.
  • [7] Rabin M O. Fingerprinting by random polynomials. Technical Report, TR-15-81, Center for Research in Computing Technology, Harvard University, 1981.
  • [8] Muthitacharoen A, Chen B, Mazières D. A low-bandwidth network file system. In Proc. the 18th ACM Symposium on Operating Systems Principles, October 2001, pp.174-187.
  • [9] Romański B, Heldt T, Kilian W et al. Anchor-driven subchunk deduplication. In Proc. the 4th Annual International Conference on Systems and Storage (SYSTOR), May 2011, pp.16:1-16:13.
  • [10] Tolia N, Kozuch M, Satyanarayanan M, Karp B, Bressoud T, Perrig A. Opportunistic use of content addressable storage for distributed file systems. In Proc. the USENIX Annual Technical Conference (ATC), June 2003, pp.127-140.
