A Backup System Based on Data Deduplication (一种基于重复数据删除的备份系统) Cited by: 5

A Remote Data Backup System with Deduplication
Abstract: Data deduplication effectively improves the efficiency of backup systems, but it also increases the overhead of matching duplicate data. To address this problem, the authors design and implement THBS, a backup system based on data deduplication. THBS proposes HAD (hierarchical approach of data deduplication), a highly compact data backup method that matches and removes duplicate data in layered steps, from coarse to fine, at directory, file, block, and byte granularity in turn. It also uses a Bloom filter and an inverted index to reduce unnecessary data matching and disk accesses and to speed up match lookups. Experiments on two real-world datasets show that THBS saves 63.1% to 96.7% of storage space during backup, saves 71.3% to 97.6% and 41.2% to 66.7% of network bandwidth compared with Scp and Rsync respectively, and has a cumulative backup time that is 75% to 86% of that of Scp and 91% to 97% of that of Rsync.
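Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, Issue S1, pp. 206-210 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2009AA01A403); National Natural Science Foundation of China (60873066); Specialized Research Fund for the Doctoral Program of Higher Education (200800030027)
Keywords: backup system; data deduplication; hierarchy approach for data deduplication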
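
The coarse-to-fine matching summarized in the abstract can be illustrated with a minimal Python sketch. It is not the THBS implementation: the names `BloomFilter`, `Store`, and `backup`, the SHA-1 fingerprints, the fixed 4-byte blocks, and the in-memory dictionary standing in for the on-disk index are all assumptions made for illustration. The abstract does not describe the layout of the inverted index, and the byte-granularity step is omitted here.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


@dataclass
class Store:
    """Hypothetical backup target; `index` stands in for the on-disk index."""
    bloom: BloomFilter = field(default_factory=BloomFilter)
    index: dict = field(default_factory=dict)
    index_lookups: int = 0   # lookups that got past the Bloom filter
    bytes_sent: int = 0      # payload bytes that had to be transferred

    def has(self, fp):
        if fp not in self.bloom:      # definitely new: skip the index lookup
            return False
        self.index_lookups += 1
        return fp in self.index       # confirm, since the filter can be wrong

    def put(self, fp, payload=b""):
        self.bloom.add(fp)
        self.index[fp] = payload
        self.bytes_sent += len(payload)


def fingerprint(tag, data):
    # The tag keeps directory, file, and block fingerprints in separate spaces.
    return hashlib.sha1(tag + data).digest()


def dir_fingerprint(tree):
    # A directory's fingerprint covers the names and fingerprints of its children.
    parts = []
    for name in sorted(tree):
        node = tree[name]
        child = dir_fingerprint(node) if isinstance(node, dict) else fingerprint(b"F", node)
        parts.append(name.encode() + b"\0" + child)
    return fingerprint(b"D", b"".join(parts))


def backup(tree, store, block_size=4):
    """Back up a nested dict (name -> bytes or sub-dict), coarse to fine."""
    dfp = dir_fingerprint(tree)
    if store.has(dfp):                          # directory level: unchanged subtree
        return
    for name in sorted(tree):
        node = tree[name]
        if isinstance(node, dict):
            backup(node, store, block_size)
            continue
        ffp = fingerprint(b"F", node)
        if store.has(ffp):                      # file level: identical file
            continue
        for off in range(0, len(node), block_size):
            block = node[off:off + block_size]
            bfp = fingerprint(b"B", block)
            if not store.has(bfp):              # block level: send only new blocks
                store.put(bfp, block)
        store.put(ffp)
    store.put(dfp)


store = Store()
tree = {"a.txt": b"aaaabbbbcccc", "sub": {"b.txt": b"aaaabbbb"}}
backup(tree, store)
print(store.bytes_sent)   # 12: the blocks of b.txt duplicate blocks of a.txt
```

In this sketch an unchanged directory is skipped with a single lookup, and the Bloom filter lets new fingerprints bypass the index entirely, which mirrors the abstract's stated goal of reducing unnecessary matching and disk accesses.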

References (6)

  • 1 Policroniades C, Pratt I. Alternatives for detecting redundancy in storage systems data. Proc. of the 2004 USENIX Annual Technical Conf. (USENIX 2004). 2004
  • 2 Jain N, Dahlin M, Tewari R. Taper: Tiered approach for eliminating redundancy in replica synchronization. Proc. of the 4th USENIX Conf. on File and Storage Technologies (FAST '05). 2005
  • 3 Kulkarni P, Douglis F, LaVoie J, Tracey J M. Redundancy elimination within large collections of files. USENIX Annual Technical Conference. 2004
  • 4 Bolosky W J, Corbin S, Goebel D, et al. Single instance storage in Windows 2000. Proc. of the 4th USENIX Windows Systems Symposium. 2000
  • 5 Langford J. Multiround rsync. http://www.cs.cmu.edu/-jcl/research/mrsync/mrsync.ps. 2012
  • 6 Ao L, Shu J W, Li M Q. Data deduplication techniques [J]. Journal of Software, 2010, 21(5): 916-929. Cited by: 119
