
Remote backup system based on data de-duplication (cited by: 1)
Abstract: Traditional remote backup carries a large amount of redundant data, which lowers backup efficiency and wastes storage space. To address this problem, a remote backup system based on data de-duplication is designed and implemented. Backup files are first divided into variable-length chunks according to their content using Rabin fingerprints, and information about each chunk is sent to the backup center. The backup center identifies duplicate chunks using the index algorithm of Google Bigtable and LevelDB, assisted by a Bloom filter. Finally, only the unique chunks are transmitted and stored. Experimental results show that the system effectively removes duplicate data when backing up similar data sets. For incremental backups in which the incremental data changes little, it generates less network traffic than Rsync backup.
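The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a polynomial rolling hash stands in for a true Rabin fingerprint over GF(2), the chunk information sent to the backup center is assumed to be a SHA-1 fingerprint, an in-memory `dict` stands in for the Bigtable/LevelDB-style disk index, and all names and constants (`WINDOW`, `MASK`, `MIN_CHUNK`, `MAX_CHUNK`, `BloomFilter`, `BackupCenter`, `backup`) are hypothetical.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 48              # bytes covered by the rolling hash window
BASE = 257               # polynomial base of the rolling hash
MOD = (1 << 61) - 1      # large prime modulus
MASK = (1 << 11) - 1     # cut when the low 11 bits of the hash are zero
MIN_CHUNK = 512          # lower bound on chunk length
MAX_CHUNK = 8192         # upper bound on chunk length


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined, variable-length chunks."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    start, h = 0, 0
    for i, byte in enumerate(data):
        pos = i - start                # offset of this byte inside the chunk
        if pos >= WINDOW:              # slide: drop the oldest byte
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):              # trailing chunk ended by end of data
        yield start, len(data)


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        for k in range(self.num_hashes):
            yield int.from_bytes(digest[4 * k:4 * k + 4], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class BackupCenter:
    """Holds the chunk index; the dict is a stand-in for an on-disk index."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.index = {}                # fingerprint -> chunk bytes

    def is_duplicate(self, fp: bytes) -> bool:
        # The Bloom filter rejects most new chunks without touching the index;
        # a possible hit is confirmed against the index itself.
        return self.bloom.might_contain(fp) and fp in self.index

    def store(self, fp: bytes, chunk: bytes) -> None:
        self.bloom.add(fp)
        self.index[fp] = chunk


def backup(data: bytes, center: BackupCenter):
    """Back up data; return (list of chunk fingerprints, chunk bytes sent)."""
    recipe, sent = [], 0
    for s, e in chunk_boundaries(data):
        chunk = data[s:e]
        fp = hashlib.sha1(chunk).digest()   # only the fingerprint is sent first
        if not center.is_duplicate(fp):
            center.store(fp, chunk)         # unique chunk: transmit and store
            sent += len(chunk)
        recipe.append(fp)
    return recipe, sent
```

Calling `backup(data, center)` a second time on unchanged data sends no chunk bytes, because every fingerprint is already indexed. Because cut points depend on a window of content rather than on fixed offsets, a small edit tends to change only the chunks near the edit, which is the property the variable-length chunking relies on.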
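Authors: JIANG Tao (姜涛), LIU Xiaojie (刘晓洁)
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2012, No. 12, pp. 4546-4550 (5 pages)
Funding: National Natural Science Foundation of China (61173159); Ministry of Education Major Project Cultivation Fund (708075)
Keywords: data de-duplication; variable-length chunking; disk index; remote backup; data disaster tolerance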