
Deduplication method based on content defined pre-chunking and sliding window
Abstract: To address the conflict in existing deduplication methods between raising the compression ratio and reducing metadata overhead, a deduplication method based on content-defined pre-chunking and a sliding window is proposed, together with a general model for analyzing its performance. The method first pre-chunks each data object based on content and then applies different chunking strategies to the changed and unchanged regions, so that a high compression ratio and low metadata overhead are obtained even when the expected chunk size is relatively large. Experimental results on real data sets show that the method's average compression ratio exceeds the current best value while its average time overhead is reduced significantly.
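The paper gives no code, but the sliding-window content-defined chunking it builds on is easy to illustrate. The Python sketch below splits a byte stream at positions where a polynomial rolling hash over the last few bytes matches a bit mask; the window width, hash base, mask values, and size limits are illustrative assumptions, not parameters from the paper, and the pre-chunking/changed-region detection step is only hinted at by running the same chunker with two different expected chunk sizes.

```python
# Minimal sketch of sliding-window content-defined chunking (CDC).
# WINDOW, PRIME, and the masks below are illustrative assumptions,
# not the parameters used in the paper.
WINDOW = 48            # sliding-window width in bytes (assumed)
PRIME = 257            # base of the polynomial rolling hash (assumed)
MOD = 1 << 32          # keep hash values in 32 bits


def cdc_chunks(data: bytes, mask: int, min_size: int = 256, max_size: int = 65536):
    """Cut `data` where the low bits of the rolling hash are all zero.

    A wider mask means boundaries are rarer, i.e. a larger expected
    chunk size (roughly mask + 1 bytes between cut points).
    """
    chunks, start, h = [], 0, 0
    pow_w = pow(PRIME, WINDOW - 1, MOD)        # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:                # window full: drop its oldest byte
            h = (h - data[i - WINDOW] * pow_w) % MOD
        h = (h * PRIME + b) % MOD              # slide the window forward by one byte
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])   # content-defined boundary (or forced cut)
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])            # trailing partial chunk
    return chunks


if __name__ == "__main__":
    import os
    payload = os.urandom(1 << 20)
    # Hypothetically, a large expected chunk size (~8 KiB) would be used on
    # regions the pre-chunking pass marks as unchanged, and a small one
    # (~1 KiB) on regions marked as changed.
    print(len(cdc_chunks(payload, mask=0x1FFF)))   # coarse chunking
    print(len(cdc_chunks(payload, mask=0x03FF)))   # fine chunking
```

Applying the smaller expected chunk size only inside changed regions keeps a fine granularity where duplicates are likely to be broken up, while the stable regions pay only large-chunk metadata costs, which is the trade-off the abstract describes.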
Source: Control and Decision (《控制与决策》; indexed by EI, CSCD, Peking University Core), 2012, No. 8: 1157-1162, 1168 (7 pages).
Funding: National Natural Science Foundation of China (60873075, 60973118); Ministry of Education Cultivation Fund Project (708078).
Keywords: deduplication; data compression; sliding window; content defined chunking

References (15)

  • 1 Ao L, Shu J W, Li M Q. Data deduplication techniques[J]. Journal of Software, 2010, 21(5): 916-929.
  • 2Kruus E, Ungureanu C, Dubnicki C. Bimodal content defined chunking for backup streams[C]. Proc of 8th USENIX Conference on File and Storage Technologies. USENIX Association, 2010:18-31.
  • 3Yang T M, Jiang H, Feng D, et al. DEBAR: A scalable high-performance De-duplication storage system for backup and archiving[C]. Proc of 24th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2010. IEEE Computer Society, 2010:1-12.
  • 4 Zhu H M, Wang N S. An improved detection method for similar duplicate records[J]. Control and Decision, 2006, 21(7): 805-808.
  • 5Eshghi K, Tang H K. A framework for analyzing and improving content-based chunking algorithms[R]. Hewlett-Packard Labs, 2005.
  • 6Quinlan S, Dorward S. Venti: a new approach to archival storage[C]. Proc of USENIX Conference on File and Storage Technologies, FAST 2002. USENIX Association, 2002:89-102.
  • 7Bobbarjung D R, Jagannathan S, Dubnicki C. Improving duplicate elimination in storage systems[J]. ACM Transactions on Storage, 2006, 2(4): 424-448.
  • 8Ports D, Clements A T, Demaine E D. PersiFS: a versioned file system with an efficient representation[C]. Proc of 20th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, 2005:1-2.
  • 9Jain N, Dahlin M, Tewari R. Taper: Tiered approach for eliminating redundancy in replica synchronization[C]. Proc of 4th USENIX Conference on File and Storage Technologies. USENIX Association, 2005:281-294.
  • 10Muthitacharoen A, Chen B, Mazieres D. A low-bandwidth network file system[C]. Proc of 18th ACM Symposium on Operating Systems Principles (SOSP'01). Association for Computing Machinery, 2001:174-187.

Secondary references (48)

  • 1Bhagwat D,Pollack K,Long DDE,Schwarz T,Miller EL,Pâris JF.Providing high reliability in a minimum redundancy archival storage system.In:Proc.of the 14th Int'l Symp.on Modeling,Analysis,and Simulation of Computer and Telecommunication Systems (MASCOTS 2006).Washington:IEEE Computer Society Press,2006.413-421.
  • 2Zhu B,Li K.Avoiding the disk bottleneck in the data domain deduplication file system.In:Proc.of the 6th Usenix Conf.on File and Storage Technologies (FAST 2008).Berkeley:USENIX Association,2008.269-282.
  • 3Bhagwat D,Eshghi K,Mehra P.Content-Based document routing and index partitioning for scalable similarity-based searches in a large corpus.In:Berkhin P,Caruana R,Wu XD,Gaffney S,eds.Proc.of the 13th ACM SIGKDD Int'l Conf.on Knowledge Discovery and Data Mining (KDD 2007).New York:ACM Press,2007.105-112.
  • 4You LL,Pollack KT,Long DDE.Deep store:An archival storage system architecture.In:Proc.of the 21st Int'l Conf.on Data Engineering (ICDE 2005).Washington:IEEE Computer Society Press,2005.804-815.
  • 5Quinlan S,Dorward S.Venti:A new approach to archival storage.In:Proc.of the 1st Usenix Conf.on File and Storage Technologies (FAST 2002).Berkeley:USENIX Association,2002.89-102.
  • 6Sapuntzakis CP,Chandra R,Pfaff B,Chow J,Lam MS,Rosenblum M.Optimizing the migration of virtual computers.In:Proc.of the 5th Symp.on Operating Systems Design and Implementation (OSDI 2002).New York:ACM Press,2002.377-390.
  • 7Rabin MO.Fingerprinting by random polynomials.Technical Report,CRCT TR-15-81,Harvard University,1981.
  • 8Rivest R.The MD5 message-digest algorithm.1992.http://www.python.org/doc/current/lib/module-md5.html.
  • 9U.S.National Institute of Standards and Technology (NIST).Federal Information Processing Standards (FIPS) Publication 180-1:Secure Hash Standard.1995.http://www.itl.nist.gov/fipspubs/fip180-1.htm.
  • 10U.S.National Institute of Standards and Technology (NIST).Federal Information Processing Standards (FIPS) Publication 180-2:Secure Hash Standard.2002.http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf.

