Deduplication model based on file-similarity clustering (cited by 2)
Abstract: Existing methods for raising the throughput of deduplication systems depend either on data locality or on cooperation among multiple nodes. To remove both dependencies, this paper proposes a deduplication model based on file-similarity clustering. The model extends the traditional flat index into a spatial structure and, following Broder's theorem, keeps only a small number of the most representative index entries in RAM. The index is also partitioned horizontally, and the shards are distributed across several fully autonomous storage nodes. Experimental results show that the model effectively improves deduplication performance and average throughput in a large-scale cloud-storage environment while keeping the data load balanced across nodes, so the model scales well.
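The abstract names two mechanisms without detailing them. The first is keeping only a few representative index entries in RAM, justified by Broder's theorem. The second is sharding the index across autonomous nodes. Broder's theorem states that for a min-wise independent hash function h, Pr[min h(A) = min h(B)] = |A ∩ B| / |A ∪ B|. Two files whose chunk sets resemble each other closely are therefore likely to share the same minimum fingerprint.

The Python sketch below illustrates this under stated assumptions and is not the paper's implementation:

- Each file's representative is its minimum SHA-1 chunk fingerprint. This is a min-hash rule similar to Extreme Binning.
- The "spatial" index is modeled as a two-level map from representative to a bin of chunk fingerprints.
- Files are routed to node `rep % num_nodes`.
- Chunking is skipped, so chunks are passed in as byte strings.
- All names (`fingerprint`, `estimate_resemblance`, `StorageNode`, `Cluster`, `store_file`) are hypothetical.

```python
from __future__ import annotations

import hashlib


def fingerprint(chunk: bytes, salt: bytes = b"") -> int:
    """SHA-1 chunk fingerprint as an integer. A non-empty salt gives a
    different, independent-looking hash function (used by the estimator)."""
    return int.from_bytes(hashlib.sha1(salt + chunk).digest(), "big")


def jaccard(a: set, b: set) -> float:
    """Exact resemblance r(A, B) = |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def estimate_resemblance(a: set[bytes], b: set[bytes], k: int = 256) -> float:
    """Fraction of k salted hash functions whose minima agree on A and B.
    By Broder's theorem this fraction approaches jaccard(A, B)."""
    hits = 0
    for i in range(k):
        salt = i.to_bytes(4, "big")
        min_a = min(fingerprint(c, salt) for c in a)
        min_b = min(fingerprint(c, salt) for c in b)
        hits += min_a == min_b
    return hits / k


class StorageNode:
    """Hypothetical autonomous node. RAM holds one representative per bin;
    in a real system each bin (the chunk fingerprints of one similarity
    cluster) would be kept on disk and loaded on demand."""

    def __init__(self) -> None:
        self.bins: dict[int, set[int]] = {}

    def deduplicate(self, rep: int, fps: set[int]) -> int:
        """Compare the file only against its representative's bin and
        return how many chunks are new (i.e. would actually be written)."""
        bin_fps = self.bins.setdefault(rep, set())
        new = fps - bin_fps
        bin_fps |= new
        return len(new)


class Cluster:
    """Hypothetical front end that shards the index horizontally by the
    file's representative, so nodes never consult each other."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [StorageNode() for _ in range(num_nodes)]

    def store_file(self, chunks: list[bytes]) -> tuple[int, int]:
        """Return (node id, number of new chunks). Assumes a non-empty file."""
        fps = {fingerprint(c) for c in chunks}
        rep = min(fps)  # representative = minimum chunk fingerprint
        node_id = rep % len(self.nodes)
        return node_id, self.nodes[node_id].deduplicate(rep, fps)


if __name__ == "__main__":
    cluster = Cluster(num_nodes=4)
    file_a = [b"chunk-%d" % i for i in range(10)]
    print(cluster.store_file(file_a))  # first backup: all 10 chunks are new
    print(cluster.store_file(file_a))  # same node again, 0 new chunks
```

Each file is compared only against the bin named by its representative. As a result, a node in this sketch never queries another node or relies on locality in the backup stream. The trade-off is that a near-duplicate whose minimum fingerprint changed lands in a different bin, so some duplicate chunks are missed.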
Source: Application Research of Computers (《计算机应用研究》), 2012, No. 5, pp. 1684-1689 (6 pages). Indexed in CSCD and the Peking University core journal list.
Funding: Ministry of Education Cultivation Fund (708078); National Natural Science Foundation of China (60873075, 60973118)
Keywords: cloud storage; deduplication; throughput; file-similarity clustering; load balancing

References (14)

[1] GANTZ J F, CHUTE C, MANFREDIZ A, et al. The diverse and exploding digital universe: an updated forecast of worldwide information growth through 2011[R]. Framingham: International Data Corporation, 2008.
[2] MEYER D T, BOLOSKY W J. A study of practical deduplication[C]// Proc of the 9th USENIX Conference on File and Storage Technologies. Berkeley: USENIX Association, 2011: 1-13.
[3] HARNIK D, PINKAS B, SHULMAN-PELEG A. Side channels in cloud services: deduplication in cloud storage[J]. IEEE Security & Privacy, 2010, 8(6): 40-47.
[4] WANG Can, QIN Zhiguang, FENG Chaosheng, PENG Jing. Backup data encryption method for data deduplication[J]. Journal of Computer Applications, 2010, 30(7): 1763-1766.
[5] BRODER A, MITZENMACHER M. Network applications of Bloom filters: a survey[J]. Internet Mathematics, 2004, 1(4): 485-509.
[6] ZHU B, LI Kai, PATTERSON H. Avoiding the disk bottleneck in the Data Domain deduplication file system[C]// Proc of the 6th USENIX Conference on File and Storage Technologies. Berkeley: USENIX Association, 2008: 269-282.
[7] LILLIBRIDGE M, ESHGHI K, BHAGWAT D, et al. Sparse indexing: large scale, inline deduplication using sampling and locality[C]// Proc of the 7th USENIX Conference on File and Storage Technologies. Berkeley: USENIX Association, 2009: 111-123.
[8] KUBIATOWICZ J, BINDEL D, CHEN Yan, et al. OceanStore: an architecture for global-scale persistent storage[C]// Proc of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2000: 190-201.
[9] COX L P, MURRAY C D, NOBLE B D. Pastiche: making backup cheap and easy[C]// Proc of the 5th Symposium on Operating Systems Design and Implementation. New York: ACM, 2002: 285-298.
[10] BRODER A Z. On the resemblance and containment of documents[C]// Proc of Compression and Complexity of Sequences. Washington DC: IEEE Computer Society, 1997: 21-29.
