
Application-Aware Parallel Deduplication Storage System for Big Data Backup

Cited by: 2
Abstract: As society's digital and networked informatization advances, the volume of data that IT enterprises worldwide must manage is growing rapidly, and large-scale data centers face ever stricter requirements for scalability, performance, and cost in managing massive, complex data. Conventional deduplication-based storage management techniques, used to slow the growth of enterprise storage capacity, can no longer satisfy the quality-of-service (QoS) requirements of big data backup, while advances in software and hardware technologies bring opportunities to improve big data management. We propose an application-aware parallel deduplication storage system for big data backup. It uses novel non-volatile storage to improve the concurrent query capability of the chunk index, and it applies an application-aware data routing scheme designed around the rich file semantic information available at the application layer. Experimental results show that the proposed system not only achieves high-performance parallel deduplication within a single node, but also improves cluster deduplication throughput by scaling out.
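Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2015, Issue S2, pp. 139-147 (9 pages)
Funding: National Natural Science Foundation of China (61402518); National High Technology Research and Development Program of China ("863" Program) (2012AA01A509, 2012AA01A510)
Keywords: big data backup; parallel deduplication; application awareness; non-volatile storage; scalability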
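
The abstract names two mechanisms, application-aware data routing and a per-node chunk index, without giving details. The following is only a minimal illustrative sketch of the general idea, not the authors' design. Everything in it is an assumption: the file extension stands in for "file semantic information", chunking is fixed-size, an in-memory dictionary stands in for the chunk index on non-volatile storage, and the names `DedupNode`, `app_type`, `route`, and `backup` are hypothetical. Parallel execution is left out for brevity.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the abstract does not specify chunking


class DedupNode:
    """Hypothetical storage node holding its own chunk index."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # fingerprint -> chunk length (stand-in for the chunk index)
        self.stored_bytes = 0  # bytes actually written after deduplication

    def store(self, data: bytes) -> int:
        """Deduplicate data chunk by chunk and return the bytes newly written."""
        written = 0
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in self.index:  # unseen chunk: record it and count its size
                self.index[fp] = len(chunk)
                written += len(chunk)
        self.stored_bytes += written
        return written


def app_type(filename: str) -> str:
    """Use the lowercase file extension as a crude application hint (assumption)."""
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else ""


def route(filename: str, nodes):
    """Send all files of one application type to the same node via a stable hash."""
    digest = hashlib.sha1(app_type(filename).encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def backup(files, nodes) -> int:
    """Route each (name, data) pair to a node and return total bytes written."""
    return sum(route(name, nodes).store(data) for name, data in files)
```

Routing by application type keeps files that are likely to share chunks on the same node, so each node can deduplicate against its local index only. A real system would also have to balance load across nodes, which this sketch ignores.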
