Application-Aware Parallel Deduplication Storage System for Big Data Backup

Cited by: 2

Abstract: As the digitization and networking of society continue to advance, the volume of data that IT enterprises worldwide must manage is growing rapidly. Large-scale data centers face steadily rising requirements for scalability, performance, and cost in managing massive, complex data. Conventional deduplication-based storage management techniques, used to slow the growth of enterprise storage capacity, can no longer satisfy the quality-of-service (QoS) requirements of big data backup, while advances in software and hardware technology bring opportunities to improve big data management. We propose an application-aware parallel deduplication storage system for big data backup. It uses novel non-volatile storage to improve the concurrent query capability of the chunk index, and it designs an application-aware data routing scheme that leverages the rich file semantic information available at the application layer. Experimental results show that the proposed system not only achieves high-performance parallel deduplication within a single node, but also improves the throughput of cluster deduplication through horizontal scaling.
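The abstract does not spell out the routing algorithm, so the following is only a minimal sketch of what application-aware routing could look like, not the authors' design. It assumes that the file extension stands in for the application type, that chunks are fixed-size and fingerprinted with SHA-1, and that a plain in-memory dictionary stands in for the chunk index kept on non-volatile storage. The names `route`, `DedupNode`, and `Cluster` are hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumption: fixed-size chunking, chosen for illustration


def route(file_name: str, num_nodes: int) -> int:
    """Pick a node from the file extension (an assumed proxy for application type)."""
    ext = os.path.splitext(file_name)[1].lower()
    digest = hashlib.sha1(ext.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


class DedupNode:
    """One deduplication node with its own chunk index."""

    def __init__(self) -> None:
        # Assumption: a dict replaces the index held on non-volatile storage.
        self.index = {}          # fingerprint -> chunk bytes
        self.logical_bytes = 0   # bytes received for backup
        self.stored_bytes = 0    # bytes actually kept after deduplication

    def backup(self, data: bytes) -> list:
        """Store only unseen chunks; return the file recipe (fingerprint list)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha1(chunk).hexdigest()
            self.logical_bytes += len(chunk)
            if fingerprint not in self.index:
                self.index[fingerprint] = chunk
                self.stored_bytes += len(chunk)
            recipe.append(fingerprint)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.index[fingerprint] for fingerprint in recipe)


class Cluster:
    """A set of nodes; each file goes to the node chosen by route()."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [DedupNode() for _ in range(num_nodes)]

    def backup(self, file_name: str, data: bytes):
        node_id = route(file_name, len(self.nodes))
        return node_id, self.nodes[node_id].backup(data)
```

Under these assumptions, files of the same type always reach the same node, so duplicates among them are found in that node's local index without querying the other nodes, and adding nodes spreads the index lookups across the cluster.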

Authors: Fu Yinjin (付印金), Hu Guyu (胡谷雨), Ni Guiqiang (倪桂强), Chen Weiwei (陈卫卫), Lu Jirong (卢继荣)

Affiliation: College of Command Information Systems, PLA University of Science and Technology (解放军理工大学指挥信息系统学院)

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, Issue S2, pp. 139-147 (9 pages)

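Funding: National Natural Science Foundation of China (61402518); National High Technology Research and Development Program of China ("863" Program) (2012AA01A509, 2012AA01A510)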

Keywords: big data backup; parallel deduplication; application awareness; non-volatile storage; scalability

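Classification: TP333 [Automation and Computer Technology: Computer System Architecture]; TP309.3 [Automation and Computer Technology: Computer System Architecture]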


