Using multi-threads to hide deduplication I/O latency with low synchronization overhead 被引量：1

Using multi-threads to hide deduplication I/O latency with low synchronization overhead

下载PDF

导出

摘要 Data deduplication,as a compression method,has been widely used in most backup systems to improve bandwidth and space efficiency.As data exploded to be backed up,two main challenges in data deduplication are the CPU-intensive chunking and hashing works and the I/O intensive disk-index access latency.However,CPU-intensive works have been vastly parallelized and speeded up by multi-core and many-core processors;the I/O latency is likely becoming the bottleneck in data deduplication.To alleviate the challenge of I/O latency in multi-core systems,multi-threaded deduplication (Multi-Dedup) architecture was proposed.The main idea of Multi-Dedup was using parallel deduplication threads to hide the I/O latency.A prefix based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead.On the other hand,a collisionless cache array was also designed to preserve locality and similarity within the parallel threads.In various real-world datasets experiments,Multi-Dedup achieves 3-5 times performance improvements incorporating with locality-based ChunkStash and local-similarity based SiLo methods.In addition,Multi-Dedup has dramatically decreased the synchronization overhead and achieves 1.5-2 times performance improvements comparing to traditional lock-based synchronization methods. Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency. As data exploded to be backed up, two main challenges in data deduplication are the CPU-intensive chunking and hashing works and the I/0 intensive disk-index access latency. However, CPU-intensive works have been vastly parallelized and speeded up by multi-core and many-core processors; the I/0 latency is likely becoming the bottleneck in data deduplication. To alleviate the challenge of I/0 latency in multi-core systems, multi-threaded deduplication （Multi-Dedup） architecture was proposed. The main idea of Multi-Dedup was using parallel deduplication threads to hide the I/0 latency. A prefix based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. On the other hand, a collisionless cache array was also designed to preserve locality and similarity within the parallel threads. In various real-world datasets experiments, Multi-Dedup achieves 3-5 times performance improvements incorporating with locality-based ChunkStash and local-similarity based SiLo methods. In addition, Multi-Dedup has dramatically decreased the synchronization overhead and achieves 1.5-2 times performance improvements comparing to traditional lock-based synchronization methods.

作者朱锐秦磊华周敬利郑寰

机构地区 School of Computer Wuhan National Lab for Optoelectronics

出处《Journal of Central South University》 SCIE EI CAS 2013年第6期1582-1591,共10页 中南大学学报（英文版）

基金 Project(IRT0725)supported by the Changjiang Innovative Group of Ministry of Education,China

关键词数据删除访问延迟同步方法多线程隐藏多核心处理器备份系统性能改进 multi-thread multi-core parallel data deduplication

分类号 TP309.3 [自动化与计算机技术—计算机系统结构]

引文网络
相关文献

参考文献25

1BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R] MA: The Enterprise Strategy Group, 2007.
2RHEA S, COX R, PESTEREV A. Fast, inexpensive content- addressed storage in foundation [C]// USENIX 2008.ANNUAL Technical Conference (USENIX ATC '08). Boston: USENIX Association, 2008: 143-156.
3POLICRONIADES C, PRATT I. Alternatives for detecting redundancy in storage systems data [C]// Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ATC'04). Boston: USENIX Association, 2004: 73-86.
4MANBER U. Finding similar files in a large file system [C]// Proceedings of the USENIX Winter 1994 Technical Conference. San Fransisco: USENIX Association, 1994:17-21.
5ZIV J, LEMPEL A. A universal algorithm for sequential data compression [J]. IEEE Transactions on Information Theory, 1997, 23(3): 337-343.
6BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.
7ZHU B, LI Kai, PATTERSON H. Avoiding the disk bottleneck in the data domain deduplication file system [C]// Proceedings of the 6tb USENIX Conference on File and Storage Technologies (FAST '08). San Jose: USENIX Association, 2008:1-14.
8BHAGWAT D, ESHGHI K, LONG D DE, LILLIBRIDGE M. Extreme binning: Scalable, parallel deduplication for chunk-based file backup [C]// Proceedings of 17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'09). London: IEEE Press, 2009: 1-9.
9XIA Wen, JIANG Hong, FENG Dan, HUA Yu. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput [C]// Proceedings of the 2011 conference on USENIX Annual Technical conference (USENIX ATC' 11). Portland: USENIX Association, 2011 : 26-38.
10DEBNATH B, SENGUPTA S, LI Jin. ChunkStash: Speeding up inline storage deduplication using flash memory [C]// Proceedings of the 2010 conference on USENIX Annual Technical Conference (USEN1X ATC' 10). Boston: USENIX Association, 2010: 16-16.

二级参考文献11

1[1]Amoura A K, Bampis E, Kenyon C, et al. Scheduling independent multiprocessor tasks[A]. In: Burkard R, Woeginger G J eds. Proc 5th Ann European Symposium on Algorithms. Lecture Notes in Computer Science[C]. Berlin: Springer-Verlag Press, 1997. 1-12.
2[2]Chen J, Miranda A. A polynomial time approximation scheme for general multiprocessor job scheduling[J]. SIAM(Society for Industrial and Applied Mathematics) J Computer, 2001, 31: 1-7.
3[3]Jansen K, Porkolab L. General multiprocessor task scheduling: Approximate solutions in linear time[A]. Dehne F, Gupta A, Sack J, eds. 1999 Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science[C]. Berlin: Springer-Verlag Press, 1999, 1663: 110-121.
4[4]Hoogeveen J A, Van de Velde S L, Veltman B. Complexity of scheduling multiprocessor tasks with prespecified processor allocations[J]. Discrete Applied Mathematics, 1994, 55: 259-272.
5[5]Blazewicz J, Dell ′Olmo P, Drozdowski M, et al. Scheduling multiprocessor tasks on the three dedicated processors[J]. Information Processing Letters, 1992, 41: 275-280.
6[6]Blazewicz J, Drozdowski M, Weglarz J. Scheduling multiprocessor tasks to minimize scheduling length[J]. IEEE Transaction on Computers, 1986, 35: 389-393.
7[7]Garey M R, Johnson D S. Computers and intractability: A guide to the theory of NP-completeness[M]. San Francisco: Freeman Press, 1979.
8[8]Dell′Olmo P, Speranza M G, Tuza Z S. Efficiency and effectiveness of normal schedules on three dedicated processors[J]. Discrete Mathematics, 1997, 164: 67-79.
9[9]Huang J, Chen J, Chen S. A simple linear time approximation algorithm for multi-processor job scheduling on four processors[A]. In: Hsu Tsan Sheng ed. 12th Annual International Symposium on Algorithms and Computation. Lecture Notes in Computer Science[C]. Berlin: Springer-Verlag Press, 2000, 1969: 60-71.
10[10]Goemans M X. An approximation algorithm for scheduling on three dedicated machines[J]. Discrete Applied Mathe-matics, 1995, 61: 49-59.

同被引文献18

1Nguyen Q T,Tuong N H.Virtual Machine Allocation in Cloud Computing for Minimizing Total Execution Time on Each Machine[C]//Proceedings of International Conference on Computing,Management and Telecommunications.Ho Chi Minh City,Vietnam:[s.n.],2013:241-245.
2Thomas B.Virtual Machines and Market Share Through2012[EB/OL].(2012-11-20).http://citeseerx.ist.psu.edu/showciting?cid=19262285.
3Thomas B.Q&A:Six Misconceptions about Server Virtualization Gartner[EB/OL].(2010-07-11).http://citeseerx.ist.psu.edu/showciting?cid=19262286.
4Nejad M M,Mashayekhy L,Grosu D.A Family of Truthful Greedy Mechanisms for Dynamic Virtual Machine Provisioning and Allocation in Clouds[C]//Proceedings of the6th IEEE International Conference on Cloud Computing.[S.l.]:IEEE Press,2013:188-195.
5Jin K,Miller E L.The Effectiveness of Deduplication on Virtual Machine Disk Images[C]//Proceedings of SYSTOR’09.Haifa,Israel:[s.n.],2009:57-69.
6Liguori A,Hensbergen E.Experiences with Content Addressable Storage and Virtual Disks[C]//Proceedings of WIOV’08.San Diego,USA:USENIX Association,2008:129-137.
7Meister D,Brinkmann A.File Recipe Compression in Data Deduplication Systems[C]//Proceedings of the11th USENIX Conference on File and Storage Technologies.San Jose,USA:USENIX Association,2013:175-182.
8Ng C H,Ma M,Wong T Y,et al.Live Deduplication Storage of Virtual Machine Images in an Open-source Cloud[C]//Proceedings of the12th International Middleware Conference.[S.l.]:USENIX Association,2011:80-99.
9候海翔.虚拟桌面环境下数据去冗余系统的设计与实现[D].武汉:华中科技大学,2011.
10Wright C P,Zadok E.Unionfs:Bringing Filesystems Together[EB/OL].(2004-11-28).http://www.linuxjournal.com/article/7714.

引证文献1

1郭婉,张晓,丰文雄.基于堆叠文件系统的云计算存储优化[J].计算机工程,2014,40(12):19-25. 被引量：2

二级引证文献2

1黄永洪,刘宇超,凌国玮,黄子彧.基于私有链的存证云盘系统方案设计[J].信息技术与标准化,2018(7):16-20. 被引量：3
2杨东日,陈跃,刘姝祎.面向健康大数据快速读写的存储系统设计[J].计算机工程与设计,2018,39(10):3063-3067. 被引量：3

1贾志凯,王树鹏,陈光达,彭成.一种并行层次化的重复数据删除技术[J].计算机研究与发展,2011,48(S1):100-104. 被引量：3
2邢玉轩,肖侬,刘芳,孙振,何晚辉.AR-Dedupe: An Efficient Deduplication Approach for Cluster Deduplication System[J].Journal of Shanghai Jiaotong university(Science),2015,20(1):76-81. 被引量：2
3重复数据删除利器[J].网管员世界,2010(5):15-15.
4为民.分析:采用重复数据删除技术十项注意[J].网络与信息,2012,26(3):63-63.
5刘志勇.备份Outlook Express的利器—Outlook Express Backup System[J].移动信息．新网络,2002(1):32-33.
6Yuehua DAI Yi SHI Yong QI Jianbao REN Peijian WANG.Design and verification of a lightweight reliable virtual machine monitor for a many-core architecture[J].Frontiers of Computer Science,2013,7(1):34-43. 被引量：4
7高剑刚.众核处理器研究现状及关键技术[J].高性能计算技术,2013,0(3):1-8.
8Bin LIN Shanshan LI Xiangke LIAO Jing ZHANG Xiaodong LIU.Leach： an automatic learning cache for inline primary deduplication system[J].Frontiers of Computer Science,2014,8(2):175-183. 被引量：2
9Bing Zhou,Jiang-Tao Wen.A Data Deduplication Framework of Disk Images with Adaptive Block Skipping[J].Journal of Computer Science & Technology,2016,31(4):820-835.
10申彦,朱玉全.CMP上基于数据集划分的K-means多核优化算法[J].智能系统学报,2015,10(4):607-614. 被引量：4

Journal of Central South University

2013年第6期

浏览历史

内容加载中请稍等...