Funding: Supported in part by the ZTE Industry-Academia-Research Cooperation Funds; the National Natural Science Foundation of China under Grant Nos. 61502191, 61502190, 61602197, and 61772222; the Fundamental Research Funds for the Central Universities under Grant Nos. 2017KFYXJJ065 and 2016YXMS085; the Hubei Provincial Natural Science Foundation of China under Grant Nos. 2016CFB226 and 2016CFB192; and the Key Laboratory of Information Storage System, Ministry of Education of China.
Abstract: Modern backup systems exploit data deduplication technology to save storage space, but they suffer from the fragmentation problem caused by deduplication. Fragmentation degrades restore performance because the chunks of a backup must be restored from containers scattered across the store. To improve restore performance, the state-of-the-art History-Aware Rewriting algorithm (HAR) collects fragmented chunks in the last backup and rewrites them in the next backup. However, because it rewrites fragmented chunks only in the next backup, HAR fails to eliminate the internal fragmentation caused by self-referenced chunks (chunks that appear two or more times in a backup) in the current backup, which degrades restore performance. In this paper, we propose Selectively Rewriting Self-Referenced Chunks (SRSC), a scheme that designs a buffer to simulate a restore cache, identifies internal fragmentation in the cache, and selectively rewrites the fragmented chunks. Our experimental results on two real-world datasets show that SRSC improves restore performance by 45% with an acceptable sacrifice of the deduplication ratio.
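To make the cache-simulation idea concrete, here is a minimal Python sketch (ours, not the paper's code) that replays a backup recipe of (fingerprint, container) pairs through a small LRU container cache and flags self-referenced chunks whose containers would miss in the restore cache; the LRU policy, the function name select_rewrites, and the recipe layout are all simplifying assumptions.

    from collections import OrderedDict

    def select_rewrites(recipe, cache_size):
        """Replay a backup recipe through a simulated LRU container cache
        and return the fingerprints of self-referenced chunks whose
        containers would miss (candidates for rewriting)."""
        cache = OrderedDict()   # container_id -> None, in LRU order
        seen = set()            # fingerprints already restored once
        rewrites = set()
        for fp, cid in recipe:
            if fp in seen and cid not in cache:
                # A self-referenced chunk whose container has fallen out
                # of the simulated cache: restoring it re-reads a distant
                # container, i.e., internal fragmentation.
                rewrites.add(fp)
            seen.add(fp)
            if cid in cache:
                cache.move_to_end(cid)          # refresh LRU position
            else:
                cache[cid] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict least recently used
        return rewrites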
Funding: National Natural Science Foundation of China, Nos. 71661002 and 41671532; National Key R&D Program of China, No. 2017YFB0504102; the Fundamental Research Funds for the Central Universities.
Abstract: Since 2005, dozens of geographical observation stations have been established in the Heihe River Basin (HRB), and by now a large amount of meteorological, hydrological, and ecological observations, as well as data pertaining to water resources, soil, and vegetation, have been collected. To adequately analyze these available data and the data to be collected in the future, we present a perspective from complexity theory. The concrete material covered includes a presentation of the adaptive multiscale filter, which can readily determine arbitrary trends, maximally reduce noise, and reliably perform fractal and multifractal analysis, and a presentation of the scale-dependent Lyapunov exponent (SDLE), which can reliably distinguish deterministic chaos from random processes, determine the error-doubling time for prediction, and obtain the defining parameters of the process examined. The adaptive filter is illustrated by applying it to obtain the global-warming trend and the Atlantic multidecadal oscillation from sea surface temperature data, and by applying it to some variables collected at the HRB to determine diurnal cycles and fractal properties. The SDLE is illustrated by using it to detect intermittent chaos in river flow data.
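The segment-and-blend idea behind such an adaptive filter can be sketched as follows (our illustration, not the authors' implementation: the triangular overlap weights, the window half-length n, and the polynomial order are assumptions made for clarity):

    import numpy as np

    def adaptive_trend(x, n=10, order=2):
        """Fit a low-order polynomial in overlapping windows of length
        2n+1 and blend neighbouring fits with weights that favour window
        centres, so the stitched trend has no jumps at segment edges."""
        x = np.asarray(x, dtype=float)
        w = 2 * n + 1
        assert len(x) >= w, "series must span at least one window"
        starts = list(range(0, len(x) - w + 1, n))
        if starts[-1] != len(x) - w:
            starts.append(len(x) - w)       # make the last window reach the end
        t = np.arange(w)
        # Triangular weights: largest at the window centre, tapering to
        # the edges, so overlapping fits average smoothly.
        ramp = np.minimum(np.arange(1, w + 1), np.arange(w, 0, -1)).astype(float)
        trend = np.zeros(len(x))
        weight = np.zeros(len(x))
        for s in starts:
            fit = np.polyval(np.polyfit(t, x[s:s + w], order), t)
            trend[s:s + w] += fit * ramp
            weight[s:s + w] += ramp
        return trend / weight

Subtracting the returned trend from x leaves the detrended fluctuations on which fractal and multifractal analysis would then operate.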
Funding: This work was supported by NSFC (Grant Nos. 61772216, 82090044, 61832020, and 61821003).
Abstract: 1 Introduction. Graph processing has received significant attention for its ability to cope with large-scale and complex unstructured data in the real world. However, most graph processing applications exhibit an irregular memory access pattern, which leads to poor locality in the memory access stream [1].
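As a concrete illustration (ours, with hypothetical names), the following compressed-sparse-row (CSR) kernel shows where the irregularity comes from: the offset array is streamed sequentially, but the per-neighbor reads gather from addresses decided only at run time by the graph structure.

    def sum_neighbor_values(offsets, neighbors, values):
        """For each vertex, sum the values of its neighbors; any
        label-propagation or PageRank-style step looks similar."""
        out = [0.0] * (len(offsets) - 1)
        for v in range(len(offsets) - 1):
            # Sequential scan of the offset array: good locality.
            for u in neighbors[offsets[v]:offsets[v + 1]]:
                out[v] += values[u]   # data-dependent gather: poor locality
        return out

    # Tiny example: 3 vertices, edges 0->1, 0->2, 1->2
    offsets, neighbors = [0, 2, 3, 3], [1, 2, 2]
    print(sum_neighbor_values(offsets, neighbors, [1.0, 2.0, 3.0]))  # [5.0, 3.0, 0.0]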
Funding: Supported in part by NSFC (61832020); the National Key R&D Program of China (2018YFB10033005); the Hubei Province Technical Innovation Special Project (2017AAA129); the Wuhan Application Basic Research Project (2017010201010103); and the Fundamental Research Funds for the Central Universities.
Abstract: 1 Introduction and main contributions. Because so many machines are clustered in datacenters, failures occur frequently [1,2]. Redundancy schemes, such as replication and erasure coding, are deployed to store redundant data to tolerate failures. In distributed systems that combine these two redundancy schemes, new data are hot and kept in replication, and usually turn cold as time goes by, at which point they are converted to erasure coding. An encoder performing this conversion can impose large network overheads and leave data at low reliability if it is not well designed.
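As a toy illustration of such a conversion (ours, not the paper's encoder; encode_stripe is a hypothetical name), the sketch below turns k replicated data blocks into a k+1 stripe with a single XOR parity; production systems typically use Reed-Solomon codes with several parity blocks, of which XOR is the one-parity special case.

    def encode_stripe(blocks):
        """Compute one XOR parity over k equal-sized data blocks and
        return the k+1 stripe that replaces the full replicas."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return list(blocks) + [bytes(parity)]

    # Only after the stripe is safely written should the extra replicas
    # be garbage-collected; an encoder that deletes them first leaves
    # the data under-protected during the conversion.
    stripe = encode_stripe([b"aaaa", b"bbbb", b"cccc"])
    assert len(stripe) == 4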
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. U1709220, U2001203, 61821003, 61872413, and 61902137; in part by the National Key Research and Development Program of China under Grant No. 2018YFB1003305; and in part by the Key-Area Research and Development Program of Guangdong Province of China under Grant No. 2019B010107001.
Abstract: Due to its low latency, byte addressability, non-volatility, and high density, persistent memory (PM) is expected to be used to build high-performance storage systems. However, PM also has disadvantages such as limited endurance, which pose challenges to traditional index technologies such as the B+-tree. The B+-tree was originally designed for dynamic random access memory (DRAM)-based or disk-based systems and suffers from a large write-amplification problem; high write amplification is detrimental to a PM-based system. This paper proposes WO-tree, a write-optimized B+-tree for PM. WO-tree adopts an unordered write mechanism for the leaf nodes, which eliminates the large number of write operations otherwise needed to maintain the entry order within a leaf. When a leaf node is split, WO-tree performs the cache-line flushing operation only after all write operations are completed, which reduces frequent data flushing. WO-tree adopts a partial logging mechanism that writes logs only for the leaf nodes: an inner node detects data inconsistency during a read operation, and the data can be recovered using the leaf-node information, thereby significantly reducing the logging overhead. Furthermore, WO-tree adopts a lock-free search for inner nodes, which reduces the locking overhead of concurrent operations. We evaluate WO-tree using the Yahoo! Cloud Serving Benchmark (YCSB) workloads. Compared with the traditional B+-tree, wB-tree, and Fast-Fair, WO-tree reduces the number of cache-line flushes caused by insertion operations by 84.7%, 22.2%, and 30.8%, respectively, and reduces the execution time by 84.3%, 27.3%, and 44.7%, respectively.
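The unordered-leaf idea can be sketched as follows (our simplification, not WO-tree's actual layout: the class and field names are assumptions, and the PM flush/fence steps are only indicated in comments).

    class UnorderedLeaf:
        """Leaf node whose entries are appended to the next free slot
        instead of being kept sorted: an insert shifts nothing and
        touches one slot plus a validity bitmap, cutting the writes
        (and cache-line flushes) a sorted leaf would need on PM."""
        def __init__(self, capacity=16):
            self.slots = [None] * capacity   # (key, value) pairs, any order
            self.valid = 0                   # bitmap of occupied slots

        def insert(self, key, value):
            for i in range(len(self.slots)):
                if not (self.valid >> i) & 1:
                    self.slots[i] = (key, value)   # write the slot first...
                    # (on real PM: flush this cache line, fence, then...)
                    self.valid |= 1 << i           # ...atomically publish it
                    return True
            return False                           # full: caller splits the leaf

        def search(self, key):
            # Unordered leaves trade a linear scan on lookup for the
            # cheaper inserts; inner nodes stay sorted for fast descent.
            for i in range(len(self.slots)):
                if (self.valid >> i) & 1 and self.slots[i][0] == key:
                    return self.slots[i][1]
            return None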