Abstract
To address the conflict in existing deduplication methods between improving the compression ratio and reducing metadata overhead, a deduplication method based on pre-chunking and sliding windows is proposed, together with a general model for performance analysis. The method first performs content-based pre-chunking of each data object, and then applies different chunking strategies to the changed and unchanged regions of the data, so that a high compression ratio and low metadata overhead can both be achieved even when the expected chunk size is relatively large. Experimental results on real data sets show that the method's average compression ratio exceeds the best existing value, while its average time overhead is significantly reduced.
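The keywords identify sliding-window content-defined chunking (CDC) as the baseline technique the method builds on. The following is a minimal sketch of that baseline, assuming a simple polynomial rolling hash in place of the Rabin fingerprint usually used in practice; WINDOW, MASK, and the chunk-size bounds are illustrative values, not parameters taken from the paper.

```python
# Sketch of sliding-window content-defined chunking (CDC).
# A polynomial rolling hash is slid over the data one byte at a time;
# a chunk boundary is declared wherever the low bits of the hash match
# a fixed pattern, so boundaries depend on content, not on offsets.

WINDOW = 48            # bytes covered by the sliding window
PRIME = 31             # base of the polynomial rolling hash
MODULUS = 1 << 61      # keeps hash values bounded
MASK = (1 << 13) - 1   # boundary test: expected spacing ~ 8 KiB
MIN_CHUNK = 2 * 1024   # suppress boundaries that would make tiny chunks
MAX_CHUNK = 64 * 1024  # force a boundary if no natural one appears

def chunk_boundaries(data: bytes):
    """Yield end offsets of content-defined chunks in `data`."""
    # Precompute PRIME**(WINDOW-1) so the oldest byte can be removed in O(1).
    pow_out = pow(PRIME, WINDOW - 1, MODULUS)
    h = 0
    start = 0
    for i, b in enumerate(data):
        # Slide the window: once it is full, drop the byte that falls out.
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * pow_out) % MODULUS
        h = (h * PRIME + b) % MODULUS
        size = i - start + 1
        # Cut when the hash's low bits match the pattern, subject to
        # the minimum and maximum chunk-size limits.
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            yield i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield len(data)

if __name__ == "__main__":
    import os
    data = os.urandom(1 << 20)  # 1 MiB of random test data
    ends = list(chunk_boundaries(data))
    sizes = [e - s for s, e in zip([0] + ends, ends)]
    print(f"{len(sizes)} chunks, mean size {sum(sizes) / len(sizes):.0f} bytes")
```

Per the abstract, the proposed method would first pre-chunk each object by content and then apply fine-grained chunking of this kind only inside the regions detected as changed, while unchanged regions keep larger chunks; that two-tier strategy is what lets a large expected chunk size coexist with a high compression ratio and low metadata overhead.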
Source
Control and Decision (《控制与决策》)
2012, No. 8, pp. 1157-1162 and 1168 (7 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (60873075, 60973118)
Ministry of Education Cultivation Fund Project (708078)
Keywords
deduplication
data compression
sliding window
content-defined chunking