
De-Duplication System Based on Suffix Structure for the Block Size Optimization

Cited by: 1
Abstract: To further improve the performance of data de-duplication systems, this paper proposes a de-duplication system that optimizes block size using the suffix array (SA) and longest common prefix (LCP) of the data chunks. The system first splits the input data stream into chunks of an initial size. It then identifies identical chunks, assigns each chunk an identifier, and builds the SA and LCP tables over the resulting sequence of chunk identifiers. From these tables it identifies the maximal repeated sequences and the non-repeated chunks, and from those it derives an optimized super-chunk size. Finally, it chunks the data a second time using the super chunk as the unit and saves the compression result. Experiments show that, compared with the original fixed-size chunking, the system achieves better compression and object reconstruction for the given input stream.
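The analysis stages described in the abstract are: first-pass chunking, chunk identifiers, SA and LCP over the identifier sequence, repeats, and super-chunk size. They can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes fixed-size first-pass chunks. It builds the suffix array by naive sorting and the LCP array with Kasai's algorithm. It reports only the single longest repeated run rather than every maximal repeat. It picks the super-chunk size with a hypothetical rule: longest repeat length times the first-pass chunk size. The abstract does not state the paper's actual optimization rule. All function names are illustrative.

```python
from collections import Counter


def first_pass_ids(data: bytes, chunk_size: int) -> list[int]:
    """First pass: fixed-size chunks; identical chunks get the same ID
    (IDs are assigned in order of first appearance)."""
    ids: dict[bytes, int] = {}
    seq: list[int] = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        seq.append(ids.setdefault(chunk, len(ids)))
    return seq


def suffix_array(seq: list[int]) -> list[int]:
    """Start positions of all suffixes in lexicographic order.
    Naive sort (O(n^2 log n) worst case): acceptable for a sketch only."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def lcp_array(seq: list[int], sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[k] = length of the longest common prefix of the
    suffixes starting at sa[k - 1] and sa[k]; lcp[0] = 0."""
    n = len(seq)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0  # no predecessor in SA order
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 symbols
    return lcp


def analyze(seq: list[int], chunk_size: int) -> dict:
    """Longest repeated ID run, non-repeated chunk IDs and a super-chunk size
    (HYPOTHETICAL rule: longest repeat length x first-pass chunk_size)."""
    if not seq:
        return {"sa": [], "lcp": [], "repeat_len": 0, "repeat_at": [],
                "non_repeated": [], "super_chunk_bytes": chunk_size}
    sa = suffix_array(seq)
    lcp = lcp_array(seq, sa)
    k = max(range(len(lcp)), key=lcp.__getitem__)  # first position of max LCP
    repeat_len = lcp[k]
    repeat_at = sorted((sa[k - 1], sa[k])) if repeat_len > 0 else []
    counts = Counter(seq)
    non_repeated = sorted(cid for cid, c in counts.items() if c == 1)
    return {
        "sa": sa,
        "lcp": lcp,
        "repeat_len": repeat_len,
        "repeat_at": repeat_at,
        "non_repeated": non_repeated,
        "super_chunk_bytes": max(repeat_len, 1) * chunk_size,
    }


if __name__ == "__main__":
    data = (b"aaaa" + b"bbbb" + b"cccc") * 2 + b"dddd"
    seq = first_pass_ids(data, 4)
    print(seq)            # [0, 1, 2, 0, 1, 2, 3]
    result = analyze(seq, 4)
    print(result["sa"])   # [0, 3, 1, 4, 2, 5, 6]
    print(result["lcp"])  # [0, 3, 0, 2, 0, 1, 0]
    print(result["repeat_at"], result["non_repeated"], result["super_chunk_bytes"])  # [0, 3] [3] 12
```

In the sample, the ID run [0, 1, 2] repeats at chunk positions 0 and 3, and ID 3 occurs only once. Under the assumed rule, the sketch therefore suggests a 12-byte super chunk. Working on chunk IDs rather than raw bytes keeps the SA and LCP tables proportional to the number of chunks, not the number of bytes.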
Source: Computer Systems & Applications (《计算机系统应用》), 2010, Issue 11, pp. 75-78 and 70 (5 pages in total)
Keywords: de-duplication; suffix array (SA); longest common prefix (LCP); block size optimization
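The abstract measures the system against fixed-size chunking in terms of compression and object reconstruction. That baseline can be sketched as a content-addressed store. Each unique chunk is kept once under its SHA-256 digest, and an ordered recipe of digests allows the original object to be rebuilt. This is an illustrative assumption, not the paper's code. The hash function, the function names, and the compression-ratio definition are all hypothetical; the ratio here ignores recipe and index metadata. Rerunning the same routine with a larger chunk size loosely mimics a second, super-chunk pass.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int) -> tuple[dict[bytes, bytes], list[bytes]]:
    """Fixed-size chunking with content-hash de-duplication.

    Returns (store, recipe): store keeps each unique chunk once, keyed by its
    SHA-256 digest; recipe lists the digests in input order."""
    store: dict[bytes, bytes] = {}
    recipe: list[bytes] = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # a duplicate chunk is not stored again
        recipe.append(digest)
    return store, recipe


def reconstruct(store: dict[bytes, bytes], recipe: list[bytes]) -> bytes:
    """Rebuild the original object by following the recipe."""
    return b"".join(store[d] for d in recipe)


def compression_ratio(data: bytes, store: dict[bytes, bytes]) -> float:
    """Input size / bytes of unique chunk data (recipe and index overhead ignored)."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    data = (b"aaaa" + b"bbbb" + b"cccc") * 2 + b"dddd"
    for size in (4, 12):
        store, recipe = dedup_store(data, size)
        assert reconstruct(store, recipe) == data  # lossless round trip
        print(size, len(recipe), compression_ratio(data, store))  # chunk size, recipe entries, ratio
```

On this 28-byte sample, chunk sizes of 4 and 12 bytes both store 16 unique bytes (ratio 1.75), but the recipe shrinks from 7 entries to 3. This only shows how such a comparison can be set up. It says nothing about the results measured in the paper.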
References: 3

Secondary references: 18


Co-cited references: 7

Citing articles: 1
