Research on Deduplication Algorithm Based on K-medoids Clustering (cited by: 4)
Abstract Data corruption and loss can cause irreparable damage, and a data backup system keeps that damage to a minimum. As the volume of collected data grows rapidly, so does the volume a backup system must back up and restore. Yet the similarity between backup files exceeds 60%, so storing all of them on disk wastes a great deal of space. This paper therefore proposes a DELTA compression method based on K-medoids clustering to remove duplicate data from backups. The method first splits files into blocks, then DELTA-compresses the blocks pairwise and takes the size of each resulting compressed file as the similarity between the two blocks involved. K-medoids clustering is run on these similarities as a preprocessing step before DELTA compression. Based on the clustering result, small, similar file blocks are merged and then DELTA-compressed. Test results show that the method improves the compression ratio, reduces the number of fingerprint lookups during DELTA compression, and shortens compression time.
Source Computer Technology and Development (《计算机技术与发展》), 2018, No. 2, pp. 125-129 (5 pages)
Funding State Grid Corporation of China Headquarters Science and Technology Project (0711-150TL173)
Keywords DELTA compression; data compression; clustering; K-medoids
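
The pipeline outlined in the abstract (split into blocks, measure pairwise DELTA-compressed size, cluster with K-medoids, merge each cluster, compress) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: zlib with a preset dictionary stands in for a real DELTA encoder, blocks are fixed-size, the pairwise measure is made symmetric by adding both directions, and the medoids are initialized to the first k blocks. All function names are hypothetical.

```python
import zlib


def split_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def delta_size(reference: bytes, target: bytes) -> int:
    """Size of `target` compressed against `reference`.

    Assumption: zlib with a preset dictionary is only a stand-in for a
    DELTA encoder. Deflate sees at most 32 KiB of the dictionary, so
    blocks should stay below that size.
    """
    comp = zlib.compressobj(level=9, zdict=reference)
    return len(comp.compress(target) + comp.flush())


def distance_matrix(blocks):
    """Pairwise dissimilarity: a smaller delta size means more similar blocks."""
    n = len(blocks)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: add both directions so the measure is symmetric.
            d = delta_size(blocks[i], blocks[j]) + delta_size(blocks[j], blocks[i])
            dist[i][j] = dist[j][i] = d
    return dist


def assign(dist, medoids):
    """Label every block with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids: assign blocks, then re-pick each cluster's medoid."""
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of blocks")
    medoids = list(range(k))  # assumption: the first k blocks seed the clusters
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The new medoid minimizes total distance to its cluster members.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def compress_clusters(blocks, labels, k):
    """Merge the blocks of each cluster, compress each merge, return total size."""
    total = 0
    for c in range(k):
        merged = b"".join(b for b, lab in zip(blocks, labels) if lab == c)
        total += len(zlib.compress(merged, 9))
    return total
```

A caller would chain the steps as `blocks = split_blocks(data, size)`, `dist = distance_matrix(blocks)`, `_, labels = k_medoids(dist, k)`, and `compress_clusters(blocks, labels, k)`. Computing the distance matrix costs a pair of compressions for every pair of blocks, so the sketch is only practical for a modest number of blocks.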
