
Research on the Application of Bloom Filters in Data Deduplication
Abstract: Duplicate-data index lookups in storage systems cause excessively frequent accesses to the storage device. To alleviate this problem, this paper studies data deduplication in depth, analyzes how Bloom filters are currently used in deduplication and the storage-access performance problems that remain, and proposes an efficient deduplication optimization scheme based on Bloom filters. To address the false positives inherent in a single Bloom filter, an auxiliary Bloom filter is added, which lowers the false positive rate and so reduces the number of storage-device accesses. To address Bloom filter false negatives caused by system software errors, a single-parity-bit error-checking mechanism is introduced, which prevents false-negative values from being stored while also reducing memory overhead. Simulation results show that the improved method balances the Bloom filter's false positive rate against the cost of storage-device accesses. By combining a judgment mechanism with the auxiliary Bloom filter and the single-parity-bit mechanism, the scheme lowers the false positive rate and reduces storage-device access overhead.
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 8, pp. 182-186, 190 (6 pages)
Funding: National Natural Science Foundation of China (Grant No. 11501302)
Keywords: Bloom Filter; false positive; false negative; single bit error checking; access costs
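
The abstract names two mechanisms, an auxiliary Bloom filter and a single parity bit, but does not give their construction. The following is only a minimal sketch of one plausible reading, not the authors' design. It assumes that a chunk fingerprint triggers a storage-device index lookup only when both filters report it as possibly present, and that a failed parity check makes the filters untrusted, so the lookup always goes to the device. All class names, method names, and parameter values here (`ParityBloomFilter`, `DualBloomIndex`, `needs_disk_lookup`, the filter size, the SHA-256-based hashing) are hypothetical.

```python
import hashlib


class ParityBloomFilter:
    """Bloom filter that keeps one parity bit over its whole bit array (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int, salt: bytes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes      # must stay below 256 for bytes([i])
        self.salt = salt                  # distinct salts give distinct hash families
        self.bits = bytearray(num_bits)   # one byte per bit, for readability
        self.parity = 0                   # XOR of all bits in the array

    def _positions(self, item: bytes):
        # Derive num_hashes positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(self.salt + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            if not self.bits[pos]:
                self.bits[pos] = 1
                self.parity ^= 1          # each 0 -> 1 flip toggles the parity bit

    def parity_ok(self) -> bool:
        # Detects any odd number of corrupted bits. This scan is O(num_bits);
        # a practical design would check smaller blocks instead.
        return sum(self.bits) % 2 == self.parity

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class DualBloomIndex:
    """Primary plus auxiliary filter placed in front of an on-disk fingerprint index."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.primary = ParityBloomFilter(num_bits, num_hashes, b"primary")
        self.auxiliary = ParityBloomFilter(num_bits, num_hashes, b"auxiliary")

    def add(self, fingerprint: bytes) -> None:
        self.primary.add(fingerprint)
        self.auxiliary.add(fingerprint)

    def needs_disk_lookup(self, fingerprint: bytes) -> bool:
        # A corrupted filter could report a false negative, so a failed
        # parity check forces the lookup to go to the storage device.
        if not (self.primary.parity_ok() and self.auxiliary.parity_ok()):
            return True
        # Both filters must agree before the device is accessed; a false
        # positive now requires both filters to err on the same fingerprint.
        return (self.primary.might_contain(fingerprint)
                and self.auxiliary.might_contain(fingerprint))


if __name__ == "__main__":
    index = DualBloomIndex()
    index.add(b"chunk-fingerprint-A")
    print(index.needs_disk_lookup(b"chunk-fingerprint-A"))
    print(index.needs_disk_lookup(b"chunk-fingerprint-B"))
```

In this sketch the parity bit costs one bit of memory per filter, in line with the abstract's stated goal of keeping memory overhead low. The trade-off is that it only detects an odd number of flipped bits and cannot locate or repair them.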