期刊文献+

基于移动指针的数据流冗余消除算法 被引量:6

Duplicate elimination algorithm for data streams with SKIP Bloom filter
下载PDF
导出
摘要 针对数据流的动态特性,提出了一种基于移动指针的数据流冗余消除算法—SKIP Bloom filter,其核心思想是通过动态指针和双Bloom filter来区分历史数据映射与当前数据映射,从而有效提升了算法的性能和准确度。理论证明,它具有O(n)的时间复杂度与O(1-(1-1/(2 m))w-k)k的假阳性误判率。实验结果表明,算法在实际网络环境中与已有算法相比,准确度提高了2-12倍。 According to the dynamic characteristics of data streams, a duplicate elimination algorithm was proposed with low time complexity and high accuracy based on SKIP Bloom filter. A moving cursor and double Bloom filter were used to differentiate history data and current data mapping. Theoretically, it proves that the algorithm has the time complexity of O(n) and the false positive rate of O (1- (1-1/(2m))w.k)k . The experiment shows that the new SKIP Bloom filter im- proves the accuracy of 2-12 times in real networks compared with other existing algorithm.
出处 《通信学报》 EI CSCD 北大核心 2012年第2期7-14,共8页 Journal on Communications
基金 国家自然科学基金资助项目(61073055) 科技部国家科技支撑计划基金资助项目(2012BAH01B03)~~
关键词 数据流 冗余消除 BLOOM FILTER 散列函数 data streams duplicate elimination Bloom filter hash function
  • 相关文献

参考文献24

  • 1BABCOCK B. BABU S. DATAR M. Models and issues in data streams[A]. Proc of the 21st ACMSIGACT-SIGMOD-SIGART Syrup on Principles of Database Systems[C]. Madison: ACM Press, 2002. 1-16.
  • 2金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 3ANAND A, MUTHUKRISHNAN C, AKELLA A. Redundancy in network traffic: findings and implications[A]. Proc of SIGMETRICS[C]. Seattle, WA, USA, 2009.37-48.
  • 4WANG X, ZHANG Q, JIA Y. Efficiently filtering duplicates over distributed data streams[A]. International Conference on Computer Science and Software Engineering (CSSE)[C]. 2008.631-634.
  • 5E DENG; D. RAFIEI. Approximately detecting duplicates for streaming data using stable bloom filters[A]. Proc 2006 ACM SIGMOD Intemational Conference on Management of Data (SIGMOD) [C]. Chi-cago, Illinois, USA, 2006.25-36.
  • 6METWALLY D, AGRAWAL A E. ABBADI. Duplicate detection in click streams[A]. Proc 14th International World Wide Web Conference (WWW)[C]. Chiba, Japan, 2005.12-21.
  • 7V. GARG~ A. NARANG~ S. BHATFACHERJEE. Real-time memory efficient data redundancy removal algorithm[A]. CIKM[C]. 2010. 1259-1268.
  • 8SEKAR V, REiTER M K, WILLINGER W. CSAMP: a system for network-wide flow monitoring[A]. Proceedings of the 5th USENIX Sympo-sium on Networked Systems Design and Implementation. Berkeley[C]. CA, USA: USENIX Association, 2008. 233-246.
  • 9SHARMA M, BYERS J. Scalable coordination techniques for distributed network monitoring[A]. Passive and Active Measurement Conference[C]. 2005.349-352.
  • 10COPPENS J, MARKATOS E, NOVOTNY J. Scampi-a scalable monitoring platform for the internet[A]. International Workshop on Inter-Domain Performance and Simulation(IPS)[C]. Budapest, Hungary, 2004.

二级参考文献74

  • 1Bilenko M, Mooney R J. Adaptive duplicate detection using learnable string similarity measures. In Proc. ACM SIGKDD Conf., Washington DC, USA, August 2003, pp.39-48.
  • 2Weis M, Naumann F. Dogmatix tracks down duplicates in XML. In Proc. ACM SIGMOD Conf., Baltimore, Maryland, USA, June 2005, pp.431-442.
  • 3Ananthakrishna R, Chaudhuri S, Ganti V. Eliminating fuzzy duplicates in data warehouses. In Proc. 28th Int. Conf. Very Large Data Bases, Hong Kong, China, 2002, pp.586-597.
  • 4Chaudhuri S, Ganti V, Motwani R. Robust identification of fuzzy duplicates. In Proc. 21st Int. Conf. Data Engineering (ICDE), Tokyo, Japan, 2005, pp.865-876.
  • 5Bloom B H. Space/time trade-offs in hash coding with allow-able errors. Communications of the ACM, July 1970, 13(7): 422-426.
  • 6Fan L, Cao P, Almeida J, Broder A Z. Summary cache: A scalable wide-area Web cache sharing protocol. IEEE/ACM Trans. Networking, 2000, 8(3): 281-293.
  • 7Mitzenmacher M. Compressed Bloom filters. In Proc. 20th ACM Symposium on Principles of Distributed Computing Netw. Rhode Island, 2001, pp.144-150.
  • 8Cohen S, Matias Y. Spectral Bloom filters. In Proc. ACM SIGMOD Conf., California, USA, June 2003, pp.241-252.
  • 9Whang K, Zenden B T V, Taylor H M. A linear-time prob-abilistic counting algorithm for database applications. ACM Trans. Database Syst., 1990, 15(2): 208-299.
  • 10Cheng K, Xiang L, Iwaihara M. Time-decaying Bloom filters for data streams with skewed distributions. In Proc. 15th Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA), Tokyo, Japan, 2005, pp.63-69.

共引文献162

同被引文献43

  • 1谢鲲,闵应骅,张大方,谢高岗,文吉刚.分档布鲁姆过滤器的查询算法[J].计算机学报,2007,30(4):597-607. 被引量:14
  • 2Anand A,Gupta A,Akella A,et al.Packet caches on routers:the implications of universal redundant traffic elimination[J].ACMSIGCOMM Computer Communication Review,2008,38(4).
  • 3SPRING N T,WETHERALL D.A protocol-independent technique for eliminating redundant network traffic[J]. ACM SIC-COMM Computer Communication Review,2000, 30(4):87-95.
  • 4Anand A,Sekar V,Akella A.SmartRE:An architecture fo coordinated network-wide redundancy elimination[C]. New York,NY,USA.Proceedings of the ACM SIGCOMM conference on Data communication,2009:87-98.
  • 5Mullin J K. Optimal semijoins for distributed database systems[J]. IEEE Trans. Software Eng, 1990, 16(5): 558-560.
  • 6Li J, Taylor J, Serban L, Seltzer M. Self-Organization in Peer-to-Peer systemiC]// Proc ACM SIGOPS, Sept, 2002.
  • 7Cuena-Acuna F M, Peery C, Martin R P, Nguyen T D. PlantP: using gossiping to build content addressable peer-to-peer information sharing communities[C]//Proc 12th IEEE Intymp. High performance Distributed Computing, 2003: 236-249.
  • 8Rhea S C, Kubiatowicz J. Probabilistic location and routing[C]//Proc of INFOCOM2002 NewYorK:IEEEComputer Soeiety, 2002: 1248-1257.
  • 9Whitaker A, Wetherall D. Forwarding without loops in Icarus[C]//Proc of open Architectures and Network Programming, NewYork: IEEE Computer Society, 2002: 63-75.
  • 10Peter C D, Panagiotis M. Bloom filters in probabilistie verification[C]// Proc Fifth Int'l Conf. Formal Methods in Computer-Aided Design, 2004: 367-381.

引证文献6

二级引证文献9

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部