期刊文献+

一种基于Bloom Filter的频繁模式挖掘算法

An Algorithm of Mining Frequent Itemsets Based on Bloom Filter
原文传递
导出
摘要 挖掘最大频繁项集是多种数据挖掘应用中的关键问题.针对频繁模式挖掘的可扩展性问题,基于Bloom Filter的相关理论,提出了一种"挖掘最频繁的K个元素"的Mining Top-K算法.该算法基于推广的Bloom Filter的数据结构,能够较为准确地筛选出数据流中出现最频繁的K个元素,并估算这K个元素的出现频数.实验结果表明:该方法在具有低空间复杂度特性的同时又不会失去准确性. Mining maximum frequent itemsets is a key problem in data mining. Aiming at solving the scalable problem for mining frequent itemsets, based on the theory of the Bloom Filter, an algorithm called Mining Top-K is proposed. It can not only mine the K-most frequent elements, but also circumvent the scalable problem of mining frequent itemsets. Especially, with the application of the extended Bloom Filter, the algorithm finding the K-most elements can compute the frequency of the K-most frequent elements. Experiments demonstrate that the algorithm can achieve space saving without sacrificing accuracy.
作者 林海
出处 《数学的实践与认识》 CSCD 北大核心 2009年第3期172-177,共6页 Mathematics in Practice and Theory
关键词 数据挖掘 频繁模式 TOP-K BLOOM FILTER data mining frequent itemsets Top-k bloom filter
  • 相关文献

参考文献10

  • 1Manku G, Motwani R. Approximate Frequency Counts over Data Streams [C]. In Proceedings of the 28st international conference on Very large data bases(VLDB), 2002. 346-35.
  • 2Charikar M, Chen K, Farach-Colton M. Finding Frequent Items in Data Streams[C], In Proceeding of the 29th International Colloquium on Automata, Language and Programming(ICALP), 2002. 693-703.
  • 3Metwally A, Agrawal D, A E1 Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams[C], In Proceeding of the 10th International Conference on Database Theory(ICDT),2005. 398-412.
  • 4Metwalty A, Agrawal D, A E1 Abbadi. Using Association Rules for Fraud Detection in Web Advertising Networks[C], In Proceedings of the 31st international conference on Very Large Data Bases(VLDB), 2005. 169- 180.
  • 5Bloom B. Space/Time Trade-offs in Hash Coding with Allowable Errors[C], Communication of the ACM,1970, 13(7):422-426.
  • 6Metwally A, Agrawal D, A El Abbadi. Duplicate Detection in Click Streams[C], In Proceedings of the 14th international conference on World Wide Web(WWW),2005.12-21.
  • 7Cormen T H, Leiserson C E, Rivest R L. Introduction to Algorithms[M]. The MIT Press,1990. 229 231.
  • 8Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows[J].SIAM Journal on Computing, 2002,31 (6) : 1794 -1813.
  • 9Demain E, Lopez Ortiz A, Munro J. Frequency Estimation of Internet Packet Streams with Limited Spaces [C], In Proceeding of the 10th Annual European Symposium on Algorithms(ESA),2002. 348-360.
  • 10Gibbons P, Matias Y. New Sampling-Based Summary Statistics for Improving Approximate Query Answers[C], In ACM SIGMOD Proceeding of International Conference on Management of Data, 1998. 331-342.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部