An Algorithm of Mining Frequent Itemsets Based on Bloom Filter (一种基于Bloom Filter的频繁模式挖掘算法)


Abstract: Mining maximal frequent itemsets is a key problem in many data mining applications. To address the scalability problem of frequent pattern mining, this paper proposes Mining Top-K, an algorithm for mining the K most frequent elements that builds on Bloom Filter theory. Using an extended Bloom Filter data structure, the algorithm identifies the K most frequent elements in a data stream with reasonable accuracy and estimates how often each of them occurs. Experimental results show that the method achieves low space complexity without sacrificing accuracy.
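The abstract does not spell out the extended Bloom Filter or the update rule, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a counting Bloom filter (counters instead of bits, with the minimum counter taken as the frequency estimate) and a small candidate table of at most K items. The names `CountingBloomFilter`, `mining_top_k`, `num_counters`, and `num_hashes` are hypothetical and chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose cells are counters rather than bits (assumed variant)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes counter indexes from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over the
        # item's positions never undercounts the true frequency.
        return min(self.counters[pos] for pos in self._positions(item))


def mining_top_k(stream, k, num_counters=1024, num_hashes=4):
    """Return up to k (item, estimated_count) pairs, most frequent first."""
    cbf = CountingBloomFilter(num_counters, num_hashes)
    candidates = {}  # item -> latest estimated count, at most k entries
    for item in stream:
        cbf.add(item)
        est = cbf.estimate(item)
        if item in candidates or len(candidates) < k:
            candidates[item] = est
        else:
            # Evict the weakest candidate if the new item now looks more frequent.
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[item] = est
    return sorted(candidates.items(), key=lambda kv: (-kv[1], str(kv[0])))


if __name__ == "__main__":
    demo_stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 5
    print(mining_top_k(demo_stream, k=2))
```

In this sketch the memory footprint is fixed by `num_counters` and K, independent of how many distinct elements the stream contains, which is the kind of space saving the abstract refers to. The estimates can overcount when hash positions collide, so a larger `num_counters` trades space for accuracy.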

Author: Lin Hai (林海)

Affiliation: Henan Provincial Sports School (河南省体育运动学校)

Source: Mathematics in Practice and Theory (《数学的实践与认识》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 3, pp. 172-177 (6 pages)

Keywords: data mining; frequent itemsets; Top-K; Bloom Filter

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]



