Abstract
Mining maximal frequent itemsets is a key problem in many data mining applications. To address the scalability problem of frequent pattern mining, this paper proposes Mining Top-K, an algorithm based on Bloom Filter theory that mines the K most frequent elements. Built on an extended Bloom Filter data structure, the algorithm identifies the K most frequent elements in a data stream with reasonable accuracy and estimates how often each of them occurs. Experimental results show that the method achieves low space complexity without sacrificing accuracy.
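The abstract does not specify how the Bloom Filter is extended, so the following is only a minimal sketch of the general idea, assuming a counting variant (an array of counters instead of bits) combined with a small candidate table of size K. The names `CountingBloomFilter`, `mine_top_k`, `num_counters`, and `num_hashes` are hypothetical and are not taken from the paper.

```python
import hashlib


class CountingBloomFilter:
    """Counter array indexed by several hash functions (assumed layout)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive k positions by salting one hash function with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # Minimum over the item's counters: never below the true count,
        # but possibly above it when other items collide on every position.
        return min(self.counters[pos] for pos in self._positions(item))


def mine_top_k(stream, k, num_counters=1024, num_hashes=4):
    """Return up to k (item, estimated_count) pairs, most frequent first."""
    cbf = CountingBloomFilter(num_counters, num_hashes)
    candidates = {}  # item -> latest estimated count, at most k entries
    for item in stream:
        cbf.add(item)
        est = cbf.estimate(item)
        if item in candidates or len(candidates) < k:
            candidates[item] = est
        else:
            # Replace the weakest candidate if the new item now looks more frequent.
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[item] = est
    return sorted(candidates.items(), key=lambda kv: (-kv[1], str(kv[0])))


if __name__ == "__main__":
    demo = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"x{i}" for i in range(100)]
    for item, count in mine_top_k(demo, k=3):
        print(item, count)
```

In this sketch the memory footprint is fixed by `num_counters` and K rather than by the number of distinct elements in the stream, which is the kind of space saving the abstract refers to. The price is that reported counts are upper-bound estimates, not exact frequencies.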
Source
《数学的实践与认识》
CSCD
Peking University Core Journal (北大核心)
2009, No. 3, pp. 172-177 (6 pages)
Mathematics in Practice and Theory