Abstract
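Building on the FP-growth algorithm, this paper proposes a novel association rule mining algorithm. The algorithm decomposes a large database into as many subsets as there are items in the frequent 1-itemsets, and then applies FP-growth to each resulting subset to perform constrained-item mining. Once constrained mining has finished for every subset, the constrained frequent itemsets are merged to obtain the frequent itemsets of the whole database. Experimental results show that the database partitioning strategy used by the new algorithm overcomes FP-growth's drawbacks of high memory usage and slow execution on large databases, making it an association rule mining algorithm well suited to large databases.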
The FP-growth algorithm suffers from poor space utilization and slow execution when mining large datasets. To overcome these drawbacks, this paper proposes a new FP-growth-based algorithm for mining association rules from large datasets. The algorithm adopts a new strategy that divides a large dataset into many subsets and then performs constrained frequent itemset mining on each subset. In experiments comparing the proposed algorithm with FP-growth, the new algorithm used less memory and ran faster than FP-growth when the dataset was very large.
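The decomposition described in the abstract can be sketched in Python. This is a hypothetical reading, not the paper's code: it assumes that subset D_i holds, for each transaction containing the frequent item i, only the frequent items ranked ahead of i in descending-support order (the same idea as an FP-growth conditional pattern base), so every frequent itemset is found in exactly one subset and the merge step is a plain union. For brevity, each subset is mined with a tree-less pattern-growth recursion (`mine_all`) standing in for FP-growth; it yields the same itemsets and supports but does not reproduce FP-tree memory behavior. All function names and the nine-transaction toy database are illustrative.

```python
from collections import Counter
from itertools import chain


def frequent_order(transactions, min_sup):
    """Return frequent items sorted by descending support (ties broken by item), plus all counts."""
    counts = Counter(chain.from_iterable(set(t) for t in transactions))
    order = sorted((i for i, c in counts.items() if c >= min_sup),
                   key=lambda i: (-counts[i], i))
    return order, counts


def partition(transactions, min_sup):
    """Split the database into one subset per frequent 1-item (assumed strategy).

    Subset D_i keeps, for each transaction containing item i, only the frequent
    items ranked ahead of i. Any frequent itemset whose lowest-ranked item is i
    equals {i} plus an itemset that is frequent in D_i, with the same support.
    """
    order, counts = frequent_order(transactions, min_sup)
    rank = {item: r for r, item in enumerate(order)}
    subsets = {}
    for item in order:
        subsets[item] = [
            sorted((x for x in t if x in rank and rank[x] < rank[item]), key=rank.get)
            for t in map(set, transactions)
            if item in t
        ]
    return order, counts, subsets


def mine_all(transactions, min_sup, suffix=frozenset()):
    """Constrained mining of one subset: every itemset found is extended by `suffix`.

    Tree-less pattern-growth recursion used here in place of FP-growth.
    Returns {itemset: support}.
    """
    order, counts, subsets = partition(transactions, min_sup)
    found = {}
    for item in order:
        itemset = suffix | {item}
        found[itemset] = counts[item]
        found.update(mine_all(subsets[item], min_sup, itemset))
    return found


def partitioned_mining(transactions, min_sup):
    """Partition once, mine each subset independently, then merge the results."""
    order, counts, subsets = partition(transactions, min_sup)
    merged = {}
    for item in order:  # subsets are independent, so each could be processed on its own
        constraint = frozenset([item])
        merged[constraint] = counts[item]
        merged.update(mine_all(subsets[item], min_sup, constraint))
    return merged


if __name__ == "__main__":
    db = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
          ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
          ["I1", "I2", "I3"]]
    result = partitioned_mining(db, min_sup=2)
    print(len(result))  # 13
    for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), sup)
```

Because the subsets share no itemsets, each one can be built and mined on its own (for example, one at a time from disk), which is the property a partitioning strategy relies on to bound memory. Whether the paper orders items this way is not stated in the abstract.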
Source
Journal of Natural Science of Hunan Normal University
CAS
Peking University Core Journal
2007, No. 2, pp. 30-34 (5 pages)
Funding
National Technology Innovation Project [State Economic and Trade Commission, Guojingmao Jishu (2002) No. 845]