Abstract
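In classic frequent closed itemset mining algorithms such as Closet and Closet+, when the conditional pattern database is very large, the number of frequent itemsets grows sharply, the efficiency of the algorithm steadily deteriorates, and the usefulness of the mining results declines as large numbers of redundant patterns are produced. This paper first presents an improved version of the traditional FP-tree algorithm and then, building on it, proposes a new frequent closed itemset mining algorithm. The algorithm needs to traverse each leaf-to-root path in the FP-Tree only once to obtain all sub conditional pattern bases of every item, which avoids the tedious backward comparison along the same path required by the traditional FP-tree algorithm. Experiments show that the optimized algorithm avoids wasted resources, reduces the computational cost of frequent closed itemset mining, and greatly improves the efficiency of data mining.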
The classic algorithms for mining frequent closed itemsets, such as Closet and Closet+, have been shown to be inefficient and to produce many redundant patterns when mining extremely large datasets. This paper first gives a method to improve the performance of the FP-tree algorithm. Then, based on the improved FP-tree, a frequent closed itemset mining algorithm is proposed. The new algorithm optimizes the process of mining frequent itemsets and does not need to build conditional FP-trees recursively. Experimental results show that the new approach saves execution time and support its feasibility and effectiveness.
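The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of collecting every item's conditional pattern bases in one pass over the leaf-to-root paths of an FP-tree. The names `Node`, `build_fp_tree`, `leaves`, and `conditional_pattern_bases` are hypothetical, and the use of a `seen` set to avoid counting shared nodes twice is an assumption, not necessarily the authors' method.

```python
from collections import defaultdict


class Node:
    """One FP-tree node (hypothetical minimal layout)."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree: items ordered by descending frequency."""
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1
    root = Node(None, None)
    for t in transactions:
        # Keep frequent items only; ties are broken by item name.
        items = sorted((i for i in set(t) if freq[i] >= min_support),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def leaves(root):
    """Yield every leaf node of the tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children.values())
        elif node is not root:
            yield node


def conditional_pattern_bases(root):
    """Walk each leaf-to-root path once and record, for every node on it,
    the prefix path above that node together with the node's count."""
    bases = defaultdict(list)
    seen = set()  # assumption: skip nodes already handled via another leaf
    for leaf in leaves(root):
        path = []
        node = leaf
        while node.parent is not None:
            path.append(node)
            node = node.parent
        path.reverse()  # now ordered from just below the root to the leaf
        for k, node in enumerate(path):
            if id(node) in seen:
                continue
            seen.add(id(node))
            if k > 0:  # nodes directly under the root have an empty prefix
                prefix = tuple(n.item for n in path[:k])
                bases[node.item].append((prefix, node.count))
    return dict(bases)
```

Because each path is materialized once and every prefix is a slice of it, no node needs a separate upward walk of its own, which is one plausible reading of avoiding repeated backtracking along the same path.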
Source
《曲阜师范大学学报(自然科学版)》
CAS
2009, Issue 2, pp. 57-61 (5 pages)
Journal of Qufu Normal University(Natural Science)