Abstract
Discovering association rules is a fundamental problem in data mining, and finding frequent patterns is the most expensive step in association rule discovery. The FP_growth (Frequent Pattern growth) method mines both long and short frequent itemsets without generating candidate itemsets, which greatly improves mining efficiency. However, FP_growth generates a large number of conditional FP_trees during mining and therefore occupies a large amount of memory. This paper studies FP_growth and proposes an improved algorithm that retains all the advantages of FP_growth while avoiding this drawback. The improved algorithm mines long and short frequent patterns by building only a limited number of conditional FP_trees (equal to the number of attributes in the transaction database), which greatly reduces the space that FP_growth requires. Experiments show that the proposed algorithm is effective.
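For readers unfamiliar with the baseline, the following is a minimal sketch of textbook FP_growth, not the improved algorithm proposed in the paper (whose details the abstract does not give). It illustrates the point made above: every frequent item in every tree spawns its own conditional FP_tree, so the number of trees grows with the recursion. All names here (`Node`, `build_tree`, `fp_growth`, `mine`, the `stats` counter) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, freq): header maps item -> list of its tree nodes,
    freq maps each frequent item -> its support.
    """
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_support, suffix, stats, out):
    """Recursively mine frequent itemsets into `out`.

    stats['trees'] counts every non-empty FP-tree built (assumption:
    an empty conditional tree is not counted).
    """
    header, freq = build_tree(weighted_transactions, min_support)
    if not freq:
        return
    stats['trees'] += 1
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        pattern = (item,) + suffix
        out[frozenset(pattern)] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_support, pattern, stats, out)


def mine(transactions, min_support):
    """Convenience wrapper: returns (itemset -> support, number of trees built)."""
    stats, out = {'trees': 0}, {}
    fp_growth([(t, 1) for t in transactions], min_support, (), stats, out)
    return out, stats['trees']
```

On the toy database `[['a','b','c'], ['a','b'], ['a','c'], ['b','c'], ['a','b','c']]` with `min_support=3`, `mine` finds the three single items and the three pairs, and builds one global tree plus one conditional tree each for `b` and `c`. On realistic data the recursion goes much deeper, which is the memory cost the abstract refers to.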
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2007, No. 5, pp. 175-177 (3 pages)
Computer Engineering and Applications
Funding
Natural Science Foundation of Fujian Province of China (Grant No. A0310008)
Key Project of the Fujian Province High-Tech Research Open Program (2003H043)