Abstract
Frequent pattern mining in transaction databases is central to many data mining problems, such as association rule mining, and has been studied for many years. Most early algorithms are Apriori-like: they first generate a candidate set and then test the candidates to find the frequent patterns. Candidate generation is often time-consuming, especially when mining prolific or long patterns. Jiawei Han et al. proposed a novel data structure, the frequent pattern tree (FP-tree), and the FP-growth algorithm built on it, which mines frequent patterns by pattern fragment growth and handles prolific and long patterns efficiently. Because different implementations can lead to different mining efficiency, this paper discusses the FP-growth algorithm, implements it in several different ways, and compares the performance of these implementations on several canonical datasets.
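To make the contrast with candidate generation concrete, the following is a minimal Python sketch of FP-tree construction and FP-growth mining. It is an illustration under stated assumptions, not any of the implementations compared in the paper, whose details the abstract does not give. The names `FPNode`, `build_fp_tree`, `fp_growth`, and `mine` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, weight) pairs.

    Returns the header table (item -> list of tree nodes) and the
    support of each frequent item. Hypothetical helper for illustration.
    """
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, ordered by descending support (ties by name),
        # so that transactions sharing frequent items share tree prefixes.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support} without generating candidate sets."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Grow the pattern by mining the conditional base recursively.
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


def mine(transactions, min_support):
    """Mine all frequent patterns from a list of item lists."""
    return fp_growth([(t, 1) for t in transactions], min_support)
```

Calling `mine(transactions, min_support)` on a list of item lists returns a dictionary that maps each frequent pattern, stored as a `frozenset`, to its support count. The sketch rebuilds a conditional tree for every suffix, which keeps the code short; a tuned implementation would likely handle single-path trees specially and avoid re-scanning.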
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2004, Issue 9, pp. 174-176 (3 pages)
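Funding
National Key Basic Research Program of China (973 Program), Grant No. G1999032705
Scientific Research Foundation for Returned Overseas Scholars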
Keywords
Frequent Pattern
Association Rule
Data Mining
Algorithm