Abstract
Frequent pattern mining, typified by the Apriori-like candidate set generation-and-test approach, plays an essential role in data mining. FP-growth is one of the fastest frequent pattern mining algorithms, but a program that implements it accesses the database frequently, which becomes very time-consuming on a large-scale database. This paper analyzes this bottleneck in detail and proposes a PFP-tree construction algorithm that builds the FP-tree on a prefabricated database. The algorithm inherits the advantages of FP-growth and, by using the prefabricated database technique to divide a large database into multiple views by time segment and process each view separately, is well suited to parallel computation, which greatly improves speed. It also avoids repeated computation, efficiently supports database updates, and scales well.
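The abstract does not give the details of the PFP-tree construction, so the following is only a minimal sketch of the general idea it describes: split transactions into per-period views, build one prefix tree per view, and merge the trees by summing counts. It assumes a fixed lexicographic item order so that per-view trees are directly mergeable (classic FP-growth orders items by global frequency instead). All names here (`Node`, `build_tree`, `merge_into`, `partition_by_period`, the sample `rows`) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class Node:
    """One node of a prefix tree; count = transactions passing through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(transactions):
    """Build a prefix tree for one view, using a fixed lexicographic item order."""
    root = Node()
    for transaction in transactions:
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def merge_into(target, source):
    """Merge `source` into `target` by summing counts along shared paths."""
    for item, child in source.children.items():
        merged_child = target.children.setdefault(item, Node())
        merged_child.count += child.count
        merge_into(merged_child, child)
    return target


def item_support(root):
    """Total support of each item: sum of counts over all nodes with that label."""
    totals = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        for item, child in node.children.items():
            totals[item] += child.count
            stack.append(child)
    return dict(totals)


def to_dict(node):
    """Plain nested-dict view of a tree, convenient for comparing two trees."""
    return {item: (c.count, to_dict(c)) for item, c in node.children.items()}


def partition_by_period(rows):
    """Group (period, items) rows into one transaction list per period."""
    views = defaultdict(list)
    for period, items in rows:
        views[period].append(items)
    return views


# Hypothetical sample data: (time period, items in the transaction).
rows = [
    ("2004-01", ["a", "b", "c"]),
    ("2004-01", ["a", "b"]),
    ("2004-02", ["a", "c"]),
    ("2004-02", ["b", "c"]),
    ("2004-02", ["a", "b", "c"]),
]

# Each view's tree is independent of the others, so these could be built in parallel.
views = partition_by_period(rows)
segment_trees = {period: build_tree(ts) for period, ts in views.items()}

merged = Node()
for tree in segment_trees.values():
    merge_into(merged, tree)
```

Because the item order is fixed, the merged tree is identical to the tree built from all transactions in a single pass, and adding a new period only requires building and merging one more per-view tree. Infrequent items are deliberately not pruned per view in this sketch, since an item that is rare in one period may still be frequent overall.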
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2004, Issue B12, pp. 58-61 (4 pages)
Keywords
Data mining
Frequent pattern tree (FP-tree)
Prefabricated database
Merging of frequent pattern trees