Abstract
TreeGrowth (TG) is an embedded subtree mining algorithm based on the pattern-growth principle. It has two drawbacks: it mines an excessive volume of subtrees, and it consumes a large amount of memory. To address them, this paper proposes a new algorithm, PTG (partition tree growth), built on the idea of partition mining. PTG divides the database into several partitions and first applies TG to each partition to obtain its local frequent subtrees. The local results are then filtered by global support count to obtain the global frequent subtrees, which reduces both the number of subtrees mined and the memory overhead. Simulation experiments show that PTG resolves the out-of-memory problem that arises when mining large datasets, confirming its effectiveness and robustness.
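The two-phase flow described in the abstract (local mining per partition, then filtering by global support) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `local_frequent` and `ptg_sketch` are hypothetical, the local miner is a simple counting stand-in rather than the TG algorithm, and each tree is assumed to be pre-encoded as the set of subtree patterns it contains. The minimum support is assumed to be given as a ratio, so the same ratio yields a proportional threshold in each partition.

```python
from collections import Counter
from math import ceil


def local_frequent(partition, min_ratio):
    """Stand-in for TG (assumption): patterns frequent within one partition.

    Each tree is assumed to be a set of pattern labels, so a pattern is
    counted at most once per tree.
    """
    counts = Counter(p for tree in partition for p in tree)
    threshold = ceil(min_ratio * len(partition))
    return {p for p, c in counts.items() if c >= threshold}


def ptg_sketch(database, n_parts, min_ratio):
    """Partition-based mining: local candidates first, then a global filter."""
    size = ceil(len(database) / n_parts)
    # Here the database is an in-memory list; in practice each partition
    # would be loaded and mined on its own to bound memory use.
    partitions = [database[i:i + size] for i in range(0, len(database), size)]

    # Phase 1: the union of local frequent patterns forms the candidate set.
    # A globally frequent pattern must be locally frequent in at least one
    # partition, so no global result is lost.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_ratio)

    # Phase 2: count each candidate over the whole database and keep those
    # that meet the global support threshold.
    global_threshold = ceil(min_ratio * len(database))
    support = Counter(
        p for tree in database for p in tree if p in candidates
    )
    return {p: c for p, c in support.items() if c >= global_threshold}
```

In this sketch a pattern that is frequent in only one partition still becomes a candidate, but it is discarded in the second phase if its support over the whole database falls below the global threshold.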
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2011, Issue 6, pp. 2054-2057 (4 pages)
Computer Engineering and Design
Keywords
pattern mining
frequent subtree
pattern growth
projection
partition mining