Constructing Projection Frequent Pattern Tree for Efficient Mining 被引量：1

Constructing Projection Frequent Pattern Tree for Efficient Mining

下载PDF

导出

摘要 Frequent Pattern mining plays an essential role in data mining. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. We introduce a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short frequent patterns without candidate generation. And build a new projection frequent pattern tree (PFP-tree) algorithm, which not only heirs all the advantages in the FP-growth method, but also avoids it's bottleneck in database size dependence when constructing the frequent pattern tree (FP-tree). Efficiency of mining is achieved by introducing the projection technique, which avoid serial scan each frequent item in the database, the cost is mainly related to the depth of the tree, namely the number of frequent items of the longest transaction in the database, not the sum of all the frequent items in the database, which hugely shortens the time of tree-construction. Our performance study shows that the PFP-tree method is efficient and scalable for mining large databases or data warehouses, and is even about an order of magnitude faster than the FP-growth method. Frequent Pattern mining plays an essential role in data mining. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. We introduce a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short frequent patterns without candidate generation. And build a new projection frequent pattern tree (PFP-tree) algorithm, which not only heirs all the advantages in the FP-growth method, but also avoids it's bottleneck in database size dependence when constructing the frequent pattern tree (FP-tree). Efficiency of mining is achieved by introducing the projection technique, which avoid serial scan each frequent item in the database, the cost is mainly related to the depth of the tree, namely the number of frequent items of the longest transaction in the database, not the sum of all the frequent items in the database, which hugely shortens the time of tree-construction. Our performance study shows that the PFP-tree method is efficient and scalable for mining large databases or data warehouses, and is even about an order of magnitude faster than the FP-growth method.

作者 XiangJian-wen HeYan-xiang KokichiFutatsugi KongWei-qiang

机构地区 SchoolofComputer StateKeyLabofSoftwareEngineering SchoolofInformationScience

出处《Wuhan University Journal of Natural Sciences》 EI CAS 2003年第02A期351-357,共7页 武汉大学学报（自然科学英文版）

基金 the National Natural Science Foundation of China(90104005)

关键词 ALGORITHMS Database systems Trees (mathematics) Algorithms Database systems Trees (mathematics)

分类号 TP311.13 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

同被引文献12

1Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation [ C ]. Proceedings of ACM SIGMOD,2000,29 (2) : 1-12.
2Tan P N, Steinbach M, Kumar V. Introduction to data mining [ M ]. Addison-Wesley Longman Publishing Co. , Inc. Boston, MA, USA, 2005,40 -53.
3Chui C K, Kao B, Hung E. Mining frequent itemsets from uncertain data[ J ]. Springer,2007,47-58.
4Chui C K, Kao B, Hung E. A decremental approach for mining frequent itemsets from uncertain [ C ]. PAKDD,2007:64-75.
5Leung C K-S, Carmichael C, Hao B Y. Efficient mining of frequent patterns from uncertain data[ C ]. Proc. of the 7th IEEE International Conference on Data Mining Workshops,2007:489-494.
6Aggarwal C, Li Y, Wang J, et al. Frequent pattern mining with uncertain data[ C ]. KDD ,2009:29-38.
7Aggarwal C C. Managing and Mining Uncertain Data[ J]. Springer, 2009,35:494-510.
8Agrawal R, Srikant R. Fast Algorithms for Mining Association Rules in Large Databases[ C ]. Proceedings of the 20th international conference on very large data bases, VLDB, 1994:487-499.
9Leung C K-S,Mateo M A F, Brajczuk D A. A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data[ C]. PAKDD 2008.
10Zhang Q, Li F, Yi K. Finding Frequent Items in Probabilistic Data [ C ]. Proceeding of the ACM SIGMOD,2008 : 819-832.

引证文献1

1李学明,沈雪,王震.基于树的不确定数据频繁项的挖掘算法研究[J].世界科技研究与发展,2010,32(4):433-436.

1何炎祥,向剑文,朱骁峰,孔维强.不产生候选的快速投影频繁模式树挖掘算法[J].计算机科学,2002,29(11):71-75. 被引量：11
2王海鹏,吴昊.可伸缩地形渲染中的GPU光线投射算法[J].微计算机信息,2010,26(35):203-205.
3刘海虹,刘伯莹.数据挖掘技术[J].丹东纺专学报,2001,8(1):30-31. 被引量：9
4李力,靳蕃.A New Algorithm for Mining Frequent Pattern[J].Journal of Southwest Jiaotong University(English Edition),2002,10(1):10-20. 被引量：2
5Xiu-LiMa,Yun-HaiTong,Shi-WeiTang,Dong-QingYang.Efficient Incremental Maintenance of Frequent Patterns with FP-Tree[J].Journal of Computer Science & Technology,2004,19(6):876-884. 被引量：9
6史巧硕,周慧霞,李杨,李娟.回归方法估算最长频繁模式长度[J].河北工业大学学报,2015,44(5):45-50.
7杨君锐,赵群礼.一种不产生候选集的最大频繁集快速挖掘算法[J].微电子学与计算机,2004,21(11):125-128. 被引量：4
8石菲.金融行业挖掘数据金矿[J].中国信息化,2013(10):50-51.
9冯建华,钱乾,王建勇,周立柱.Efficient Mining of Frequent Closed XML Query Pattern[J].Journal of Computer Science & Technology,2007,22(5):725-735.
10Actian公司从大数据提炼商业价值[J].金融电子化,2014(6):128-128.

Wuhan University Journal of Natural Sciences

2003年第02A期

浏览历史

内容加载中请稍等...

Constructing Projection Frequent Pattern Tree for Efficient Mining 被引量：1

同被引文献12

引证文献1

相关作者

相关机构

相关主题

浏览历史