Frequent Closed Itemset Mining Algorithm Based on an Optimized FP-Tree (基于优化的FP-Tree的频繁闭合项集挖掘算法)

Frequent Closed Itemsets Mining Algorithm Based on Improved FP-growth


Abstract: In classic frequent closed itemset mining algorithms such as CLOSET and CLOSET+, when the conditional pattern database is very large, the number of frequent itemsets grows sharply, efficiency steadily deteriorates, and the usefulness of the mining results declines as large numbers of redundant patterns are produced. This paper first presents an improved version of the traditional FP-tree algorithm and then, building on it, proposes a new frequent closed itemset mining algorithm. The new algorithm only needs to traverse each leaf-to-root path of the FP-tree once to obtain all sub-conditional pattern bases of every item. This avoids the tedious backward tracing and comparison along the same path that the traditional FP-tree algorithm requires, and it removes the need to build conditional FP-trees recursively. Experiments show that the optimized algorithm avoids wasted resources, reduces the computational cost of mining frequent closed itemsets, shortens execution time, and substantially improves mining efficiency.
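The abstract's central idea, collecting every item's conditional pattern base by walking each leaf-to-root path only once, can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the record does not reproduce. The names `Node`, `build_fp_tree`, `leaves`, and `conditional_pattern_bases` are hypothetical, and the `seen` set used to avoid re-emitting shared ancestors is an assumption made for this illustration.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-tree node: an item, a link to its parent, and a support count."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree (standard construction, no header table)."""
    freq = Counter(item for t in transactions for item in set(t))
    root = Node(None, None)
    for t in transactions:
        # Keep frequent items only, ordered by descending support;
        # ties are broken by item name so the tree shape is deterministic.
        items = sorted(
            (i for i in set(t) if freq[i] >= min_support),
            key=lambda i: (-freq[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def leaves(node):
    """Yield every leaf below `node` (recursive; fine for small trees)."""
    if not node.children:
        yield node
    else:
        for child in node.children.values():
            yield from leaves(child)


def conditional_pattern_bases(root):
    """Walk each leaf-to-root path once and collect (prefix, count) pairs.

    Assumption of this sketch: a node shared by several paths is emitted
    only the first time it is reached, tracked with the `seen` set.
    """
    bases = defaultdict(list)
    seen = set()
    for leaf in leaves(root):
        # Collect the path from the leaf up to (not including) the root.
        path = []
        node = leaf
        while node.parent is not None:
            path.append(node)
            node = node.parent
        for depth, node in enumerate(path):
            if id(node) in seen:
                break  # this node and all its ancestors were handled earlier
            seen.add(id(node))
            # The prefix is the chain of ancestors, listed from the root side.
            prefix = tuple(n.item for n in reversed(path[depth + 1:]))
            if prefix:
                bases[node.item].append((prefix, node.count))
    return dict(bases)


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    tree = build_fp_tree(demo, min_support=2)
    for item, base in conditional_pattern_bases(tree).items():
        print(item, base)
```

In the demo data, item `c` ends three different branches and so receives three prefix paths, while `a` only appears directly under the root and has no conditional pattern base. A closed-itemset miner would then process these bases; that step is left out because the record gives no details of it.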

Authors: Yan Wei (颜伟), Su Zhaofeng (苏兆锋), Zhou Qinliang (周钦亮)

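Affiliations: Information Network Center, Qufu Normal University; School of Management, Ludong University; Mettler Toledo, Caohejing Hi-Tech Park, Xuhui District, Shanghai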

Source: Journal of Qufu Normal University (Natural Science) (《曲阜师范大学学报（自然科学版）》), CAS, 2009, No. 2, pp. 57-61 (5 pages)

Keywords: data mining; closed itemsets; FP-growth (frequent pattern growth)

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]


References (15)

[1] Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases [J]. SIGMOD'93, May 1993.
[2] Agrawal R, Srikant R. Fast algorithms for mining association rules [J]. VLDB'94, Sept. 1994.
[3] Bayardo R J. Efficiently mining long patterns from databases. SIGMOD'98, 1998.
[4] Brin S, Motwani R, Ullman J D, et al. Dynamic itemset counting and implication rules for market basket data [J]. SIGMOD'97, 1997.
[5] Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets [J]. DMKD'00, 2000.
[6] Gunopulos D, Mannila H, Saluja S. Discovering all most specific sentences by randomized algorithms. ICDT'97, 1997.
[7] Han E, Karypis G, Kumar V. Scalable parallel data mining for association rules [J]. TKDE, 2000, 12(2).
[8] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation [J]. SIGMOD'00, 2000.
[9] Wang Jianyong, Han Jiawei, Pei Jian. CLOSET+: Searching for the best strategies for mining frequent closed itemsets [J]. SIGKDD'03.
[10] Liu J, Pan Y, Wang K, et al. Mining frequent item sets by opportunistic projection [J]. SIGKDD'02, 2002.


