Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy (cited by: 2)

Abstract: This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand the max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we perform it in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies of depth at most three or four. For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.
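
To make the two ideas in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not MFGI_class or any algorithm from the paper. The taxonomy, the transactions, the support threshold, and every function name are hypothetical and chosen only for illustration. Maximality is taken here with respect to plain set inclusion, which is a simplifying assumption; the paper's definition of a max frequent g-itemset may also account for the taxonomy ordering.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"beef": "meat", "chicken": "meat"}

# Hypothetical toy transactions over leaf items.
TRANSACTIONS = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"beef", "bread"},
    {"chicken", "milk", "bread"},
]


def ancestors(item):
    """Return all proper ancestors of an item in the taxonomy."""
    result = set()
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def extend(transaction):
    """Add every ancestor of every item, so 'beef' also counts as 'meat'."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(itemset, extended_transactions):
    """Fraction of (extended) transactions that contain the whole itemset."""
    hits = sum(itemset <= t for t in extended_transactions)
    return hits / len(extended_transactions)


def is_redundant(itemset):
    """True if the itemset holds an item together with one of its ancestors."""
    return any(ancestors(item) & itemset for item in itemset)


def frequent_g_itemsets(transactions, min_support):
    """Enumerate all frequent generalized itemsets by exhaustive search."""
    ext = [extend(t) for t in transactions]
    universe = sorted(set().union(*ext))
    frequent = []
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            candidate = frozenset(combo)
            if is_redundant(candidate):
                continue
            if support(candidate, ext) >= min_support:
                frequent.append(candidate)
    return frequent


def max_frequent(frequent):
    """Keep only itemsets with no frequent proper superset (set inclusion)."""
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    freq = frequent_g_itemsets(TRANSACTIONS, min_support=0.75)
    print(max_frequent(freq))
```

With these toy transactions and a threshold of 0.75, (meat, milk) is frequent while (beef, milk) and (chicken, milk) are not, and the three frequent g-itemsets (meat), (milk), and (meat, milk) are summarized by the single maximal one. The exhaustive enumeration is exponential in the number of items and is meant only to show the definitions, not an efficient mining strategy.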

Authors: Daniel Kunkle, Donghui Zhang (张冬晖), Gene Cooperman

Affiliation: College of Computer and Information Science

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition; indexed in SCIE, EI, CSCD), 2008, Issue 1, pp. 77-102 (26 pages)

Keywords: generalized association rules, frequent generalized itemsets, redundancy avoidance

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
