期刊文献+

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy 被引量:2

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy
原文传递
导出
摘要 This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms. This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms.
出处 《Journal of Computer Science & Technology》 SCIE EI CSCD 2008年第1期77-102,共26页 计算机科学技术学报(英文版)
关键词 generalized association rules frequent generalized itemsets redundancy avoidance generalized association rules, frequent generalized itemsets, redundancy avoidance
  • 相关文献

参考文献32

  • 1Hipp J, Myka A, Wirth R, G/intzer U. A new algorithm for faster mining of generalized association rules. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Nantes, Prance, 1998, pp.74-82.
  • 2Pramudiono I, Kitsuregawa M. FP-tax: Tree structure based generalized association rule mining. In Proc. A CM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery ( DMKD), Paris, France, 2004, pp.60-63.
  • 3Srikant R, Agrawal R. Mining generalized association rules. In Proc. International Conference on Very Large Data Bases (VLDB), Zurich, Switzerland, 1995, pp.407-419.
  • 4Sriphaew K, Theeramunkong T. A new method for finding generalized frequent itemsets in generalized association rule mining. In Proc. International Symposium on Computers and Communications (ISCC), Taormina, Italy, 2002, oo.1040-1045.
  • 5Sriphaew K, Theeramunkong T. Fast algorithms for mining generalized frequent patterns of generalized association rules. IEICE Transactions on Information and Systems, March 2004, E87-D(3).
  • 6Sriphaew K, Theeramunkong T. Mining generalized closed frequent itemsets of generalized association rules. In Proc. International Conference on Knowledge-Based Intelligent Information and Engineering Systems ( KES), Oxford, United Kingdom, 2003, pp.476-484.
  • 7Bayardo Jr R J. Efficiently mining long patterns from databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Seattle, WA, 1998, pp.85- 93.
  • 8Agarwal R C, Aggarwal C C, Prasad V V V. A tree projection algorithm for generation of frequent item sets. Journal of Parallel Distributed Computing, 2001, 61(3): 350-371.
  • 9Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Dallas, TX, 2000, pp.1 12.
  • 10Lin D I, Kedem Z M. Pincer-Search: An efficient algorithm for discovering the maximum frequent set. IEEE Trans. Knowledge and Data Engineering (TKDE), 2002, 14(3): 553-566.

同被引文献27

  • 1易彤,徐宝文,吴方君.一种基于FP树的挖掘关联规则的增量更新算法[J].计算机学报,2004,27(5):703-710. 被引量:32
  • 2Xiu-LiMa,Yun-HaiTong,Shi-WeiTang,Dong-QingYang.Efficient Incremental Maintenance of Frequent Patterns with FP-Tree[J].Journal of Computer Science & Technology,2004,19(6):876-884. 被引量:9
  • 3陈耿,朱玉全,杨鹤标,陆介平,宋余庆,孙志挥.关联规则挖掘中若干关键技术的研究[J].计算机研究与发展,2005,42(10):1785-1789. 被引量:62
  • 4Agrawal R, Srikant R. Fast Algorithm for Mining Association rules[C]//Proceedings of the 20th International Conference on VLDB. Santiago, Chile, 1994 : 487-499.
  • 5Han J,Pei J, Yin Y. Mining frequent patterns without candidate generation [J]. ACM-SIGMOD International Conference on Management of Data, 2000,29 (2) : 1-12.
  • 6Zaki M J. Fast vertical mining using diffsets[R]. 01-1. Troy, New YorkDepartment of Computer Science, Rensselaer Poly- technic Institute, 2001.
  • 7Im E-J, Yelick K, Vuduc R. Sparsity: Optimization framework for sparse matrix kernels[J]. International Journal of High Per- formance Computing Applications, 2004,18(1) : 135-158.
  • 8Hipp J,Guntzer U, Nakhaeizadeh G. Algorithms for association rule mining-A general survey and comparison[J]. SIGKDD Ex- plorations, 2000,2 (1) : 58-64.
  • 9Dong J, Han M. BitTableFI: An efficient mining frequent item- sets algorithm [J ]. Knowledge-Based Systems, 2007, 20 ( 4 ) :329-335.
  • 10Zheng X Y, Sun J Z, Zheng X Y. Finding Frequent Item Sets from Sparse Matrix[C]//International Conference on Electronic Computer Technology. 2009 : 615-619.

引证文献2

二级引证文献13

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部