This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for a...This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms.展开更多
The problem of association rule mining has gained considerableprominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases. And there has been a spurt of r...The problem of association rule mining has gained considerableprominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases. And there has been a spurt of researchactivities around this problem. Traditional association rule mining is limited tointratransaction. Only recently the concept of N-dimensional inter-transaction association rule (NDITAR) was proposed by H.J. Lu. This paper modifies and extendsLu's definition of NDITAR based on the analysis of its limitations, and the generalized multidimensional association rule (GMDAR) is subsequently introduced, whichis more general, flexible and reasonable than NDITAR.展开更多
文摘This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms.
文摘The problem of association rule mining has gained considerableprominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases. And there has been a spurt of researchactivities around this problem. Traditional association rule mining is limited tointratransaction. Only recently the concept of N-dimensional inter-transaction association rule (NDITAR) was proposed by H.J. Lu. This paper modifies and extendsLu's definition of NDITAR based on the analysis of its limitations, and the generalized multidimensional association rule (GMDAR) is subsequently introduced, whichis more general, flexible and reasonable than NDITAR.