This paper presents a new efficient algorithm for mining frequent closed itemsets. It enumerates the closed set of frequent itemsets by using a novel compound frequent itemset tree that facilitates fast growth and eff...This paper presents a new efficient algorithm for mining frequent closed itemsets. It enumerates the closed set of frequent itemsets by using a novel compound frequent itemset tree that facilitates fast growth and efficient pruning of search space. It also employs a hybrid approach that adapts search strategies, representations of projected transaction subsets, and projecting methods to the characteristics of the dataset. Efficient local pruning, global subsumption checking, and fast hashing methods are detailed in this paper. The principle that balances the overheads of search space growth and pruning is also discussed. Extensive experimental evaluations on real world and artificial datasets showed that our algorithm outperforms CHARM by a factor of five and is one to three orders of magnitude more efficient than CLOSET and MAFIA.展开更多
HA(hashing array),a new algorithm,for mining frequent itemsets of large database is proposed.It employs a structure hash array,ItemArray() to store the information of database and then uses it instead of database in l...HA(hashing array),a new algorithm,for mining frequent itemsets of large database is proposed.It employs a structure hash array,ItemArray() to store the information of database and then uses it instead of database in later iteration.By this improvement,only twice scanning of the whole database is necessary,thereby the computational cost can be reduced significantly.To overcome the performance bottleneck of frequent 2-itemsets mining,a modified algorithm of HA,DHA(direct-addressing hashing and array) is proposed,which combines HA with direct-addressing hashing technique.The new hybrid algorithm,DHA,not only overcomes the performance bottleneck but also inherits the advantages of HA.Extensive simulations are conducted in this paper to evaluate the performance of the proposed new algorithm,and the results prove the new algorithm is more efficient and reasonable.展开更多
文摘This paper presents a new efficient algorithm for mining frequent closed itemsets. It enumerates the closed set of frequent itemsets by using a novel compound frequent itemset tree that facilitates fast growth and efficient pruning of search space. It also employs a hybrid approach that adapts search strategies, representations of projected transaction subsets, and projecting methods to the characteristics of the dataset. Efficient local pruning, global subsumption checking, and fast hashing methods are detailed in this paper. The principle that balances the overheads of search space growth and pruning is also discussed. Extensive experimental evaluations on real world and artificial datasets showed that our algorithm outperforms CHARM by a factor of five and is one to three orders of magnitude more efficient than CLOSET and MAFIA.
文摘HA(hashing array),a new algorithm,for mining frequent itemsets of large database is proposed.It employs a structure hash array,ItemArray() to store the information of database and then uses it instead of database in later iteration.By this improvement,only twice scanning of the whole database is necessary,thereby the computational cost can be reduced significantly.To overcome the performance bottleneck of frequent 2-itemsets mining,a modified algorithm of HA,DHA(direct-addressing hashing and array) is proposed,which combines HA with direct-addressing hashing technique.The new hybrid algorithm,DHA,not only overcomes the performance bottleneck but also inherits the advantages of HA.Extensive simulations are conducted in this paper to evaluate the performance of the proposed new algorithm,and the results prove the new algorithm is more efficient and reasonable.