Abstract
Frequent itemset mining is the most critical step in association rule mining, and maximal frequent itemsets are a commonly used compact representation of the frequent itemsets. Top-down methods for mining maximal frequent itemsets tend to generate too many candidate itemsets when the dimension of the maximal frequent itemsets is much smaller than the number of frequent items. Existing bottom-up methods either need to scan the database many times or need to build conditional frequent pattern trees (FP-trees) recursively, and their predictive pruning strategies leave room for improvement. To address this, an algorithm for mining maximal frequent itemsets based on minimal non-frequent itemsets, named BNFIA, is proposed. It uses a storage structure based on the DFP-tree, first mines the minimal non-frequent itemsets in a bottom-up manner, then uses the properties of the minimal non-frequent itemsets for predictive pruning to narrow the search space, and finally derives the maximal frequent itemsets quickly from the boundary frequent itemsets. Experimental results show that the proposed algorithm clearly outperforms comparable algorithms.
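The relationship between the two kinds of itemsets named in the abstract can be illustrated with a small sketch. This is not BNFIA: it uses no DFP-tree or FP-tree, counts support by brute force over a toy database, and the function name `mine_maximal` and the sample data are hypothetical. It only shows the general idea of a bottom-up, level-wise search in which every non-frequent candidate found is a minimal non-frequent itemset, and in which those itemsets are then used to prune larger candidates.

```python
from itertools import combinations


def mine_maximal(transactions, min_support):
    """Bottom-up, level-wise search (illustrative only, not the paper's BNFIA).

    Returns (maximal_frequent, minimal_non_frequent) as lists of frozensets.
    Assumes 1 <= min_support <= len(transactions).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Brute-force support count; a real miner would use a tree structure.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    frequent = []
    minimal_non_frequent = []
    level = [frozenset([item]) for item in items]

    while level:
        current = []
        for candidate in level:
            if support(candidate) >= min_support:
                current.append(candidate)
            else:
                # Every proper subset of a surviving candidate is frequent,
                # so a non-frequent candidate is minimal by construction.
                minimal_non_frequent.append(candidate)
        frequent.extend(current)
        if not current:
            break

        k = len(current[0]) + 1
        next_level = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune: skip any candidate containing a known minimal
            # non-frequent itemset, since it cannot be frequent.
            if len(union) == k and not any(m <= union for m in minimal_non_frequent):
                next_level.add(union)
        level = sorted(next_level, key=sorted)

    # A frequent itemset is maximal if no frequent proper superset exists.
    maximal = [f for f in frequent if not any(f < g for g in frequent)]
    return maximal, minimal_non_frequent


# Hypothetical toy database with a minimum support count of 3.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
maximal, minimal = mine_maximal(transactions, 3)
print([sorted(s) for s in maximal])
print([sorted(s) for s in minimal])
```

On this toy database the maximal frequent itemsets are {a, b}, {a, c} and {b, c}, and the minimal non-frequent itemsets are {d} and {a, b, c}. The two sets together mark the border between the frequent and non-frequent regions of the search space, which is why knowing one side helps locate the other.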
Source
《计算机技术与发展》
2017, No. 8, pp. 57-60, 65 (5 pages in total)
Computer Technology and Development
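Funding
National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" ("核高基") (2015ZX01040-201)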