Abstract
Mining association rules is one of the most important research topics in data mining. The graph-based association rule mining algorithm DLG scans the database once to construct an association graph and then traverses that graph to generate frequent itemsets, which substantially improves mining performance. After analyzing the basic principle of DLG, this paper proposes an improved algorithm, DLG#. DLG# builds an itemset association matrix while it constructs the association graph. When generating candidate itemsets, it combines the association graph with the Apriori property to prune redundant itemsets, which reduces the number of candidates and simplifies their verification. Comparative experiments show that, across different datasets and support thresholds, the improved algorithm discovers frequent itemsets faster, and the performance gain is most pronounced when the average length of frequent itemsets is large.
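As a rough illustration of the ideas summarized above, the following Python sketch mines frequent itemsets in the DLG style: a single pass over the transactions builds one bit vector per item, frequent 2-itemsets become the edges of the association graph, and longer itemsets are grown along graph edges and verified by ANDing bit vectors. The pruning step follows the DLG# description only loosely. It assumes the association matrix is a boolean adjacency matrix over frequent items, and the names `build_bit_vectors`, `mine` and the absolute support count `min_sup` are hypothetical; this is a sketch under those assumptions, not the paper's implementation.

```python
from itertools import combinations


def build_bit_vectors(transactions):
    """One database scan: bit t of vectors[item] is set iff transaction t contains item."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(bits):
    """Support count = number of set bits (transactions containing the itemset)."""
    return bin(bits).count("1")


def mine(transactions, min_sup):
    """Return {sorted itemset tuple: support count} for all frequent itemsets."""
    vectors = build_bit_vectors(transactions)
    # Frequent 1-itemsets in sorted order, so every itemset is an ascending tuple.
    items = sorted(i for i, v in vectors.items() if support(v) >= min_sup)
    index = {item: n for n, item in enumerate(items)}
    n = len(items)
    # Assumed association matrix: matrix[a][b] is True iff {items[a], items[b]}
    # is frequent; its True cells are exactly the association-graph edges.
    matrix = [[False] * n for _ in range(n)]
    frequent = {(i,): vectors[i] for i in items}
    level = {}
    for a, b in combinations(items, 2):
        bits = vectors[a] & vectors[b]
        if support(bits) >= min_sup:
            matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = True
            level[(a, b)] = bits
    while level:
        frequent.update(level)
        next_level = {}
        for itemset, bits in level.items():
            # Extend only with items that sort after the last member (graph order).
            for j in range(index[itemset[-1]] + 1, n):
                # Cheap graph/matrix filter: every member must have an edge to items[j].
                if not all(matrix[index[x]][j] for x in itemset):
                    continue
                cand = itemset + (items[j],)
                # Apriori pruning: every k-subset of the candidate must be frequent.
                if not all(sub in level for sub in combinations(cand, len(itemset))):
                    continue
                # Verification: AND the bit vectors instead of rescanning the database.
                cand_bits = bits & vectors[items[j]]
                if support(cand_bits) >= min_sup:
                    next_level[cand] = cand_bits
        level = next_level
    return {s: support(b) for s, b in frequent.items()}
```

For example, `mine([["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]], 2)` finds all seven non-empty itemsets, with `("A", "B", "C")` at support 2. Using Python integers as bit vectors keeps the one-scan property: after `build_bit_vectors`, each candidate is checked with a single AND and a popcount rather than another pass over the data.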
Source
Computer & Digital Engineering (《计算机与数字工程》)
2009, No. 12, pp. 38-41, 162 (5 pages in total)
Keywords
data mining
association rules
frequent itemsets
association graph
association matrix