Abstract
The Apriori algorithm is a classical algorithm for mining the frequent itemsets of association rules in data mining, but it generates a large number of candidate itemsets and must scan the database many times. To address this, the paper proposes a new frequent-itemset mining algorithm, CApriori. It uses a decomposed transaction matrix to store the relevant database information in compressed form and mines association rules on that matrix. It optimizes the join step that generates frequent k-itemsets from frequent (k-1)-itemsets. It also computes support counts quickly with an "AND operation" on row sets, without scanning the database, so the improved algorithm needs only two database scans to mine all frequent itemsets. Experimental results show that the improved algorithm is more efficient than Apriori when the minimum support is low.
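The idea of counting support with an AND operation instead of a database scan can be illustrated with a minimal sketch. This is not the paper's CApriori implementation: the abstract does not specify the layout of the decomposed transaction matrix, so the sketch assumes a simple vertical layout in which each item maps to an integer bitmask of the transactions that contain it. The function names `build_item_bitsets`, `support_count`, and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_item_bitsets(transactions):
    """Scan the transactions once; bit t of an item's mask is set
    when transaction t contains that item (assumed layout)."""
    bitsets = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << t)
    return bitsets


def support_count(itemset, bitsets):
    """AND the masks of all items, then count the surviving bits."""
    items = iter(itemset)
    mask = bitsets.get(next(items), 0)
    for item in items:
        mask &= bitsets.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_count):
    """Level-wise search; support comes from bitmasks, not rescans."""
    bitsets = build_item_bitsets(transactions)
    current = sorted(
        (item,) for item in bitsets
        if support_count((item,), bitsets) >= min_count
    )
    result = {}
    while current:
        for itemset in current:
            result[itemset] = support_count(itemset, bitsets)
        next_level = []
        # Join two frequent k-itemsets that share their first k-1 items.
        for a, b in combinations(current, 2):
            if a[:-1] == b[:-1]:
                candidate = a + (b[-1],)
                if support_count(candidate, bitsets) >= min_count:
                    next_level.append(candidate)
        current = sorted(next_level)
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
found = frequent_itemsets(transactions, min_count=3)
```

In this sketch the transactions are read only once, when the bitmasks are built. Every later support count is a chain of integer AND operations, which is the kind of saving the abstract attributes to the row-set AND operation.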
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2014, Issue A02, pp. 113-116 (4 pages)
Journal of Computer Applications