Abstract
An analysis of the principle and performance of the Apriori algorithm identifies three shortcomings: (1) when candidate (K+1)-itemsets are generated from the frequent K-itemsets, the strategy for filtering infrequent itemsets out of the candidates leaves room for improvement; (2) the join procedure compares the same items repeatedly, which reduces its efficiency; (3) when the database is rescanned, many items or transactions that need not be compared are compared again. Three corresponding optimization strategies are proposed to address these shortcomings, yielding an improved Apriori algorithm that is more efficient than the original.
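For context, the baseline that the abstract critiques can be sketched as follows. This is a minimal sketch of the classical Apriori procedure, not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names `apriori_gen` and `apriori`, and the use of an absolute support count `min_support`, are assumptions made for illustration. The comments mark the three places the abstract points to: pruning, the join, and the database scan.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Join + prune: build candidate (k+1)-itemsets from frequent k-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_k)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: merge two k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                break  # prev is sorted, so no later b shares a's prefix
            cand = a + (b[-1],)
            # Prune step: every k-subset of the candidate must be frequent.
            if all(sub in prev_set for sub in combinations(cand, len(a))):
                candidates.add(cand)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    while frequent:
        candidates = apriori_gen(frequent)
        counts = {c: 0 for c in candidates}
        # Database scan: each pass compares every candidate
        # against every transaction, with no skipping.
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(db, min_support=3).items()):
        print(itemset, support)
```

In this sketch the join relies on sorted itemsets with a shared prefix, and the scan tests every candidate against every transaction on each pass, which is the kind of repeated comparison the three proposed optimizations target.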
Source
《计算机工程与应用》
CSCD
PKU Core (北大核心)
2004, Issue 36, pp. 190-192, 202 (4 pages in total)
Computer Engineering and Applications