Abstract
Association analysis is a core task of data mining: association rule mining is the process of finding regularities among the itemsets of a large body of data. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets in association rule mining, but it has certain shortcomings. Objective: to improve mining efficiency. Method: an experimental approach, improving on the classical Apriori algorithm. Results: the improved Apriori algorithm outperformed the classical one, and the advantage was more pronounced when the number of transactions was large. Conclusion: when counting support, the improved algorithm does not need to scan the whole database each time; it only scans the rows that contain each item in a reduced database table, which saves a great deal of time. Support counting is also simpler and produces little redundancy, so the improved algorithm can substantially reduce the complexity of mining and improve its efficiency.
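The abstract does not spell out how the reduced table is built, so the following is only a minimal sketch under a stated assumption: that "scanning only the rows containing each item" behaves like an item-to-row-id index (a vertical layout), so a candidate's support is the size of the intersection of its items' row sets. The names `generate_candidates`, `apriori_classic` and `apriori_indexed` and the sample transactions are hypothetical illustrations, not the authors' published method. The classic version is included for comparison because it rescans every transaction at every level.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune (Apriori property)."""
    prev = set(prev_frequent)
    out = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        # Keep only size-k unions whose every (k-1)-subset is frequent.
        if len(union) == k and all(frozenset(s) in prev for s in combinations(union, k - 1)):
            out.add(union)
    return list(out)


def apriori_classic(transactions, min_support):
    """Classic Apriori: every level rescans all transactions to count candidates."""
    txs = [frozenset(t) for t in transactions]
    candidates = [frozenset([i]) for i in {i for t in txs for i in t}]
    frequent, k = {}, 1
    while candidates:
        counts = defaultdict(int)
        for t in txs:  # full database scan for this level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(list(level), k)
    return frequent


def apriori_indexed(transactions, min_support):
    """Hypothetical variant: count support from an item -> row-id index
    instead of rescanning the database (an assumed reading of the
    'reduced database table', not the paper's published method)."""
    index = defaultdict(set)
    for row, t in enumerate(transactions):  # the only full scan
        for item in t:
            index[item].add(row)
    # Reduced table: keep only items that are frequent on their own.
    level = {frozenset([i]): rows for i, rows in index.items() if len(rows) >= min_support}
    frequent = {c: len(rows) for c, rows in level.items()}
    k = 2
    while level:
        next_level = {}
        for c in generate_candidates(list(level), k):
            # Intersect the row sets of the candidate's items; no rescan.
            rows = set.intersection(*(index[i] for i in c))
            if len(rows) >= min_support:
                next_level[c] = rows
        frequent.update({c: len(rows) for c, rows in next_level.items()})
        level, k = next_level, k + 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
print(apriori_classic(transactions, 3) == apriori_indexed(transactions, 3))  # → True
```

Both functions return the same frequent itemsets and support counts; the indexed sketch reads the database once and then works only on per-item row sets, trading extra memory for fewer scans, which is the kind of saving the abstract attributes to the improved algorithm.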
Source
《河北北方学院学报(自然科学版)》
2013, No. 2, pp. 16-21 (6 pages)
Journal of Hebei North University:Natural Science Edition