Abstract
Association rule mining is a common form of data mining, and its central problem is finding frequent itemsets efficiently. This paper analyzes the AprioriTid algorithm for generating frequent itemsets and points out a defect: because items are stored repeatedly, the volume of data the algorithm handles is larger than necessary. The paper states and proves the theorem that "an itemset in C_(k-1) whose support is less than minsupport is useless in C_(k-1)", and improves the algorithm on that basis. Experiments show that the improved algorithm is effective at reducing data size.
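The idea behind the theorem can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the paper's own algorithm: it assumes the improvement amounts to dropping itemsets below the threshold from the per-transaction candidate store before the next pass. The names `apriori_gen`, `apriori_tid`, `min_count`, and `prune` are hypothetical, and an absolute count stands in for the minsupport ratio.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-candidates whose (k-1)-subsets are all frequent."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori_tid(transactions, min_count, prune=True):
    """AprioriTid-style sketch. Returns (frequent itemsets with counts, stored sizes per pass).

    Assumption: prune=True models the improvement described in the abstract by
    removing itemsets below min_count from the per-transaction store.
    """
    # First pass store: each transaction is kept as the set of its 1-itemsets.
    c_bar = [{frozenset([item]) for item in t} for t in transactions]
    frequent, sizes = {}, []
    k = 1
    while c_bar:
        # Count support from the store instead of rescanning the raw database.
        counts = {}
        for entry in c_bar:
            for itemset in entry:
                counts[itemset] = counts.get(itemset, 0) + 1
        level = {s for s, n in counts.items() if n >= min_count}
        frequent.update({s: counts[s] for s in level})
        if prune:
            # Candidates for the next pass are built only from `level`,
            # so itemsets outside it are never looked up again.
            c_bar = [e for e in (entry & level for entry in c_bar) if e]
        sizes.append(sum(len(entry) for entry in c_bar))
        if not level:
            break
        k += 1
        candidates = apriori_gen(level, k)
        next_bar = []
        for entry in c_bar:
            # A candidate occurs in the transaction iff all its (k-1)-subsets do.
            present = {
                c for c in candidates
                if all(frozenset(s) in entry for s in combinations(c, k - 1))
            }
            if present:
                next_bar.append(present)
        c_bar = next_bar
    return frequent, sizes
```

Calling `apriori_tid` with `prune=True` and with `prune=False` on the same transactions should return the same frequent itemsets, while the second return value shows how many itemsets were held in the store at each pass.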
Source
《烟台大学学报(自然科学与工程版)》
CAS
2003, No. 4, pp. 261-264 (4 pages)
Journal of Yantai University (Natural Science and Engineering Edition)