Abstract
To improve the efficiency of association rule mining and its adaptability to large datasets, this paper proposes Partition_CHS_Miner, a partition-based algorithm for mining frequent itemsets with item constraints. The algorithm uses the constraints to prune the dataset and stores transactions in a constraint-based hyper-structure (CHS). A large dataset is first divided into disjoint sub-datasets, each small enough to fit in main memory. Local frequent itemsets are then mined from each sub-dataset with a hyper-structure-based mining algorithm that supports item constraints. Finally, the local frequent itemsets of all sub-datasets are merged into a set of global constrained candidate itemsets, from which the global frequent itemsets are computed. Experimental results show that the algorithm is effective.
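The partition-then-merge procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CHS structure is not specified here, so local mining is replaced by a plain level-wise (Apriori-style) search. The item constraint is assumed to mean "the itemset must contain at least one item from a required set". The names `local_frequent`, `partition_mine`, `required_items`, `min_sup`, and `n_parts` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def local_frequent(transactions, min_count):
    """Level-wise search for itemsets contained in at least min_count transactions.

    Stand-in for the CHS-based local miner, whose details are not given here.
    """
    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    frequent = set()
    while level:
        survivors = {c for c in level
                     if sum(1 for t in transactions if c <= t) >= min_count}
        frequent |= survivors
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        level = {a | b for a, b in combinations(survivors, 2)
                 if len(a | b) == len(a) + 1}
    return frequent


def partition_mine(transactions, required_items, min_sup, n_parts):
    """Two-pass partition mining under the assumed item constraint."""
    required = frozenset(required_items)
    data = [frozenset(t) for t in transactions]
    # Exact rational threshold avoids floating-point rounding in comparisons.
    ratio = Fraction(min_sup).limit_denominator(10**6)
    step = max(1, ceil(len(data) / n_parts))
    parts = [data[i:i + step] for i in range(0, len(data), step)]

    # Pass 1: mine each disjoint partition; the union forms global candidates.
    candidates = set()
    for part in parts:
        # Constraint-based pruning: drop transactions with no required item.
        kept = [t for t in part if t & required]
        # The threshold is relative to the unpruned partition size.
        local = local_frequent(kept, ratio * len(part))
        candidates |= {s for s in local if s & required}

    # Pass 2: count every candidate over the whole dataset.
    result = {}
    for c in candidates:
        count = sum(1 for t in data if c <= t)
        if count >= ratio * len(data):
            result[c] = count
    return result


transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"b", "c"},
    {"a", "c"}, {"b", "d"}, {"a", "b", "d"},
]
result = partition_mine(transactions, {"a"}, 0.5, 2)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

The second pass is needed because an itemset that is frequent in one partition need not be frequent overall. Conversely, any globally frequent itemset is frequent in at least one partition, so the merged candidate set misses none.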
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
EI
CSCD
PKU Core
2006, No. 7, pp. 1082-1086 (5 pages)
Funding
Supported by the National "973" Program Basic Research Development Fund (G1999032701)
Keywords
data mining
association rule
frequent itemset
partition