Abstract
To meet the demand for real-time processing of big data, this paper proposes a partition-based parallel hierarchical association rule mining algorithm (Parallel Hierarchical Association Rule Mining, PHARM). First, the entire database D is randomly divided into n non-overlapping partitions, and the local frequent itemsets of each partition are mined in parallel. Second, using the Apriori property, the local frequent itemsets from all partitions are combined to form the global candidate itemsets with respect to D. D is then scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets. Finally, the efficiency of the algorithm is analyzed by modeling.
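The steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' PHARM implementation: the abstract does not describe the hierarchical component or the parallel runtime, so the code below only shows the generic two-scan partition scheme (local mining, candidate union, global recount). The function names `local_frequent_itemsets` and `partition_mine`, the `seed` parameter, and the use of a thread pool in place of a real cluster are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import ceil


def local_frequent_itemsets(partition, min_support):
    """Level-wise (Apriori-style) mining of one partition. Hypothetical helper."""
    min_count = max(1, ceil(min_support * len(partition)))
    # Level 1: count single items.
    counts = {}
    for t in partition:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s for s, c in counts.items() if c >= min_count}
    frequent = set(level)
    k = 2
    while level:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates
                 if sum(1 for t in partition if c <= t) >= min_count}
        frequent |= level
        k += 1
    return frequent


def partition_mine(transactions, min_support, n_partitions, seed=0):
    """Two-scan partition scheme; assumes 0 < min_support <= 1."""
    db = [frozenset(t) for t in transactions]
    # Randomly split D into non-overlapping partitions.
    shuffled = db[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_partitions] for i in range(n_partitions)]
    parts = [p for p in parts if p]
    # Scan 1: mine each partition independently (threads stand in for workers).
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(
            lambda p: local_frequent_itemsets(p, min_support), parts))
    # Global candidates: union of all local frequent itemsets.
    candidates = set().union(*local)
    # Scan 2: count the actual support of every candidate over all of D.
    min_count = ceil(min_support * len(db))
    support = {c: sum(1 for t in db if c <= t) for c in candidates}
    return {c: n for c, n in support.items() if n >= min_count}
```

The second scan is needed because an itemset that is frequent in one partition may still be infrequent in D as a whole. Conversely, an itemset that is frequent in D must be frequent in at least one partition, so the union of local results misses no global frequent itemset. Calling `partition_mine(transactions, 0.5, 3)` on a small list of item sets returns a dictionary mapping each global frequent itemset to its support count.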
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2016, Issue 1, pp. 286-289 (4 pages)
Funding
National Natural Science Foundation of China (61163010)
Natural Science Foundation of Gansu Province (1308RJZA194)
Keywords
Big data
Partition
Association rule
Parallel hierarchical mining
High efficiency