Abstract
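The era of big data and highly parallel computing architectures has arrived, which raises the question of how traditional serial data mining methods can be made more efficient on today's hardware. Exploiting the characteristics of modern GPU massively parallel architectures (single instruction, multiple data), this paper parallelizes the traditional serial Apriori algorithm. Two computations of the serial Apriori algorithm, support counting and candidate set generation, are implemented in parallel using CUDA; several implementation approaches are compared and an improved scheme is proposed. Experiments show that, on 10000 transactions, the improved parallel algorithm makes support counting 16% more efficient and candidate set generation 25% more efficient.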
The era of big data and parallel computing has arrived, and converting serial data mining algorithms into parallel ones to take advantage of inexpensive hardware is a clear trend. In this paper, two main steps of the serial Apriori algorithm, support counting and candidate set generation, are reimplemented in parallel on the CUDA architecture. The differences between several parallel Apriori implementations are also compared to find a better solution. Experiments indicate that the running time of support counting and candidate set generation decreases by 16% and 25%, respectively, on a data set of 10000 transactions.
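As a rough illustration of why these two steps suit a SIMD-style GPU, the following is a minimal sequential sketch in Python of one Apriori level. It assumes a vertical bitmap layout (each item stored as a bit vector over transactions), which is an illustrative choice and not necessarily the representation used in the paper; the function names `item_bitmaps`, `generate_candidates`, `support`, and `apriori_level` are hypothetical. Every iteration of the pair loop and every per-candidate support computation is independent of the others, so on CUDA each could be assigned to its own thread, with the bit count done by an intrinsic such as `__popc`.

```python
# Sequential sketch of the two data-parallel steps of one Apriori level.
# ASSUMPTION: vertical bitmap layout (bit t of an item's int is set iff
# transaction t contains the item). Each loop body is independent, so on
# a GPU each iteration could run as one thread. Illustrative only; this is
# not the paper's implementation.
from itertools import combinations


def item_bitmaps(transactions):
    """Map each item to an int whose bit t is set iff transaction t has it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def generate_candidates(frequent):
    """Apriori join + prune on a sorted list of sorted frequent k-tuples."""
    frequent_set = set(frequent)
    k = len(frequent[0]) if frequent else 0
    candidates = []
    # One independent check per (i, j) pair: a natural 2-D thread grid.
    for i, j in combinations(range(len(frequent)), 2):
        a, b = frequent[i], frequent[j]
        if a[:-1] != b[:-1]:
            continue  # join only itemsets sharing their first k-1 items
        cand = a + (b[-1],) if a[-1] < b[-1] else b + (a[-1],)
        # Prune: every k-subset of the candidate must itself be frequent.
        if all(sub in frequent_set for sub in combinations(cand, k)):
            candidates.append(cand)
    return sorted(candidates)


def support(candidate, bitmaps):
    """Transactions containing every item: bitwise AND, then popcount."""
    mask = bitmaps.get(candidate[0], 0)
    for item in candidate[1:]:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def apriori_level(frequent, bitmaps, min_support):
    """Frequent (k+1)-itemsets from frequent k-itemsets."""
    return [c for c in generate_candidates(frequent)
            if support(c, bitmaps) >= min_support]


if __name__ == "__main__":
    transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]
    bm = item_bitmaps(transactions)
    level = sorted((item,) for item in bm if support((item,), bm) >= 3)
    while level:
        print(level)
        level = apriori_level(level, bm, 3)
```

Run on these five toy transactions with a minimum support of 3, the script prints the frequent 1-itemsets and 2-itemsets; the only 3-itemset candidate, (1, 2, 3), occurs in just two transactions and is dropped.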
Source
Computer Science (《计算机科学》), 2014, No. 10, pp. 238-243 (6 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Supported by the National Special Project for Marine Public Welfare Industry (201305026)
Keywords
Data mining, Association rules, Frequent itemset mining, Parallel algorithm