Abstract
The traditional Apriori algorithm must scan the database repeatedly, which increases system I/O, memory, and communication overhead. Because it mines without a specific target, it also produces many useless or meaningless rules, so association rule mining is slow and struggles to meet data mining needs in the big data era. To address these problems, a target-specific Apriori algorithm based on a parallel matrix is proposed. The algorithm combines the principle of data partitioning with MapReduce to parallelize the rule mining process, converts the transaction database into a matrix so that the number of database scans is reduced to two, and sets target items to shrink the candidate itemsets and the system overhead of mining. These changes improve the algorithm's performance and make it better suited to mining big data on distributed systems. Finally, experiments show that the improved algorithm performs better and that the mined rules better match user needs.
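The three ideas named in the abstract (a matrix form of the transaction database, data partitioning with a map/reduce-style count, and a target item that restricts the candidates) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, Python integers used as bitmasks stand in for the columns of a Boolean matrix, and a plain loop over partitions stands in for MapReduce.

```python
from collections import Counter
from itertools import chain


def build_bitmaps(partition):
    """Single pass over one partition: item -> bitmask of the rows containing it.

    Each bitmask plays the role of one column of a Boolean transaction matrix.
    """
    bitmaps = {}
    for row, items in enumerate(partition):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def local_counts(bitmaps, n_rows, candidates):
    """'Map' step: count each candidate itemset inside one partition by
    AND-ing the item columns and counting the remaining 1 bits."""
    counts = Counter()
    for cand in candidates:
        mask = (1 << n_rows) - 1
        for item in cand:
            mask &= bitmaps.get(item, 0)
        counts[cand] = bin(mask).count("1")
    return counts


def mine_with_target(transactions, target, min_count, n_parts=2):
    """Return {itemset: support count} for frequent itemsets containing `target`."""
    # Data partitioning: each partition is converted to bitmaps exactly once.
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    matrices = [(build_bitmaps(p), len(p)) for p in parts]

    def support(candidates):
        # 'Reduce' step: sum the local counts from all partitions.
        total = Counter()
        for bitmaps, n_rows in matrices:
            total.update(local_counts(bitmaps, n_rows, candidates))
        return total

    items = sorted(set(chain.from_iterable(transactions)))
    singles = support([frozenset([i]) for i in items])
    frequent_items = [i for i in items if singles[frozenset([i])] >= min_count]

    result = {}
    if target not in frequent_items:
        return result

    # Only itemsets that contain the target item are ever generated.
    level = {frozenset([target]): singles[frozenset([target])]}
    while level:
        result.update(level)
        candidates = {s | {i} for s in level for i in frequent_items if i not in s}
        counts = support(candidates)
        level = {c: n for c, n in counts.items() if n >= min_count}
    return result


if __name__ == "__main__":
    demo = [
        {"milk", "bread"},
        {"milk", "bread", "butter"},
        {"bread", "butter"},
        {"milk", "butter"},
        {"milk", "bread", "butter"},
    ]
    for itemset, count in mine_with_target(demo, "milk", 2).items():
        print(sorted(itemset), count)
```

After the bitmaps are built, the raw transactions are never read again: every support count comes from bitwise operations on the stored columns. Restricting candidates to supersets of the target item is what keeps the candidate set small in this sketch.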
Source
《浙江工业大学学报》
CAS
PKU Core Journal (北大核心)
2017, No. 5, pp. 574-579 (6 pages)
Journal of Zhejiang University of Technology