Abstract
The growing number of data acquisition channels means that association rule mining on a single processor is constrained by I/O and memory. To address this problem, this paper improves the traditional mining algorithm. Building on the Hadoop platform, it reduces the algorithm's time complexity with an incremental iterative method and, exploiting the MapReduce programming model, completes frequent itemset mining with a single traversal of the data plus MapReduce task scheduling. For strong association rule mining, the Sqoop component migrates the data held in Hive external tables to Redis so that it can be read at high speed. Experimental results show that the method effectively improves mining efficiency, that the improvement grows as the dataset becomes larger, and that the improved algorithm has good speedup and scalability, allowing association rules to be mined quickly from large datasets.
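The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of the general idea it names: itemset supports are counted in one pass with a map step and a reduce step, and strong rules are then filtered by confidence. It does not use Hadoop, Sqoop, Hive, or Redis, and it is not the authors' implementation. All names (`map_phase`, `reduce_phase`, `frequent_itemsets`, `strong_rules`, `max_size`) and the thresholds are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items in one transaction."""
    items = sorted(set(transaction))  # sorted so each itemset has one canonical tuple form
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset (the role a reducer would play)."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Single pass over the transactions, then keep itemsets with count >= min_support."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c >= min_support}


def strong_rules(freq, min_confidence):
    """Derive rules antecedent -> single-item consequent whose confidence meets the threshold."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for i in range(len(itemset)):
            antecedent = itemset[:i] + itemset[i + 1:]
            consequent = (itemset[i],)
            # Every subset of a frequent itemset is itself frequent, so this lookup succeeds.
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


# Toy data, assumed for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
freq = frequent_itemsets(transactions, min_support=2)
rules = strong_rules(freq, min_confidence=0.6)
```

In a distributed setting the map and reduce steps would run as separate tasks over partitions of the data, and the frequent itemset counts could be served from a key-value store for the rule-generation step; the sketch keeps everything in memory to stay self-contained.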
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
2016, No. 10, pp. 69-74, 79 (7 pages in total)
Computer Engineering