
Improved Association Rule Mining Algorithm Based on Hadoop Platform

Cited by: 15
Abstract: The growing number of data acquisition channels means that association rule mining on a single processor is constrained by I/O and memory. To address this problem, this paper improves the traditional mining algorithm. It draws on the Hadoop platform and uses a cumulative iteration method to reduce the algorithm's time complexity. Exploiting the characteristics of MapReduce programming, it completes frequent itemset mining with a single traversal of the data plus MapReduce task scheduling. For strong association rule mining, the Sqoop component migrates data from Hive external tables to Redis so that the data can be read at high speed. Experimental results show that the method effectively improves mining efficiency, that the improvement grows with the size of the data set, and that the algorithm has good speedup and scalability, allowing association rules to be mined quickly from large data sets.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, No. 10, pp. 69-74, 79 (7 pages)
Keywords: Hadoop platform; MapReduce programming; association rule; big data; data mining
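The abstract describes frequent itemset mining in a single pass followed by strong rule generation, but it does not give the algorithm itself. The following is only a minimal single-machine sketch of the general map/reduce pattern for support counting and confidence filtering, not the paper's improved algorithm. All function names (`map_transaction`, `reduce_counts`, `frequent_itemsets`, `strong_rules`) and the `max_size` cap are assumptions made for illustration, and no Hadoop, Hive, Sqoop, or Redis component is involved.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction, max_size):
    """Map step (hypothetical): emit (itemset, 1) for each itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Read every transaction once and keep itemsets whose support meets min_support."""
    pairs = (kv for t in transactions for kv in map_transaction(t, max_size))
    counts = reduce_counts(pairs)
    threshold = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def strong_rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent whose confidence meets min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent's count is always present here.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


if __name__ == "__main__":
    baskets = [
        ["milk", "bread"],
        ["milk", "bread", "eggs"],
        ["bread", "eggs"],
        ["milk", "eggs"],
    ]
    frequent = frequent_itemsets(baskets, min_support=0.5)
    for rule in strong_rules(frequent, min_confidence=0.6):
        print(rule)
```

Enumerating every itemset up to `max_size` per transaction keeps the sketch to one pass over the data, at the cost of emitting many candidate pairs; a real deployment would need to bound or prune that enumeration.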

