Research of Incremental Data Mining Based on MapReduce

Cited by: 3
Abstract: Frequent itemset mining is an important part of the data mining process, and traditional data mining commonly uses the Apriori and FP-growth algorithms to mine frequent itemsets. In practice, these traditional algorithms often cannot be applied to databases that are updated frequently. The IMBT (incremental mining binary tree) data structure can mine frequent itemsets from a continuously updated database, but it leads to insufficient storage space and low running efficiency. Incremental data mining based on MapReduce can effectively solve these problems. A comparison of the running times of MapReduce-based incremental data mining and traditional incremental data mining shows that the MapReduce-based approach is more efficient.
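The core idea of incremental mining described in the abstract (scan only the newly added transactions, then merge their counts with the supports already stored) can be sketched as a small single-process simulation of the MapReduce map/combine/reduce flow. This is a hedged illustration, not the paper's algorithm and not IMBT. The names `map_split`, `reduce_counts`, `IncrementalMiner` and the parameters `max_size`, `min_support`, `split_size` are hypothetical. The sketch assumes that the supports of all itemsets of up to `max_size` items are kept, so that an itemset that was infrequent before an update can still become frequent afterwards. No Hadoop API is used; the splits are processed in a plain loop.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

Itemset = FrozenSet[str]


def map_split(split: Sequence[Sequence[str]], max_size: int) -> Counter:
    """Mapper with a local combiner (hypothetical): count every itemset of up
    to max_size items that occurs in the transactions of one input split."""
    local: Counter = Counter()
    for transaction in split:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            local.update(frozenset(c) for c in combinations(items, k))
    return local


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reducer (hypothetical): sum the partial counts emitted by all splits."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


class IncrementalMiner:
    """Hypothetical helper that keeps support counts across batches, so only
    newly added transactions are scanned when the database is updated."""

    def __init__(self, max_size: int, min_support: float, split_size: int = 2):
        self.max_size = max_size          # largest itemset size that is counted
        self.min_support = min_support    # relative threshold, e.g. 0.5 = 50%
        self.split_size = split_size      # transactions per simulated input split
        self.counts: Counter = Counter()  # stored supports from earlier batches
        self.n_transactions = 0

    def add_batch(self, transactions: List[List[str]]) -> None:
        # Partition only the new transactions into splits, as a MapReduce job would.
        splits = [transactions[i:i + self.split_size]
                  for i in range(0, len(transactions), self.split_size)]
        new_counts = reduce_counts(map_split(s, self.max_size) for s in splits)
        # Merge with the stored counts instead of rescanning the old data.
        self.counts.update(new_counts)
        self.n_transactions += len(transactions)

    def frequent_itemsets(self) -> Dict[Itemset, int]:
        threshold = self.min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


def as_sorted(result: Dict[Itemset, int]) -> List[Tuple[Tuple[str, ...], int]]:
    """Deterministic view of a result (frozenset print order can vary)."""
    return sorted((tuple(sorted(s)), c) for s, c in result.items())


if __name__ == "__main__":
    miner = IncrementalMiner(max_size=2, min_support=0.5)
    miner.add_batch([["milk", "bread"], ["milk", "eggs"], ["bread", "eggs"]])
    print(as_sorted(miner.frequent_itemsets()))
    # [(('bread',), 2), (('eggs',), 2), (('milk',), 2)]
    miner.add_batch([["milk", "bread"]])  # update: only the new transaction is scanned
    print(as_sorted(miner.frequent_itemsets()))
    # [(('bread',), 3), (('bread', 'milk'), 2), (('eggs',), 2), (('milk',), 3)]
```

Keeping every count up to `max_size` is what makes the merge correct, but that table grows quickly with the number of distinct items, which is the kind of storage pressure the abstract attributes to the IMBT approach. In a real Hadoop job, `map_split` would run on many nodes in parallel rather than in a single loop.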
Source: Microcomputer & Its Applications (《微型机与应用》), 2014, No. 1, pp. 67-70 (4 pages)
Keywords: incremental data mining; MapReduce; IMBT (incremental mining binary tree); frequent itemset