
Research on Frequent Itemsets Mining in Large Data Environment

Cited by: 2
Abstract: Applying combinations of frequent itemset mining (FIM) methods to big data exposes many problems. To address them, this paper studies two frequent itemset mining algorithms on the MapReduce platform. It adopts two new methods for mining large datasets: Dist-Eclat and BigFIM. Dist-Eclat focuses on speed and uses a simple load-balancing scheme based on k-FIs (frequent itemsets of size k). BigFIM first mines the k-FIs with an Apriori variant, then distributes the frequent itemsets found to the mappers, and is optimized to run on truly large datasets. Experiments show that the approach has low time complexity, and that its advantage and scalability become more pronounced as the data volume grows.
Author: Li Huijian (李挥剑)
Source: Journal of Qingdao University of Science and Technology (Natural Science Edition) 《青岛科技大学学报(自然科学版)》, CAS, 2015, No. 2, pp. 224-231 (8 pages)
Funding: Applied Basic Research Project (Main Disciplines) of the Ministry of Transport (2012-319-226-320)
Keywords: distributed data mining; frequent itemset mining (FIM); MapReduce; Hadoop; Eclat algorithm
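
The abstract names the two-phase idea (find the k-FIs first, then hand them to mappers that finish the search Eclat-style) without giving implementation details. The following is a minimal single-process Python sketch of that general idea, not the paper's implementation. All function names are hypothetical, the round-robin assignment is only an assumed stand-in for the load-balancing scheme, and a plain loop stands in for the MapReduce mappers.

```python
from itertools import combinations


def tidlists(transactions):
    """Vertical layout: map each item to the set of transaction ids containing it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def frequent_k_itemsets(vertical, min_support, k):
    """Level-wise (Apriori-style) search for all frequent itemsets of size k."""
    level = {(item,): tids for item, tids in vertical.items()
             if len(tids) >= min_support}
    for _ in range(k - 1):
        next_level = {}
        ordered = sorted(level.items(), key=lambda kv: kv[0])
        for (a, ta), (b, tb) in combinations(ordered, 2):
            if a[:-1] == b[:-1]:  # join two itemsets that share a prefix
                tids = ta & tb
                if len(tids) >= min_support:
                    next_level[a + (b[-1],)] = tids
        level = next_level
    return level


def eclat(prefix, prefix_tids, candidates, min_support, out):
    """Depth-first Eclat: extend `prefix` by intersecting tid-lists."""
    out[prefix] = len(prefix_tids)
    extensions = []
    for item, tids in candidates:
        shared = prefix_tids & tids
        if len(shared) >= min_support:
            extensions.append((item, shared))
    for i, (item, shared) in enumerate(extensions):
        eclat(prefix + (item,), shared, extensions[i + 1:], min_support, out)


def mine_distributed(transactions, min_support, k=2, n_workers=2):
    """Return supports of all frequent itemsets of size >= k."""
    vertical = tidlists(transactions)
    seeds = frequent_k_itemsets(vertical, min_support, k)

    # Assumed balancing scheme: deal the k-FI prefixes out round-robin.
    partitions = [[] for _ in range(n_workers)]
    for i, prefix in enumerate(sorted(seeds)):
        partitions[i % n_workers].append(prefix)

    items = sorted(vertical.items(), key=lambda kv: kv[0])
    results = {}
    for worker_prefixes in partitions:  # each iteration stands in for one mapper
        for prefix in worker_prefixes:
            # Only items after the prefix's last item can extend it.
            candidates = [(it, t) for it, t in items if it > prefix[-1]]
            eclat(prefix, seeds[prefix], candidates, min_support, results)
    return results
```

Because each prefix subtree is mined independently of the others, the result does not depend on how many workers the prefixes are split across; only the balance of work changes. Itemsets shorter than k are found during the level-wise phase and are not repeated in the returned dictionary.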
