Abstract
Combining several frequent itemset mining (FIM) methods to mine big data exposes a number of problems. To address them, this paper studies two frequent itemset mining algorithms on the MapReduce platform, Dist-Eclat and BigFIM, which are two new methods for mining large datasets. Dist-Eclat focuses on speed and uses a simple load-balancing scheme based on k-FIs (frequent itemsets of size k). BigFIM first mines the k-FIs with a variant of Apriori and then distributes the discovered frequent itemsets to the mappers. With these optimizations it can run on truly large datasets. Experiments show that the method has low time complexity. Its advantage becomes more pronounced, and it scales better, as the data volume grows.
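The two-stage idea in the abstract can be illustrated with a single-process Python sketch. The stages are: mine the k-FIs level by level, split them across mappers as prefixes, then let each mapper extend its prefixes depth-first with Eclat. This is not the authors' implementation and uses no Hadoop or MapReduce API. Each "bucket" only simulates one mapper.

The sketch makes several assumptions:
- The function names (`apriori_k_fis`, `eclat`, `assign_round_robin`, `mine`) are hypothetical.
- Transactions are lists of comparable item labels.
- `min_sup` is an absolute support count.
- The abstract only says "a simple load-balancing scheme based on k-FIs". Round-robin assignment of the k-FI prefixes stands in for that scheme.

```python
from collections import defaultdict
from itertools import combinations


def apriori_k_fis(transactions, min_sup, k):
    """Apriori-style, level-wise mining of all frequent itemsets of size 1..k.
    Returns a dict mapping each frequent itemset (a sorted tuple) to its support."""
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[(item,)] += 1
    level = {s: n for s, n in counts.items() if n >= min_sup}
    result = dict(level)
    for size in range(2, k + 1):
        # Join: merge two frequent (size-1)-itemsets that share their first size-2 items.
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune: every (size-1)-subset of a candidate must itself be frequent.
                if all(sub in level for sub in combinations(cand, size - 1)):
                    candidates.add(cand)
        counts = defaultdict(int)
        for t in transactions:
            items = set(t)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] += 1
        level = {s: n for s, n in counts.items() if n >= min_sup}
        if not level:
            break
        result.update(level)
    return result


def eclat(prefix, prefix_tids, candidates, min_sup, out):
    """Depth-first Eclat extension of `prefix` by intersecting tid-sets.
    `candidates` is a list of (item, tid-set) pairs for items after prefix[-1]."""
    for i, (item, tids) in enumerate(candidates):
        new_tids = prefix_tids & tids
        if len(new_tids) >= min_sup:
            new_prefix = prefix + (item,)
            out[new_prefix] = len(new_tids)
            # Only later items are tried, so each itemset is generated exactly once.
            eclat(new_prefix, new_tids, candidates[i + 1:], min_sup, out)


def assign_round_robin(prefixes, n_mappers):
    """Hypothetical load balancing: deal the sorted k-FI prefixes out in turn."""
    buckets = [[] for _ in range(n_mappers)]
    for idx, p in enumerate(sorted(prefixes)):
        buckets[idx % n_mappers].append(p)
    return buckets


def mine(transactions, min_sup, k, n_mappers):
    """Stage 1: Apriori up to size k. Stage 2: each simulated mapper runs Eclat
    on the subtrees rooted at its k-FI prefixes."""
    fis = apriori_k_fis(transactions, min_sup, k)
    tid = defaultdict(set)  # vertical layout: item -> ids of transactions containing it
    for t_id, t in enumerate(transactions):
        for item in t:
            tid[item].add(t_id)
    freq_items = sorted(s[0] for s in fis if len(s) == 1)
    prefixes = [s for s in fis if len(s) == k]
    for bucket in assign_round_robin(prefixes, n_mappers):  # one bucket = one mapper
        for p in bucket:
            p_tids = set.intersection(*(tid[i] for i in p))
            cands = [(j, tid[j]) for j in freq_items if j > p[-1]]
            eclat(p, p_tids, cands, min_sup, fis)
    return fis
```

For example, `mine(transactions, min_sup=2, k=2, n_mappers=4)` returns every frequent itemset with its support. Any frequent itemset larger than k has a frequent k-itemset as its sorted prefix, so it falls inside exactly one mapper's subtree. The choice of k trades the two stages against each other. A larger k moves more work into the level-wise stage and gives each mapper smaller subtrees.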
Source
Journal of Qingdao University of Science and Technology: Natural Science Edition
Indexed in: CAS
2015, No. 2, pp. 224-231 (8 pages)
Funding
Applied Basic Research Project (Core Disciplines) of the Ministry of Transport (2012-319-226-320)