Abstract
This paper proposes a parallel algorithm for mining frequent patterns in big data, built on an extended MapReduce framework. First, we study a method that uses bitmap computation to discover the frequent patterns of a dataset while scanning the dataset only once. Second, we extend the traditional MapReduce framework by adding bitmap computation, frequent pattern mining, and insignificant-pattern pruning functions. To improve the performance of big data pattern mining, we also design a pruning algorithm that identifies and removes insignificant patterns from the dataset. Finally, experimental results show that the proposed algorithm is efficient, highly scalable, and outperforms comparable algorithms.
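The bitmap idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not give the algorithm's details, so the level-wise (Apriori-style) candidate generation, the use of a plain minimum-support threshold as the pruning criterion, and the function names `build_bitmaps`, `support`, and `frequent_itemsets` are all assumptions made for illustration. The MapReduce distribution step is omitted entirely.

```python
def build_bitmaps(transactions):
    """Scan the dataset once: bit i of bitmaps[item] is set
    when transaction i contains that item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Support of an itemset = number of set bits in the AND
    of its items' bitmaps (no rescan of the data needed)."""
    items = list(itemset)
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise search (an assumed strategy, not taken from the paper).
    Itemsets below min_support are pruned at each level."""
    bitmaps = build_bitmaps(transactions)
    current = [frozenset([item]) for item in sorted(bitmaps)
               if support([item], bitmaps) >= min_support]
    result = {}
    while current:
        for itemset in current:
            result[itemset] = support(itemset, bitmaps)
        k = len(current[0]) + 1
        # Join pairs of frequent itemsets that differ by exactly one item.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        current = [c for c in candidates
                   if support(c, bitmaps) >= min_support]
    return result
```

For example, with the five transactions `[a,b,c], [a,b], [a,c], [b,c], [a,b,c]` and `min_support=3`, the itemset `{a, b}` has support 3 and is kept, while `{a, b, c}` has support 2 and is pruned. Python integers serve as arbitrary-length bitmaps here; a distributed version would presumably partition the transaction IDs across mappers, which this sketch does not attempt.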
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2014, No. 7, pp. 1599-1603 (5 pages)
Journal of Chinese Computer Systems
Funding
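Supported by the National Natural Science Foundation of China (61262033, 61262009, 61363075)
Supported by the Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ13303)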