Partitioned parallel association rules mining algorithm based on storage improvement (基于存储改进的分区并行关联规则挖掘算法) Cited by: 4
Abstract Existing association rule mining algorithms use simple storage structures, generate large numbers of redundant candidate sets, have high time and space complexity, and deliver unsatisfactory mining efficiency. To further speed up frequent itemset mining and improve execution performance, this paper proposes an association rule mining algorithm based on an improved in-memory storage structure. Built on the Spark distributed framework, the algorithm mines frequent itemsets in parallel across partitions. During mining it stores items in a Bloom filter and prunes both the transaction set and the candidate set, which speeds up frequent itemset mining and saves computing resources. Compared with the YAFIM and MR-Apriori algorithms, the proposed algorithm mines frequent itemsets markedly more efficiently while occupying less memory. It improves mining speed, reduces memory pressure, and scales well, so it can be applied to larger datasets and clusters.
Authors Wang Yonggui (王永贵); Xie Nan (谢南); Qu Haicheng (曲海成) (School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China)
Source Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 1, pp. 167-171 (5 pages)
Funding National Natural Science Foundation of China (61404069); National Natural Science Foundation of China, Young Scientists Fund (41701479).
Keywords association rule; big data; candidate set; Bloom filter; Spark
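
The abstract names two ideas: storing items in a Bloom filter, and pruning the transaction set before candidates are counted. The record does not give the paper's actual data structures or pruning rules, so the following is only a minimal single-machine sketch of the general technique, not the authors' algorithm. The names `BloomFilter` and `frequent_pairs`, the filter size, and the hash scheme are all assumptions made for illustration; in a Spark setting, logic of this kind would presumably run once per partition.

```python
import hashlib
from collections import Counter
from itertools import combinations


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def frequent_pairs(transactions, min_support):
    """Return {pair: count} for item pairs appearing in >= min_support transactions."""
    # Pass 1: count single items and record the frequent ones in the Bloom filter.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = BloomFilter()
    for item, count in item_counts.items():
        if count >= min_support:
            frequent_items.add(item)

    # Pass 2: trim each transaction to items that pass the filter, then count pairs.
    pair_counts = Counter()
    for t in transactions:
        trimmed = sorted(item for item in set(t) if item in frequent_items)
        pair_counts.update(combinations(trimmed, 2))

    # A false positive can only let an infrequent item through; any pair that
    # contains it stays below min_support and is dropped here.
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}
```

For example, `frequent_pairs([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]], 2)` returns `{("a", "b"): 2, ("a", "c"): 2}`. The filter trades a small false-positive rate for a compact, fixed-size membership structure, and the final support check keeps the result exact.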