期刊文献+

采用N-list结构的混合并行频繁项集挖掘算法 被引量:6

Hybrid Parallel Frequent Itemsets Mining Algorithm by Using N-List Structure
下载PDF
导出
摘要 针对大数据环境下并行MRPrePost频繁项集挖掘算法中存在计算节点负载不均衡,N-list合并效率低以及冗余搜索等问题,提出了基于N-list结构的混合并行频繁项集挖掘算法HP-FIMBN。首先,设计负载量估计函数(LE)来计算出频繁1项集F-list中每一项的负载量,同时提出基于贪心策略的分组方法(GM-GS)将F-list中的每一项根据其负载量进行均匀分组,既解决了数据划分中计算节点负载不均衡的问题,又降低了集群中各节点上子PPC-Tree树的规模;其次,提出预先放弃策略(EAS),该策略不仅能有效避免合并过程中的无效计算,而且不需要遍历初始N-list结构就能得到最终的N-list,极大地提高了N-list结构的合并效率;最后,采用集合枚举树作为搜索空间,并提出超集等价剪枝策略(SES)来避免挖掘过程中的冗余搜索,生成最终的挖掘结果。实验结果表明,该算法在大数据环境下进行频繁项集挖掘具有较好的效果。 Aiming at the problem of unbalanced load,bad efficiency of N-list merge and redundant search for each node based on parallel frequent itemset mining algorithm MRPrePost(parallel PrePost algorithm based on MapReduce),this paper proposes a hybrid parallel frequent itemset mining algorithm based on N-list(HP-FIMBN).Firstly,a load estimation function(LE)is designed to calculate the load of each item in F-list,and a grouping method based on greedy strategy(GM-GS)is proposed,which not only deals with the problem of unbalanced load in the process of data partitioning,but also decreases the size of sub-PPC-tree on local node.Secondly,in order to accelerate the completion of the merging of two N-list structures,it designs an early abandon strategy(EAS),which can not only efficiently avoid invalid calculations in the merging process,but also does not need to traverse the initial N-list structure to get the final result.Finally,the set-enumeration tree is adopted as the search space,a superset equivalent strategy(SES)is proposed to avoid redundant search in the mining process and the final mining results are generated.Experimental results show that the modified algorithm has better performance on mining frequent itemsets in big data environment.
作者 刘卫明 张弛 毛伊敏 LIU Weiming;ZHANG Chi;MAO Yimin(School of Information Engineering,Jiangxi University of Science and Technology,Ganzhou,Jiangxi 341000,China)
出处 《计算机科学与探索》 CSCD 北大核心 2022年第1期120-136,共17页 Journal of Frontiers of Computer Science and Technology
基金 国家重点研发计划(2018YFC1504705) 国家自然科学基金(41562019) 江西省教育厅科技项目(GJJ151528,GJJ151531)。
关键词 频繁项集挖掘 N-list结构 贪心策略 集合枚举树 超集等价剪枝策略(SES) frequent item mining N-list greedy strategy set-enumeration tree superset equivalent strategy(SES)
  • 相关文献

参考文献11

二级参考文献87

  • 1施亮,钱雪忠.基于Hadoop的并行FP-Growth算法的研究与实现[J].微电子学与计算机,2015,32(4):150-154. 被引量:15
  • 2HaHan J W, Pei J, Yin Y W. Mining frequent itemsets without candidate generation. In: The 2000 ACM SIGMOD International Conference on Management of data (SIGMOD’00), New York, 2000. 1-12.
  • 3AgAgrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: The 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD’93), Washington, 1993. 207-216.
  • 4HaHan J, Cheng H, Xin D, et al. Frequent itemset mining: current status and future directions. Data Min Knowl Discov,2007, 15: 55-86.
  • 5BaBaralis E, Cerquitelli T, Chiusano S. IMine: index support for item set mining. IEEE TKDE J, 2009, 21: 493-506.
  • 6ZaZaki M J, Gouda K. Fast vertical mining using diffsets, In: The 9th ACM SIGKDD International Conference on. Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 326-335.
  • 7DeDeng Z H, Wang Z H. A new fast vertical method for mining frequent itemsets. Int J Comput Intell Syst, 2010, 3:733-744.
  • 8AgAgrawal R, Srikant R. Fast algorithm for mining Association rules. In: The 20th International Conference on Very Large Data Bases (VLDB’94), Santiago de Chile, 1994. 487-499.
  • 9SaSavasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: The21th International Conference on Very Large Data Bases (VLDB’95), Zurich, 1995. 432-443.
  • 10ShShenoy P, Haritsa J R, Sundarshan S, et al. Turbo-charging vertical mining of large databases. In: ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD’00), Dallas, 2000.22-33.

共引文献198

同被引文献65

引证文献6

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部