Data analysis method for parallel DHP based on Hadoop (cited by: 4)
Abstract: Generating the frequent 2-itemsets L2 from the candidate set C2 is a bottleneck of the Apriori algorithm for mining association rules. The Direct Hashing and Pruning (DHP) algorithm uses a generated hash table H2 to prune useless candidate itemsets from C2, which improves the efficiency of generating L2. However, traditional DHP is a serial algorithm and cannot effectively handle large-scale data. To address this problem, a parallel DHP algorithm, termed H_DHP, is proposed. First, the feasibility of a parallelization strategy for DHP is analyzed and proved theoretically. Then, based on the Hadoop platform, the generation of the hash table H2 and of the frequent itemsets L1 and L3 to Lk is implemented in parallel, and the association rules are generated with the help of the HBase database. Simulation results show that, compared with the traditional DHP algorithm, H_DHP performs better in data processing time, the size of the data sets it can handle, speedup, and scalability.
Authors: YANG Yanxia (杨燕霞), FENG Lin (冯林)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 12, pp. 3280-3284, 3291 (6 pages)
Funding: National Key Technology R&D Program of China (2014BAH11F01, 2014BAH11F02); Science and Technology Support Program of Sichuan Province (15GZ0079)
Keywords: Hadoop; Hash table; Apriori algorithm; Direct Hashing and Pruning (DHP) algorithm
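
The pruning step described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' H_DHP implementation: the hash function, the table size, the support threshold, the toy transactions, and all function names are assumptions made for illustration, and the two in-memory partitions only stand in for Hadoop input splits. The sketch shows one reason a parallel construction of H2 is plausible: bucket counts are additive, so tables built on separate partitions can be summed into the same table a serial scan would produce.

```python
from collections import Counter
from itertools import combinations

N_BUCKETS = 7     # hypothetical hash table size
MIN_SUPPORT = 2   # hypothetical absolute support threshold


def bucket(pair):
    """Hypothetical hash function mapping an ordered item pair to a bucket."""
    a, b = pair
    return (a * 10 + b) % N_BUCKETS


def build_h2(transactions):
    """Count, per bucket, every 2-item subset of the given transactions."""
    h2 = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            h2[bucket(pair)] += 1
    return h2


def merge_tables(tables):
    """Sum per-partition tables (a stand-in for a reduce step)."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total


def frequent_items(transactions):
    """L1: items that occur in at least MIN_SUPPORT transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(i for i, c in counts.items() if c >= MIN_SUPPORT)


def pruned_c2(l1, h2):
    """C2 restricted to pairs whose bucket count reaches MIN_SUPPORT."""
    return [p for p in combinations(l1, 2) if h2[bucket(p)] >= MIN_SUPPORT]


def frequent_pairs(transactions, candidates):
    """L2: exact support counting, restricted to the pruned candidates."""
    wanted = set(candidates)
    counts = Counter(
        p
        for t in transactions
        for p in combinations(sorted(set(t)), 2)
        if p in wanted
    )
    return {p: c for p, c in counts.items() if c >= MIN_SUPPORT}


# Toy data: four transactions over items 1..5, split into two partitions.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
partitions = [transactions[:2], transactions[2:]]

h2_full = build_h2(transactions)
h2_merged = merge_tables(build_h2(p) for p in partitions)
l1 = frequent_items(transactions)
c2 = pruned_c2(l1, h2_merged)
l2 = frequent_pairs(transactions, c2)

print(h2_merged == h2_full)  # True: partition tables sum to the serial table
print(c2)                    # [(1, 3), (2, 3), (2, 5), (3, 5)]
```

In this toy run the unpruned C2 built from L1 has six pairs, and the hash table removes two of them before any exact support counting takes place. A bucket count is only an upper bound on the support of each pair hashed into it, so pruning never discards a frequent pair, but the surviving candidates still need the exact count done in `frequent_pairs`.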