An Improved Apriori Algorithm under Hadoop's Framework (cited by: 2)
Abstract: Common Apriori variants built on the Hadoop framework are inefficient at scanning the dataset and pruning candidate itemsets when counting support, and they incur substantial time overhead for data transfer within the cluster. To address these problems, an improved Apriori algorithm, Apriori_Ind, is proposed. Running on a Hadoop cluster, the algorithm uses a block-processing strategy: it first partitions the dataset by transaction and then converts each block into the <item, transaction set> format. This lets the algorithm take full advantage of distributed computing, with every node generating and pruning candidate itemsets in parallel. Frequent itemsets are represented with a new structure made up of a front part and a rear part, which improves efficiency when each node generates and prunes candidates. Apriori_Ind reduces cluster transmission cost and speeds up pruning. Experiments show that the new algorithm is suitable for large-scale data mining and that its performance improves markedly, especially when the number of items is large.
Authors: WANG Qing-song (王青松); JIANG Fu-shan (姜富山) (College of Information, Liaoning University, Shenyang 110036, China)
Source: Journal of Liaoning University: Natural Sciences Edition (《辽宁大学学报(自然科学版)》), CAS, 2019, No. 3, pp. 257-264 (8 pages)
Funding: National Natural Science Foundation of China (61502215)
Keywords: Apriori; Hadoop; frequent itemset; distributed computing; big data; MapReduce
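
The <item, transaction set> format named in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Apriori_Ind implementation and it does not use Hadoop or MapReduce; the function names `to_vertical` and `frequent_itemsets`, the sample data, and the prefix-based join are all assumptions chosen for illustration. It only shows why the vertical layout helps: once each item maps to the set of transaction IDs that contain it, the support of a candidate itemset is the size of a set intersection, so no rescan of the original dataset is needed.

```python
from collections import defaultdict
from itertools import combinations


def to_vertical(transactions):
    """Convert {tid: items} into the vertical form {item: set of tids}."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets.

    Hypothetical helper: a plain level-wise Apriori over tid sets,
    run on one machine (one data block in the paper's terms).
    """
    vertical = to_vertical(transactions)
    # Level 1: keep single items whose tid set is large enough.
    current = {
        (item,): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        # Join two k-itemsets that share the same first k-1 items.
        for a, b in combinations(sorted(current), 2):
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(sub not in current for sub in combinations(candidate, len(a))):
                continue
            # Support comes from intersecting tid sets, not from a data scan.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


# Made-up sample data: transaction ID -> items in that transaction.
sample = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}
found = frequent_itemsets(sample, min_support=3)
```

In a distributed setting, each node would presumably run something like this on its own block and the per-block counts would then be combined; how Apriori_Ind does that is not described in the abstract, so it is left out here.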