
Parallel Apriori Algorithm Based on Hadoop
Cited by: 1
Abstract: The classical Apriori algorithm and its improved variants cannot effectively process large-scale datasets. To address this, two improved algorithms based on the Hadoop MapReduce programming model are proposed: HAprioriK and HApriori2. HAprioriK requires k MapReduce jobs, whereas HApriori2 requires only 2 jobs to find the frequent k-itemsets over the entire dataset. Both algorithms make full use of the computational strengths of the Hadoop platform and can handle large volumes of data. The effectiveness of the improved algorithms was studied on IBM datasets. The experimental results show that HApriori2 mines frequent itemsets effectively across datasets of different sizes and across different support thresholds, and that it performs better than HAprioriK.
Authors: XIE Jian-feng (谢建峰), SUN Jian-wei (孙剑伟) (Department of No. 5 System, North China Institute of Computing Technology, Beijing 100083, China)
Source: Information Technology (《信息技术》), 2018, No. 4, pp. 129-133, 140 (6 pages)
Keywords: MapReduce; parallel Apriori algorithm; data mining
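
The abstract states that HAprioriK uses one MapReduce job per itemset size k. As a rough illustration only, the following in-memory Python sketch simulates that level-wise pattern: a map step counts candidates in each data split, and a reduce step sums the counts and applies the support threshold. The function names, the split layout, and the absolute `min_count` threshold are hypothetical and are not taken from the paper, and no Hadoop API is used. The two-job HApriori2 variant is not sketched, because the abstract does not describe how it works internally.

```python
from collections import Counter
from itertools import combinations


def map_count(split, candidates):
    """Map step (simulated): count each candidate itemset within one split."""
    counts = Counter()
    for transaction in split:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_filter(partial_counts, min_count):
    """Reduce step (simulated): sum per-split counts, keep frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {cand: n for cand, n in total.items() if n >= min_count}


def gen_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }


def level_wise_apriori(splits, min_count):
    """Run one simulated map/reduce round per itemset size (assumed scheme)."""
    # Size-1 candidates: every distinct item that appears in the data.
    candidates = {frozenset([i]) for split in splits for t in split for i in t}
    result = {}
    k = 1
    while candidates:
        partial = [map_count(split, candidates) for split in splits]
        frequent = reduce_filter(partial, min_count)
        result.update(frequent)
        k += 1
        candidates = gen_candidates(frequent, k)
    return result


# Hypothetical toy data: two splits standing in for two input blocks.
demo_splits = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ac"), frozenset("bc"), frozenset("abc")],
]
demo_result = level_wise_apriori(demo_splits, min_count=3)
```

In this sketch each pass over the splits corresponds to one job, so finding frequent itemsets up to size k costs k passes over the data. That per-level cost is what the abstract contrasts with the fixed two jobs of HApriori2.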