Improved Canopy-Kmeans Algorithm Based on Double-MapReduce (cited by: 6)
Abstract: The traditional Canopy-Kmeans algorithm selects its center points with considerable randomness, and redundant computation during iteration lowers its running efficiency. Based on the "minimum maximum principle" and the triangle inequality, this paper proposes an improved Canopy-Kmeans algorithm built on a double-MapReduce design on the Hadoop platform. Experimental results show that the precision of the parallel algorithm improves by 15.3% on average across data sets of different sizes, and that speedup and scalability improve by a factor of 1.5 to 3 as the data size and the number of nodes increase. The approach addresses the problem in Canopy center-point selection and avoids redundant distance calculations during iteration.
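Authors: Liu Baolong (刘宝龙), Su Jin (苏金)
Source: Journal of Xi'an Technological University (《西安工业大学学报》), CAS, 2016, No. 9, pp. 730-737 (8 pages)
Funding: Shaanxi Province Science and Technology Coordinated Innovation Project Plan (2015KTCXSF-10-11); Science and Technology Program of Weiyang District, Xi'an (201609)
Keywords: Canopy-Kmeans; redundant computation; Hadoop platform; double-MapReduce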
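
The abstract names two ideas without giving details: choosing centers by a "minimum maximum principle" and using the triangle inequality to skip distance calculations. The following is a minimal single-machine sketch of how these two ideas are commonly realized, not the paper's Hadoop implementation. It assumes that the min-max principle means farthest-first selection (each new center maximizes its minimum distance to the centers already chosen) and that the pruning rule is the standard bound "if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x than c_best". The function names `minmax_centers` and `assign_with_pruning`, and the choice of the first point as the initial center, are hypothetical.

```python
import math


def minmax_centers(points, k):
    """Pick k centers by farthest-first selection (assumed reading of min-max)."""
    centers = [points[0]]  # assumption: seed with the first point
    # min_d[i] = distance from points[i] to its nearest chosen center
    min_d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point whose nearest center is farthest away.
        idx = max(range(len(points)), key=lambda i: min_d[i])
        centers.append(points[idx])
        min_d = [min(min_d[i], math.dist(points[i], points[idx]))
                 for i in range(len(points))]
    return centers


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping provably useless distances.

    Returns (labels, skipped), where skipped counts the point-to-center
    distance computations avoided by the triangle inequality.
    """
    k = len(centers)
    # Center-to-center distances are computed once and shared by all points.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, centers[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= d(p, c_best),
            # so c_j cannot be strictly closer than the current best.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

In a MapReduce setting, a sketch like `assign_with_pruning` would correspond to the per-record work of a map task, with the center-to-center table distributed to every mapper; how the paper splits the work across its two MapReduce jobs is not stated in the abstract.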