Abstract
This paper proposes a MapReduce-based parallel programming method for multi-core processors that also applies to large-scale clusters, and uses it to run the data mining algorithms Canopy and K-means. Because K-means is sensitive to the choice of initial cluster centers, an improved K-means algorithm based on Canopy is presented. Experimental results on real datasets show that the accuracy and execution efficiency of the multi-core Canopy-K-means clustering algorithm increase approximately linearly with the number of cores.
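The abstract does not spell out how Canopy is combined with K-means. A common arrangement, assumed here, is to run Canopy first and use the resulting canopy centers as the initial K-means centers, which also fixes the number of clusters. The following is a minimal single-process sketch of that idea, not the paper's multi-core implementation; the function names, the thresholds `t1` and `t2`, and the sample points are all hypothetical.

```python
import math


def canopy(points, t1, t2):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2.

    Returns a list of (center, members) pairs. Assumes t1 > t2 >= 0.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold belongs to this canopy.
        members = [p for p in points if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def kmeans(points, centers, iterations=10):
    """Plain K-means for a fixed number of iterations, starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [center for center, _ in canopy(points, t1=3.0, t2=2.0)]
final_centers, final_clusters = kmeans(points, seeds)
```

In a MapReduce formulation, the assignment step would typically form the map phase and the per-cluster averaging the reduce phase, which is what makes the algorithm straightforward to spread across cores.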
Source
《微计算机信息》
2012, No. 9, pp. 486-487, 233 (3 pages in total)
Control & Automation