Abstract
K-means converges slowly on massive datasets because the choice of initial cluster centers is uncertain. To address this, this paper proposes an improved K-means algorithm. First, the idea of fuzzy clustering is used to perform a fuzzy classification of the dataset. Second, the dataset is classified a second time using dynamically computed cluster centers. Finally, the algorithm is implemented on the MapReduce programming model. Experimental results show that the improved algorithm achieves a higher speedup and converges faster.
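The abstract gives no implementation details, so the following is only a minimal sketch of the three steps it names, not the paper's algorithm. It assumes that the fuzzy pre-classification can be stood in for by a few fuzzy c-means iterations, that "dynamic" cluster centers means recomputing the centers after every assignment pass, and that the map and reduce phases can be simulated in-process rather than on a real MapReduce cluster. All function names and parameters (`fuzzy_centers`, `map_assign`, `reduce_center`, `kmeans_mapreduce`, `m`, `tol`) are hypothetical.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_centers(points, k, m=2.0, iters=10, seed=0):
    """Assumed pre-step: a few fuzzy c-means iterations to pick initial centers."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = rng.sample(points, k)
    for _ in range(iters):
        weight = [0.0] * k
        acc = [[0.0] * dim for _ in range(k)]
        for p in points:
            # Membership is proportional to (squared distance)^(-1/(m-1)).
            inv = [max(dist2(p, c), 1e-12) ** (-1.0 / (m - 1)) for c in centers]
            total = sum(inv)
            for j in range(k):
                u = (inv[j] / total) ** m
                weight[j] += u
                for t in range(dim):
                    acc[j][t] += u * p[t]
        centers = [tuple(a / weight[j] for a in acc[j]) for j in range(k)]
    return centers


def map_assign(point, centers):
    """Map phase: emit (nearest center index, (point, count of 1))."""
    j = min(range(len(centers)), key=lambda i: dist2(point, centers[i]))
    return j, (point, 1)


def reduce_center(values):
    """Reduce phase: average all points that share a center index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, c in values:
        for t in range(dim):
            total[t] += point[t]
        count += c
    return tuple(x / count for x in total)


def kmeans_mapreduce(points, k, max_iters=50, tol=1e-6):
    """Run K-means as repeated map/reduce rounds, starting from fuzzy centers."""
    centers = fuzzy_centers(points, k)
    rounds = 0
    for rounds in range(1, max_iters + 1):
        groups = defaultdict(list)
        for p in points:  # simulated map + shuffle
            key, value = map_assign(p, centers)
            groups[key].append(value)
        # Centers are recomputed every round; an empty cluster keeps its center.
        new_centers = [
            reduce_center(groups[j]) if j in groups else centers[j]
            for j in range(k)
        ]
        shift = max(dist2(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, rounds
```

Calling `kmeans_mapreduce(points, k)` on a list of coordinate tuples returns the final centers and the number of rounds used. In a real deployment the inner loop over `points` would be distributed across mappers and the per-key averaging across reducers; this sketch says nothing about the speedup the paper reports.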
Source
《哈尔滨理工大学学报》
CAS
Peking University Core Journals (北大核心)
2016, No. 1, pp. 31-35 (5 pages)
Journal of Harbin University of Science and Technology
Funding
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12531107)