Improved K-means Clustering Algorithm Based on MapReduce Framework (cited 7 times)
Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data, this paper proposes a distributed parallel programming model for K-means based on the MapReduce framework. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is defined: the value of k is determined from the degree of dissimilarity between data points, and points with small dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, which is modified to speed up K-means on massive data. Experiments show that, compared with the traditional K-means algorithm, the improved MapReduce-based K-means algorithm achieves higher accuracy and shorter convergence time, and that the parallel clustering model scales well with both data size and the number of computing nodes.
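The abstract names two stages (dissimilarity-based initialization, then K-means iterations expressed as MapReduce jobs) but does not give the dissimilarity function or the specific changes made to MapReduce. The Python sketch below illustrates the general shape of both stages under loudly stated assumptions, and is not the authors' method. The dissimilarity of a point is assumed here to be its mean distance to its `m` nearest neighbours (small values mean dense regions), and the separation threshold `radius` is a hypothetical parameter: points are visited in order of increasing dissimilarity and accepted as centers if they lie at least `radius` from every accepted center, so k falls out as the number of accepted centers. The MapReduce part is simulated in memory with the common pattern of a mapper that emits nearest-center assignments, a per-split combiner that pre-sums points, and a reducer that averages. All function names are hypothetical and no Hadoop API is used.

```python
import math
from collections import defaultdict


def dissimilarity(points, m):
    # ASSUMPTION: dissimilarity = mean distance to the m nearest neighbours.
    scores = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(ds[:m]) / m)
    return scores


def init_centers(points, m, radius):
    # Visit points from least to most dissimilar; accept a point as a center
    # if it is at least `radius` away from all accepted centers (hypothetical rule).
    d = dissimilarity(points, m)
    centers = []
    for i in sorted(range(len(points)), key=lambda i: d[i]):
        if all(math.dist(points[i], c) >= radius for c in centers):
            centers.append(points[i])
    return centers  # k == len(centers)


def kmeans_map(point, centers):
    # Map: emit (index of nearest center, (point, count=1)).
    idx = min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))
    return idx, (point, 1)


def combine(pairs):
    # Combiner: pre-sum coordinates and counts per cluster within one split.
    partial = {}
    for idx, (p, c) in pairs:
        if idx in partial:
            s, n = partial[idx]
            partial[idx] = (tuple(a + b for a, b in zip(s, p)), n + c)
        else:
            partial[idx] = (p, c)
    return list(partial.items())


def kmeans_reduce(values):
    # Reduce: merge partial sums for one cluster and return the new centroid.
    dim = len(values[0][0])
    total, count = [0.0] * dim, 0
    for s, n in values:
        for k in range(dim):
            total[k] += s[k]
        count += n
    return tuple(t / count for t in total)


def mapreduce_kmeans(splits, centers, max_iter=20, tol=1e-6):
    # Driver: one simulated MapReduce job per K-means iteration.
    for _ in range(max_iter):
        shuffled = defaultdict(list)
        for split in splits:  # each split stands in for one map task
            mapped = [kmeans_map(p, centers) for p in split]
            for idx, partial in combine(mapped):
                shuffled[idx].append(partial)
        new_centers = list(centers)  # an empty cluster keeps its old center
        for idx, values in shuffled.items():
            new_centers[idx] = kmeans_reduce(values)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers = init_centers(data, m=2, radius=5.0)
result = mapreduce_kmeans([data[0::2], data[1::2]], centers)
print(len(centers), sorted(result))  # prints: 2 [(0.5, 0.5), (10.5, 10.5)]
```

On this toy data the initialization picks one seed per square (k = 2), and the iterations converge to the two square centers. The combiner is the main design lever in this pattern: it shrinks shuffle traffic from one record per point to one record per (split, cluster) pair, which is one common way to speed up MapReduce K-means, though the paper may modify the framework differently.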
Authors: SONG Yang (宋阳), SHI Hong-yan (石鸿雁), School of Science, Shenyang University of Technology, Shenyang 110870, China
Source: Computer and Modernization (《计算机与现代化》), 2019, No. 8, pp. 28-32, 43 (6 pages)
Funding: National Natural Science Foundation of China (61074005); Program for Liaoning Excellent Talents in University (LR2012005)
Keywords: K-means algorithm; dissimilarity function; MapReduce model