Abstract
K-means is an iterative clustering algorithm. Cluster analysis is developing rapidly and is widely used in data mining, statistics, machine learning, spatial database technology, biology, and marketing, and it has become a very active research topic in data mining. Building on the simplicity, high efficiency, fast convergence, and good scalability of K-means, the authors use a region density method to determine the number of clusters k. This addresses two known weaknesses of K-means: the difficulty of choosing k in advance, and the tendency of randomly chosen initial cluster centers to trap the algorithm in a local optimum. Because the initial centers are given rather than chosen at random, the efficiency and speed of the algorithm improve considerably. The authors report that, in practice, the proposed algorithm performs well.
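The abstract does not spell out how the region density method works, so the following is only a rough illustration of the general idea: count neighbors around each point, take the densest regions as initial centers, let the number of regions found serve as k, and then run ordinary K-means from those centers. The function names `density_seeds` and `kmeans`, the parameters `radius` and `min_neighbors`, and the rule of excluding points within `2 * radius` of a chosen seed are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def density_seeds(X, radius, min_neighbors):
    """Pick initial centers from dense regions; k is the number of seeds found.

    Hypothetical interpretation: `radius` and `min_neighbors` are assumed
    parameters, not values or names from the source article.
    """
    # Pairwise Euclidean distances (acceptable for small data sets).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Density of a point = number of points within `radius`, itself included.
    density = (dists <= radius).sum(axis=1)
    uncovered = np.ones(len(X), dtype=bool)
    seeds = []
    while uncovered.any():
        # Only points not yet claimed by an earlier seed may be chosen.
        candidates = np.where(uncovered, density, -1)
        best = int(np.argmax(candidates))
        if candidates[best] < min_neighbors:
            break  # remaining regions are too sparse to seed a cluster
        # Use the mean of the dense neighborhood as the initial center.
        seeds.append(X[dists[best] <= radius].mean(axis=0))
        # Assumed rule: exclude everything within 2 * radius of this seed.
        uncovered &= dists[best] > 2 * radius
    return np.array(seeds)


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    if len(centers) == 0:
        raise ValueError("no initial centers were supplied")
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the returned centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)


# Synthetic example: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
])
seeds = density_seeds(X, radius=1.0, min_neighbors=10)
centers, labels = kmeans(X, seeds)
print("k found:", len(seeds))
```

In this sketch k is never passed in: it falls out of how many sufficiently dense, non-overlapping regions are found, which is the property the abstract attributes to the method. The results depend heavily on the assumed `radius` and `min_neighbors`, and the full pairwise distance matrix limits the sketch to small data sets.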
Authors
赵天星 (Zhao Tianxing)
王晓薇 (Wang Xiaowei)
Software College, Shenyang Normal University, Shenyang, Liaoning 110034, China
Source
Information & Computer (《信息与电脑》)
2019, Issue 1, pp. 77-78 (2 pages)
Keywords
new energy
big data
data mining
clustering analysis