Abstract
In the traditional K-means clustering algorithm, the initial cluster centers are chosen at random, and real data sets may contain isolated points (outliers). As a result, the clustering result and its quality differ from run to run, and the algorithm sometimes becomes trapped in a local optimum. To address these problems, researchers have tried to use distance-based methods to identify isolated points and to determine the initial cluster centers. This approach is not well founded, because an isolated point is not only far from other points but also has a sparse neighborhood. In addition, when the data volume is very large and the data have many features, the computation is expensive, consumes substantial computing resources, and runs too slowly. This paper studies the traditional K-means clustering algorithm and proposes a method, based on a density parameter and distance theory, for determining the initial cluster centers and identifying isolated points, thereby improving the traditional K-means clustering algorithm.
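The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of combining a density parameter with distance when seeding K-means. The function name `density_seeded_centers`, the parameters `eps` and `min_neighbors`, the neighbor-count definition of density, and the farthest-first selection rule are all assumptions made for illustration, not the authors' method.

```python
import math


def density_seeded_centers(points, k, eps, min_neighbors):
    """Pick k initial centers from dense regions and flag sparse points.

    Illustrative sketch only: eps, min_neighbors, and the farthest-first
    rule are assumptions, not details taken from the paper.
    """
    n = len(points)
    # Density parameter (assumed definition): number of other points within eps.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]
    # A point is treated as isolated when its neighborhood is sparse.
    isolated = [i for i in range(n) if density[i] < min_neighbors]
    dense = [i for i in range(n) if density[i] >= min_neighbors]
    if len(dense) < k:
        raise ValueError("not enough dense points to seed k centers")

    # First center: the densest point (ties go to the lowest index).
    centers = [max(dense, key=lambda i: density[i])]
    # Remaining centers: the dense point farthest from all chosen centers.
    while len(centers) < k:
        candidates = [i for i in dense if i not in centers]
        nxt = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[c]) for c in centers),
        )
        centers.append(nxt)
    return [points[i] for i in centers], [points[i] for i in isolated]


# Two tight groups plus one far-away, sparsely surrounded point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers, isolated = density_seeded_centers(points, k=2, eps=1.5, min_neighbors=2)
print(centers)
print(isolated)  # [(50, 50)]
```

Because the point at (50, 50) has no neighbors within `eps`, it is excluded from the candidate centers even though it is the farthest point from everything else, which is the failure case of a purely distance-based rule that the abstract describes. The returned centers could then be passed to an ordinary K-means iteration in place of random initialization. The pairwise density computation here is quadratic in the number of points, so it does not address the scalability concern the abstract also raises.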
Source
《河南科学》
2016, Issue 3, pp. 348-351 (4 pages)
Henan Science