Abstract
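The K-means clustering algorithm takes two inputs: the number of clusters K and the initial cluster centers. The choice of initial centers strongly affects the clustering result. Traditional K-means picks the K initial centers at random, and randomly chosen centers can easily land on outliers, which seriously degrades the result. K is also supplied by the user, and a poorly chosen K likewise hurts clustering quality. This paper proposes an improved K-means clustering algorithm. It first determines the number of clusters K from a cluster validity index. It then applies a density-based idea: the samples are divided into core points, border points, and outliers; outliers and border points are excluded, and the centers of the core points are taken as the K initial cluster centers before K-means clustering is run. Experiments show that the improved algorithm is more accurate than the original K-means algorithm.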
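The abstract says K is fixed from a cluster validity index before clustering, but it does not name the index. The following is a minimal sketch, assuming the Davies-Bouldin index as that criterion and plain k-means++ seeding with several restarts for each candidate K. The function names (`choose_k`, `davies_bouldin`, `lloyd`, `kmeans_pp`) and the search range 2 to 6 are illustrative assumptions, not the paper's.

```python
import numpy as np


def kmeans_pp(X, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)


def lloyd(X, centers, max_iter=100):
    """Plain Lloyd iterations; returns centers, labels and the within-cluster SSE."""
    for _ in range(max_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d.argmin(axis=1)
    return centers, labels, d.min(axis=1).sum()


def davies_bouldin(X, centers, labels):
    """Davies-Bouldin index over the non-empty clusters (lower is better)."""
    ks = np.unique(labels)
    if len(ks) < 2:
        return np.inf
    c = centers[ks]
    # S_i: mean distance of cluster members to their centroid.
    s = np.array([np.linalg.norm(X[labels == j] - centers[j], axis=1).mean() for j in ks])
    m = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    np.fill_diagonal(m, np.inf)  # a cluster is not compared with itself
    return ((s[:, None] + s[None, :]) / m).max(axis=1).mean()


def choose_k(X, k_min=2, k_max=6, n_init=10, seed=0):
    """Pick K in [k_min, k_max] that minimizes the Davies-Bouldin index."""
    rng = np.random.default_rng(seed)
    scores = {}
    for k in range(k_min, k_max + 1):
        # Keep the restart with the lowest SSE, as is usual for K-means.
        runs = [lloyd(X, kmeans_pp(X, k, rng)) for _ in range(n_init)]
        centers, labels, _ = min(runs, key=lambda run: run[2])
        scores[k] = davies_bouldin(X, centers, labels)
    return min(scores, key=scores.get), scores
```

On three well-separated Gaussian blobs, `choose_k(X)` should pick K = 3: merging two blobs inflates the within-cluster scatter, and splitting one blob puts two centroids very close together, and both raise the index.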
Two parameters must be supplied to the K-means algorithm: the number of clusters K and the initial cluster centers. The choice of initial cluster centers has a large impact on the clustering results. The traditional K-means algorithm selects the cluster centers randomly, and randomly selected centers can easily include outliers, which strongly affects the clustering results. K is input by the user, and a poorly chosen K also has a large impact on the clustering results. This paper proposes an improved K-means clustering algorithm based on the idea of density: it first divides the samples into core points, border points, and outliers, then removes the border points and outliers and selects the cluster centers from the centers of the remaining samples. Experiments show that the improved algorithm is more stable than the original.
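The density-based seeding described above can be sketched as follows. The abstract does not define core, border, and outlier points, nor does it say how the core points are reduced to exactly K centers, so this sketch makes several assumptions: DBSCAN-style definitions with hypothetical parameters `eps` and `min_pts`; core points grouped into eps-connected components; the means of the K largest components used as initial centers (falling back to the core point farthest from the chosen centers if there are fewer than K components); and a final K-means pass over all samples. All function names are illustrative.

```python
import numpy as np

CORE, BORDER, OUTLIER = 0, 1, 2


def classify_points(X, eps, min_pts):
    """DBSCAN-style labels: core, border (within eps of a core point) or outlier."""
    # Full pairwise distance matrix: O(n^2) memory, acceptable for a sketch.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = dist <= eps  # each point counts itself
    is_core = neighbors.sum(axis=1) >= min_pts
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    labels = np.full(len(X), OUTLIER)
    labels[near_core] = BORDER
    labels[is_core] = CORE
    return labels, dist


def core_components(core_idx, dist, eps):
    """Group core points linked by chains of eps-neighbours (breadth-first search)."""
    unvisited = set(core_idx)
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            linked = [j for j in unvisited if dist[i, j] <= eps]
            unvisited.difference_update(linked)
            group.extend(linked)
            frontier.extend(linked)
        groups.append(group)
    return groups


def density_init(X, k, eps, min_pts):
    """Initial centers from core points only; border points and outliers are ignored."""
    labels, dist = classify_points(X, eps, min_pts)
    core_idx = np.flatnonzero(labels == CORE).tolist()
    if len(core_idx) < k:
        raise ValueError("fewer core points than clusters; adjust eps or min_pts")
    groups = sorted(core_components(core_idx, dist, eps), key=len, reverse=True)
    centers = [X[g].mean(axis=0) for g in groups[:k]]
    core = X[core_idx]
    while len(centers) < k:
        # Fewer dense groups than K: add the core point farthest from the chosen centers.
        gap = np.linalg.norm(core[:, None, :] - np.array(centers)[None, :, :], axis=-1).min(axis=1)
        centers.append(core[gap.argmax()])
    return np.array(centers), labels


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations on all samples, started from the given centers."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        assign = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(len(centers))])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    assign = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
    return centers, assign
```

Under this sketch, an isolated point can never seed a cluster, which is the failure mode of random initialization that the abstract describes. It is still assigned to its nearest center in the final pass.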
Source
Computer & Digital Engineering (《计算机与数字工程》)
2018, No. 1, pp. 21-24, 113 (5 pages)
Keywords
K-means clustering
clustering number
clustering center
density
outlier point