Abstract
To address two weaknesses of the K-means algorithm, namely that the number of clusters must be set manually and that randomly selected initial cluster centers can trap the result in a local optimum, an improved parameter-free K-means algorithm is proposed that draws on the density-peak clustering algorithm CFSFDP (Clustering by Fast Search and Find of Density Peaks). First, the local density and dispersion of each sample point are calculated. Next, a decision graph is built in which the two quantities form a vector for each point, and the distance from each point to its 5 surrounding points is computed. Points whose distance exceeds twice the standard deviation and whose density exceeds the average density are selected as the initial cluster centers; their count k is taken as the number of clusters, and k together with the initial centers is passed to K-means as its initial parameters. Finally, the algorithm is tested on three types of data sets: UCI (University of California, Irvine) data sets, artificially generated Gaussian data sets, and real tool vibration data sets. The results show that the proposed algorithm retains the global optimality of the traditional algorithm, which supports its effectiveness. Because K-means is an unsupervised clustering method, the approach reduces the workload and computational cost of manual data labeling and supervised training while still obtaining good tool-state recognition results, which is of practical value for accurate, real-time extraction of the operating state of tools in computer numerical control (CNC) machine tools.
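The initialization described in the abstract can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' implementation, and it fills several gaps with assumptions that the abstract does not state: a Gaussian-kernel density with a percentile cutoff `dc_percentile`, min-max scaling of the decision graph, the mean distance to the 5 nearest decision-graph neighbours as the "distance", the population standard deviation of those distances as the spread, and a fallback seed when no point passes both tests. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def density_peak_seeds(points, dc_percentile=0.02, n_neighbors=5):
    """Pick initial centers from a CFSFDP-style decision graph (needs >= 2 points)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    # Assumption: cutoff distance taken as a low percentile of all pairwise distances.
    pairs = sorted(d[i][j] for i in range(n) for j in range(i + 1, n))
    dc = pairs[int(dc_percentile * (len(pairs) - 1))] or 1e-12

    # Local density (Gaussian-kernel variant, an assumption).
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Dispersion: distance to the nearest point of higher density.
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))

    def scale(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    # Decision graph: one (density, dispersion) vector per point, min-max scaled.
    graph = list(zip(scale(rho), scale(delta)))

    # Assumption: "distance to the surrounding 5 points" = mean distance to the
    # 5 nearest neighbours in the decision graph.
    knn = []
    for i in range(n):
        ds = sorted(math.dist(graph[i], graph[j]) for j in range(n) if j != i)
        knn.append(mean(ds[:n_neighbors]))

    threshold = 2 * pstdev(knn)
    rho_mean = mean(rho)
    chosen = [i for i in range(n) if knn[i] > threshold and rho[i] > rho_mean]
    if not chosen:
        # Fallback (assumption): keep the single most peak-like point.
        chosen = [max(range(n), key=lambda i: rho[i] * delta[i])]
    return [points[i] for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(mean(axis) for axis in zip(*members))
    return centers, labels


# Synthetic Gaussian blobs for illustration only (not the paper's data).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
points = [(rng.gauss(cx, 0.6), rng.gauss(cy, 0.6))
          for cx, cy in blob_centers for _ in range(30)]

seeds = density_peak_seeds(points)
centers, labels = kmeans(points, seeds)
k = len(centers)  # number of clusters follows from the number of seeds
```

In this sketch k is never supplied by the caller: it is simply the number of points that pass both the distance and the density test, which is the sense in which the abstract calls the method parameter-free. The cutoff percentile and neighbour count still have to be chosen, so how many seeds are found depends on those assumed settings.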
Authors
WU Xiaoyong (吴晓勇); HOU Qiufeng (侯秋丰); LUO Yong (罗勇)
Product Development Department, Zhejiang Xianglong Machinery Company Limited, Ningbo 315311, China
Source
Journal of Jilin University (Information Science Edition) (《吉林大学学报(信息科学版)》)
CAS
2023, No. 5, pp. 930-937 (8 pages)
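Funding
Supported by the 2021 Ningbo Second Batch of Major Science and Technology Research and "Open Competition" (揭榜挂帅) Fund Project (Science and Technology Innovation 2025 Major Special Project (2022Z018)).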
Keywords
K-means clustering algorithm
parameter-free
numerical control machine tool
tool wear identification