Abstract
To address the high computational cost that the global K-means clustering algorithm incurs by exhaustively testing every sample point as a candidate center, this paper proposes an efficient global K-means clustering algorithm based on weighted space partitioning. The algorithm first partitions the sample space into grid cells, then applies a density criterion and a distance criterion to filter the cells, retaining cells that are dense and far from one another as candidate center cells. To overcome the limitation that global K-means selects candidate centers only from the sample set, a weight criterion and a center iteration strategy are proposed to expand the candidate set and increase its diversity. Finally, the candidate centers are traversed by incremental clustering to obtain the final clustering result. Experimental results on UCI data sets show that, compared with the global K-means algorithm, the new algorithm improves computational efficiency by 89.39% to 95.79% on average while maintaining clustering accuracy. Compared with K-means++, IK-+, and the recently proposed CD algorithm, the new algorithm achieves higher accuracy and avoids the instability of clustering results caused by random initialization.
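The pipeline described in the abstract (grid partition, density and distance filtering, then incremental clustering over the surviving candidates) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the criteria in detail, so the function names and the parameters `bins`, `n_keep`, and `min_sep` are assumptions made for illustration, and the weight criterion and center iteration strategy are left out entirely.

```python
import numpy as np

def grid_candidates(X, bins=4, n_keep=10, min_sep=1.0):
    """Centroids of dense, mutually distant grid cells (assumed criteria)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)
    # Integer cell index of every sample along each dimension.
    idx = np.minimum(((X - lo) / width).astype(int), bins - 1)
    cells, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # inverse shape differs across NumPy versions
    # Centroid of the samples falling in each occupied cell.
    sums = np.zeros((len(cells), X.shape[1]))
    np.add.at(sums, inverse, X)
    centroids = sums / counts[:, None]
    # Assumed distance threshold: a multiple of the cell diagonal.
    min_dist = min_sep * np.linalg.norm(width)
    kept = []
    for c in np.argsort(-counts, kind="stable"):  # densest cells first
        if all(np.linalg.norm(centroids[c] - centroids[j]) >= min_dist
               for j in kept):
            kept.append(c)
        if len(kept) == n_keep:
            break
    return centroids[kept]

def lloyd(X, centers, iters=100):
    """Plain K-means refinement; returns the centers and the final SSE."""
    centers = centers.copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.min(axis=1).sum()

def global_kmeans_on_candidates(X, k, candidates):
    """Incremental clustering: add one center at a time, trying each candidate."""
    centers = X.mean(axis=0, keepdims=True)  # exact solution for k = 1
    sse = ((X - centers) ** 2).sum()
    for _ in range(2, k + 1):
        best = None
        for c in candidates:
            trial, trial_sse = lloyd(X, np.vstack([centers, c]))
            if best is None or trial_sse < best[1]:
                best = (trial, trial_sse)
        centers, sse = best
    return centers, sse

# Synthetic demo data (not from the paper): three well-separated blobs.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 2)) for m in true_means])
cands = grid_candidates(X)
centers, sse = global_kmeans_on_candidates(X, 3, cands)
```

In this sketch each incremental step runs K-means once per retained cell rather than once per sample, which is the source of the savings the abstract refers to. The actual speedup depends on how many cells survive the filtering.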
Authors
QU Fu-heng (曲福恒)
PAN Yue-tao (潘曰涛)
YANG Yong (杨勇)
HU Ya-ting (胡雅婷)
SONG Jian-fei (宋剑飞)
WEI Cheng-yu (魏成宇)
(College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China; Jilin Technology College of Electronic Information, Jilin 132021, China; College of Information Technology, Jilin Agricultural University, Changchun 130118, China)
Source
Journal of Jilin University (Engineering and Technology Edition) (《吉林大学学报(工学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 5, pp. 1393-1400 (8 pages)
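Funding
Science and Technology Research Project of the Education Department of Jilin Province (JJKH20240422KJ)
College Students' Innovation Training Program (202210193030)
Key R&D Project of the Science and Technology Development Plan, Department of Science and Technology of Jilin Province (20240304028SF)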
Keywords
artificial intelligence
K-means algorithm
clustering center
multidimensional grid space
weight
incremental clustering