Abstract
K‑Means updates each cluster center with the mean of its members, which makes the clustering result sensitive to the sample distribution. To address this problem, a Neural Tangent Kernel K‑Means (NTKKM) clustering algorithm is proposed. First, the input data are mapped into a high‑dimensional feature space through the Neural Tangent Kernel (NTK). K‑Means clustering is then performed in that feature space, with the cluster centers updated by a rule that accounts for both inter‑cluster and intra‑cluster distances, and the clustering result is obtained. On the car and breast‑tissue datasets, three evaluation metrics were measured for NTKKM and the comparison algorithms: accuracy, Adjusted Rand Index (ARI), and Fowlkes‑Mallows (FM) index. The experimental results show that NTKKM outperforms both K‑Means and Gaussian kernel K‑Means in clustering quality and stability. Compared with traditional K‑Means, NTKKM improves accuracy by 14.9% and 9.4%, ARI by 9.7% and 18.0%, and FM index by 12.0% and 12.0% on the two datasets respectively, which verifies its good clustering performance.
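The pipeline described in the abstract (map the data with a kernel, then cluster in the feature space) can be sketched on a precomputed Gram matrix. The sketch below is an illustration under stated assumptions, not the authors' NTKKM. It assumes the closed-form infinite-width NTK of a bias-free two-layer ReLU network, and it uses the standard kernel K‑Means assignment based on the implicit feature-space mean. The paper's center update, which balances inter-cluster and intra-cluster distances, is not specified in the abstract and is not reproduced here. The names `ntk_relu2`, `gram_matrix`, and `kernel_kmeans` are hypothetical.

```python
import math
import random


def ntk_relu2(x, y):
    """Assumed closed-form NTK of an infinite-width two-layer ReLU network (no bias)."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    u = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    u = max(-1.0, min(1.0, u))  # guard against rounding outside [-1, 1]
    theta = math.acos(u)
    k0 = (math.pi - theta) / math.pi  # arc-cosine kernel of degree 0
    k1 = (u * (math.pi - theta) + math.sqrt(1.0 - u * u)) / math.pi  # degree 1
    return nx * ny * (u * k0 + k1)


def gram_matrix(X, kernel):
    """Pairwise kernel values K[i][j] = kernel(X[i], X[j])."""
    n = len(X)
    return [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]


def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Standard kernel K-Means on a Gram matrix (NOT the paper's center update)."""
    n = len(K)
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(n_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each implicit center: (1/|C|^2) * sum of K over C x C.
        within = []
        for C in clusters:
            if C:
                within.append(sum(K[j][l] for j in C for l in C) / len(C) ** 2)
            else:
                within.append(0.0)
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, C in enumerate(clusters):
                if not C:
                    continue  # an empty cluster has no center to compare against
                cross = sum(K[i][j] for j in C) / len(C)
                # Squared feature-space distance from point i to the center of C.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: two groups of points that differ mainly in direction.
X = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
K = gram_matrix(X, ntk_relu2)
labels = kernel_kmeans(K, k=2)
print(labels)
```

Because the distance to a center is computed only from Gram-matrix entries, the feature map never has to be evaluated explicitly. Swapping `ntk_relu2` for a Gaussian kernel in this sketch gives the kind of Gaussian kernel K‑Means baseline the abstract compares against.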
Authors
WANG Mei (王梅)
SONG Xiaohui (宋晓晖)
LIU Yong (刘勇)
XU Chuanhai (许传海)
WANG Mei; SONG Xiaohui; LIU Yong; XU Chuanhai (School of Computer and Information Technology, Northeast Petroleum University, Daqing Heilongjiang 163318, China; Heilongjiang Key Laboratory of Petroleum Big Data and Intelligent Analysis (Northeast Petroleum University), Daqing Heilongjiang 163318, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China; Beijing Key Laboratory of Big Data Management and Analysis Method (Renmin University of China), Beijing 100872, China)
Source
Journal of Computer Applications (《计算机应用》)
Indexed in CSCD
Indexed in the Peking University Core Journals list (北大核心)
2022, No. 11, pp. 3330-3336 (7 pages)
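Funding
National Natural Science Foundation of China (51774090, 62076234)
Heilongjiang Postdoctoral Scientific Research Start-up Fund (LBH‑Q20080)
Natural Science Foundation of Heilongjiang Province (LH2020F003)
Key Commissioned Project of Higher Education Teaching Reform of Heilongjiang Province (SJGZ20190011)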
Keywords
Neural Tangent Kernel (NTK)
K‑Means
kernel clustering
feature space
kernel function