Abstract
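To address the problem in k-means clustering that outlying samples distort the data distribution and cause large deviations in the cluster centers, each sample is assigned a weight computed from its distance to the potential cluster centers, which reduces the influence of outliers on the data distribution. Imposing a ?;-norm on the weight vector lets the clustering model remove outliers adaptively. The model is solved with an alternating minimization algorithm. Experiments on synthetic and real datasets show that the proposed model effectively reduces the influence of outliers on clustering and yields better clustering results.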
In k-means clustering, a few outliers can easily destroy the cluster structure and cause the estimated centroids to deviate significantly. To address this problem, we assign each data point a weight based on its distance from the potential cluster centers, which alleviates the negative impact of outliers on the data structure. We also incorporate outlier detection into the clustering model by imposing a ?;-norm constraint on the weight assignments. To optimize the model, we introduce an efficient alternating minimization algorithm. Extensive experiments on both synthetic and real datasets show the effectiveness of the proposed model.
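The alternating scheme summarized in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the norm constraint is garbled in the source, so the sketch substitutes a fixed outlier budget (the hypothetical parameter `n_outliers`) that forces the farthest points to weight zero, and it assumes Gaussian-style distance weights for the remaining points. The function name `robust_kmeans` and the `init_centers` argument are likewise illustrative.

```python
import math


def robust_kmeans(points, init_centers, n_outliers, n_iter=50):
    """Toy weighted k-means that zeroes the weights of suspected outliers.

    Assumptions (not taken from the paper): Gaussian-style distance weights
    and a fixed budget `n_outliers` (< len(points)) of zero-weight points.
    """
    centers = [list(c) for c in init_centers]
    k, n = len(centers), len(points)
    labels = [0] * n
    weights = [1.0] * n

    for _ in range(n_iter):
        # Step 1: assign every point to its nearest current center.
        dist2 = []
        for i, p in enumerate(points):
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[i] = min(range(k), key=d.__getitem__)
            dist2.append(d[labels[i]])

        # Step 2: update the weights with the centers held fixed.
        # The n_outliers farthest points get weight 0 (treated as removed);
        # the others get a weight that decays with squared distance.
        order = sorted(range(n), key=dist2.__getitem__)
        kept = order[: n - n_outliers]
        scale = sum(dist2[i] for i in kept) / len(kept)
        if scale == 0.0:
            scale = 1.0  # all kept points sit exactly on a center
        weights = [0.0] * n
        for i in kept:
            weights[i] = math.exp(-dist2[i] / (2.0 * scale))

        # Step 3: update the centers with the weights held fixed.
        new_centers = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            total = sum(weights[i] for i in members)
            if total == 0.0:
                new_centers.append(centers[j])  # keep an empty cluster's center
                continue
            dim = len(centers[j])
            new_centers.append(
                [sum(weights[i] * points[i][t] for i in members) / total
                 for t in range(dim)]
            )
        if new_centers == centers:
            break
        centers = new_centers

    return centers, labels, weights
```

Calling `robust_kmeans(points, init_centers=[(0, 0), (10, 10)], n_outliers=1)` on two tight groups plus one distant point returns a zero weight for the distant point, so it no longer pulls either center. A fixed budget is only a stand-in here; the paper's norm constraint is described as removing outliers adaptively.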
Authors
胡豪杰
陈辉
穆婷婷
姚敏立
何芳
张峰干
Hu Haojie; Chen Hui; Mu Tingting; Yao Minli; He Fang; Zhang Fenggan (Rocket Force Engineering University, Xi'an 710025, China; The Fourth Academy of China Aerospace Science and Technology Corporation, Xi'an 710025, China; Beijing New Era Global Import and Export Co., Ltd., Beijing 100027, China)
Source
《南京师范大学学报(工程技术版)》
CAS
2022, No. 1, pp. 75-80 (6 pages)
Journal of Nanjing Normal University(Engineering and Technology Edition)