Abstract
The traditional C-means algorithm is a hard-partition clustering method. It is sensitive to the choice of initial cluster centers and suffers from the tendency of cluster centers to coincide. Fuzzy C-means (FCM) clustering was proposed to overcome this problem. However, the long-tail and warp-tail behavior of the fuzzy membership degrees in FCM introduces new problems: on the one hand, clustering results become more susceptible to noise and outliers; on the other hand, the separability of the clusters decreases and the clustering results generalize poorly. To address these problems, this article proposes a new adaptive fuzzy clustering algorithm. The algorithm adopts regularization and soft thresholding, so that the fuzzy membership degrees have a clearly sparse structure. It also introduces a virtual (noise) class, which effectively reduces the influence of anomalous points and outliers on the clustering results, resolves the warp-tail problem of FCM, and improves cluster separability and within-class cohesion. Experimental comparisons with related algorithms on synthetic datasets, UCI datasets, and image segmentation problems verify the effectiveness of the proposed algorithm.
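The abstract does not give the paper's update rules, so the following is only a minimal sketch of the two ingredients it names, not the authors' algorithm. It assumes the textbook FCM membership formula for 1-D points and the generic soft-thresholding operator; the helper `sparsify` and the threshold value `lam` are hypothetical and chosen purely for illustration. It shows why plain FCM memberships are never exactly zero, and how thresholding can make them sparse.

```python
def fcm_memberships(point, centers, m=2.0):
    """Textbook FCM membership of one 1-D point to each center (fuzzifier m > 1)."""
    dists = [abs(point - c) for c in centers]
    # A point lying exactly on one or more centers is shared among those centers.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inv = [d ** (-power) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def soft_threshold(x, lam):
    """Generic soft-thresholding operator: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparsify(memberships, lam):
    """Hypothetical helper: threshold memberships, then renormalize if any survive."""
    shrunk = [soft_threshold(u, lam) for u in memberships]
    total = sum(shrunk)
    return [s / total for s in shrunk] if total > 0 else shrunk


centers = [0.0, 1.0, 10.0]
u = fcm_memberships(0.2, centers)
print(u)                      # every entry is strictly positive (the "tail")
print(sparsify(u, lam=0.05))  # the membership to the far center becomes exactly 0
print(fcm_memberships(1000.0, centers))  # a far outlier is split almost evenly
```

Because standard FCM memberships must sum to one, a point far from every center still receives a share close to 1/c for each of the c clusters, which is one way outliers distort the centers. A dedicated noise class, as the abstract describes, gives such points somewhere else to go.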
Authors
GAO Yunlong (高云龙)
LAI Wenxin (赖文馨)
PAN Jinyan (潘金艳)
KANG Liwen (康丽雯)
Affiliations: School of Aerospace Engineering, Xiamen University, Xiamen 361102, China; School of Information Engineering, Jimei University, Xiamen 361021, China
Source
Journal of Xiamen University: Natural Science (《厦门大学学报(自然科学版)》)
CAS
CSCD
PKU Core Journals (北大核心)
2021, No. 6, pp. 1001-1010 (10 pages)
Funding
National Natural Science Foundation of China (61203176)
Natural Science Foundation of Fujian Province (2013J05098, 2016J01756)
Keywords
fuzzy clustering
sparsity
regularization
soft threshold