Abstract
The fuzzy C-means (FCM) algorithm is a principal algorithm for data clustering analysis. In noisy environments, however, and for clusters of unequal sample sizes, its accuracy falls as the number of clusters grows. These drawbacks can be addressed by the Gaussian kernel mapping of alternative FCM (AFCM). To address the shortcomings of AFCM, this paper proposes a generalized Lorentzian kernel function for fuzzy C-means clustering. The algorithm is applied to the Iris database, partitioning it into three clusters: Iris setosa, Iris versicolor, and Iris virginica. Experimental results show that generalized Lorentzian fuzzy C-means (GLFCM) can classify data containing outliers and unequal-sized clusters. GLFCM yields better clusters than K-means (KM), FCM, AFCM, Gustafson-Kessel (GK), and Gath-Geva (GG); it converges in fewer iterations than AFCM; and its partition index (SC) is better than that of the other methods.
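For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of standard FCM with squared Euclidean distance, not the GLFCM or AFCM method of the paper, whose kernel functions are not given in this abstract. The names `fcm` and `sq_dist`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy data are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (illustrative sketch, not the paper's GLFCM).

    Returns (centers, u), where u[k][i] is the membership of point k
    in cluster i and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: proportional to d^(-2/(m-1)), normalised per point.
        new_u = []
        for k in range(n):
            d2 = [sq_dist(points[k], centers[i]) for i in range(c)]
            zeros = [i for i in range(c) if d2[i] == 0.0]
            if zeros:
                # The point sits exactly on a center: give it full membership there.
                row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Stop once the largest membership change is below the tolerance.
        shift = max(abs(new_u[k][i] - u[k][i])
                    for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated groups of three points.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, u = fcm(data, c=2)
    for point, row in zip(data, u):
        print(point, [round(v, 3) for v in row])
```

Kernel-based variants such as AFCM replace the squared Euclidean distance in both update steps with a kernel-induced measure, which is where a Gaussian or generalized Lorentzian kernel would enter; the sketch deliberately leaves that part out.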
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2015, Issue 9, pp. 268-271 (4 pages)
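Funding
Open Fund Project of the Heilongjiang Provincial Key Laboratory of Intelligent Education and Information Engineering (1155xnc107)
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12543067)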
Keywords
Generalized Lorentzian membership function
K-means
Alternative fuzzy C-means
Clustering
Outlier clustering