Abstract
In the classical fuzzy C-means (FCM) algorithm, the number of clusters must be specified in advance. Otherwise the algorithm cannot run, which limits its range of applications. To address the need to pre-assign the cluster number in FCM, a new fuzzy cluster validity index is proposed. First, the FCM algorithm is run to obtain the membership matrix. Second, the intra-class compactness and the inter-class overlap are computed from the membership matrix. Finally, a new cluster validity index is defined in terms of the intra-class compactness and the inter-class overlap. The index overcomes FCM's requirement that the cluster number be pre-assigned and can be used to find the number of clusters that best matches the natural distribution of the data. Experiments on artificial and real data sets show that the index finds the optimal cluster number for each of the three commonly used fuzzy factor values, 1.8, 2.0 and 2.2.
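The abstract does not give the index formula, so the following is only a minimal sketch of the overall procedure under stated assumptions. Standard FCM (Bezdek's alternating center and membership updates) produces the membership matrix, and a hypothetical stand-in index, `validity`, is swept over candidate cluster numbers. `validity` uses the partition coefficient as the compactness term and the mean pairwise membership overlap as the overlap term. It is not the authors' index. The function names, restart count, tolerance and toy data are illustrative assumptions.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=200, tol=1e-5, seed=0):
    """Standard fuzzy C-means (Bezdek's alternating updates).

    Returns (centers, U, J): U[i][k] is the membership of point k in cluster i
    (each column sums to 1) and J is the objective sum_i sum_k u_ik^m * d_ik^2.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    p = 2.0 / (m - 1.0)
    # Random initial memberships, normalised so every point's column sums to 1.
    U = [[rng.random() + 1e-12 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    for _ in range(max_iter):
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ik = d_ik^(-p) / sum_j d_jk^(-p) with p = 2/(m-1),
        # algebraically equal to 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dist = [math.dist(points[k], centers[i]) for i in range(c)]
            if min(dist) == 0.0:
                # Point sits exactly on a center: split membership among those centers.
                hits = [i for i in range(c) if dist[i] == 0.0]
                for i in hits:
                    new_U[i][k] = 1.0 / len(hits)
                continue
            inv = [d ** (-p) for d in dist]
            s = sum(inv)
            for i in range(c):
                new_U[i][k] = inv[i] / s
        delta = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if delta < tol:
            break
    J = sum(U[i][k] ** m * math.dist(points[k], centers[i]) ** 2
            for i in range(c) for k in range(n))
    return centers, U, J


def validity(U):
    """HYPOTHETICAL compactness-minus-overlap index (not the paper's formula).

    compactness: partition coefficient (1/n) * sum_i sum_k u_ik^2
    overlap: mean over cluster pairs (i, j) of (1/n) * sum_k min(u_ik, u_jk)
    Larger values indicate a better partition. Requires c >= 2.
    """
    c, n = len(U), len(U[0])
    compactness = sum(u * u for row in U for u in row) / n
    pairs = [(i, j) for i in range(c) for j in range(i + 1, c)]
    overlap = sum(sum(min(U[i][k], U[j][k]) for k in range(n)) / n
                  for i, j in pairs) / len(pairs)
    return compactness - overlap


def best_cluster_number(points, c_range, m=2.0, restarts=3):
    """Run FCM for each candidate c and return (best c, {c: index value})."""
    scores = {}
    for c in c_range:
        # FCM only finds local minima, so keep the restart with the lowest J.
        runs = [fcm(points, c, m=m, seed=s) for s in range(restarts)]
        _, U, _ = min(runs, key=lambda r: r[2])
        scores[c] = validity(U)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: three tight, well-separated 2-D blobs (illustrative only).
    rng = random.Random(42)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in blob_centers for _ in range(15)]
    for m in (1.8, 2.0, 2.2):
        best, scores = best_cluster_number(data, range(2, 6), m=m)
        print(m, best, {c: round(v, 3) for c, v in scores.items()})
```

Keeping the restart with the lowest objective J is a common guard against FCM's sensitivity to initialization. Trying the paper's actual compactness and overlap terms would only require replacing `validity`. The sweep over `m` mirrors the three fuzzy factor values reported in the abstract.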
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD; Peking University core journals (北大核心)
2014, No. 8, pp. 2166-2169 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 61105059)
Keywords
fuzzy clustering
Fuzzy C-Means (FCM) algorithm
validity index
fuzzy factor
optimal cluster number