Abstract
Clustering errors take two main forms: data from different classes assigned to the same cluster, and data from the same class split across different clusters. This paper defines two measures, intra-class non-consistency and inter-class overlapping, to quantify the extent of these two kinds of error respectively. A good fuzzy partition should contain as few clustering errors as possible while keeping cluster compactness as large as possible. Based on the two error measures and a compactness measure, a validity index is proposed for evaluating fuzzy clustering results. Experimental results show that the proposed index effectively determines the optimal number of clusters and is robust.
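The abstract does not give the formulas for intra-class non-consistency or inter-class overlapping, so the proposed index cannot be reproduced from this record. As a rough illustration of how any validity index is used to choose the number of clusters, the sketch below runs a plain fuzzy C-means and scores each candidate cluster count with the well-known Xie-Beni index (compactness divided by separation, lower is better). The Xie-Beni index is a stand-in, not the index proposed in the paper; the function names `fuzzy_c_means`, `xie_beni`, and `pick_cluster_count`, the toy data, and the parameter defaults are all assumptions made for this example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means (fuzzifier m > 1); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], centers[k]), 1e-12)
                    for k in range(c)]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness over minimum center separation."""
    n, c = len(points), len(centers)
    compact = sum(
        (u[i][k] ** m) * math.dist(points[i], centers[k]) ** 2
        for i in range(n) for k in range(c)
    )
    sep = min(
        math.dist(centers[j], centers[k]) ** 2
        for j in range(c) for k in range(j + 1, c)
    )
    # Guard against (near-)coincident centers, which would give sep == 0.
    return compact / (n * max(sep, 1e-12))


def pick_cluster_count(points, candidates, m=2.0):
    """Score every candidate c and return the one with the lowest index."""
    scores = {}
    for c in candidates:
        centers, u = fuzzy_c_means(points, c, m=m)
        scores[c] = xie_beni(points, centers, u, m=m)
    return min(scores, key=scores.get), scores


# Hypothetical toy data: three small groups of 2-D points.
data = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (-0.1, 0.1),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
    (10.0, 0.0), (10.2, 0.1), (9.9, -0.1), (10.1, 0.2),
]
best_c, scores = pick_cluster_count(data, range(2, 6))
print(best_c, scores)
```

Swapping in a different validity index only requires replacing `xie_beni` with a function of the same signature, and changing `min` to `max` in `pick_cluster_count` if larger values of that index indicate a better partition.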
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2010, No. 1, pp. 11-16 (6 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the National Key Technology R&D Program of China during the 11th Five-Year Plan (No. 2006BAK08807)
Keywords
Fuzzy C-Means Clustering
Validity Index
Intra-Class Non-Consistency
Inter-Class Overlapping