Abstract
The C-means clustering method (CM) is widely used for data clustering, but it is sensitive to the initial cluster centers and easily becomes trapped in a local optimum. To address this problem, an optimal initialization-based C-means method (OI-CM) is proposed. First, the neighborhood and neighborhood density of every point in the dataset are calculated, and the point with the maximum neighborhood density is selected as the first cluster center. Next, from the remaining points, the point with the maximum neighborhood density whose neighborhood has a sufficiently low degree of coupling with the neighborhoods of the existing cluster centers is selected as the next cluster center. This procedure is repeated until C cluster centers have been determined. Finally, CM is run from the selected centers to cluster the dataset. Experimental results on simulated and UCI datasets show that OI-CM effectively overcomes the sensitivity of traditional CM to the initial cluster centers and outperforms three other typical global CM algorithms.
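The initialization procedure described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not define the neighborhood, the density, or the coupling condition, so the radius `eps`, the point-count density, the Jaccard-overlap coupling measure, the threshold `max_overlap`, and the function names `oicm_init` and `c_means` are all assumptions made here for illustration.

```python
import numpy as np


def oicm_init(X, c, eps, max_overlap=0.1):
    """Pick c initial centers by neighborhood density (illustrative sketch)."""
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Assumption: the neighborhood of a point is every point within radius eps.
    neigh = dist <= eps
    # Assumption: neighborhood density is the number of points in the neighborhood.
    density = neigh.sum(axis=1)
    order = np.argsort(-density, kind="stable")  # densest candidates first
    chosen = [order[0]]
    for i in order[1:]:
        if len(chosen) == c:
            break
        # Assumption: coupling is the Jaccard overlap of two neighborhoods.
        overlaps = [
            (neigh[i] & neigh[j]).sum() / (neigh[i] | neigh[j]).sum()
            for j in chosen
        ]
        if max(overlaps) <= max_overlap:
            chosen.append(i)
    if len(chosen) < c:
        raise ValueError("too few separated candidates; adjust eps or max_overlap")
    return X[chosen].astype(float)


def c_means(X, centers, n_iter=100):
    """Plain C-means (Lloyd iterations) started from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic example: three well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.3, size=(50, 2)),
])
init = oicm_init(X, c=3, eps=1.0)
labels, centers = c_means(X, init)
```

Because the initial centers are chosen deterministically from the data, repeated runs start from the same centers, which is the property the abstract contrasts with randomly initialized CM. The values of `eps` and `max_overlap` are tuned to this toy data and would need to be chosen differently for other datasets.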
Source
Journal of Jilin University (Engineering and Technology Edition) [《吉林大学学报(工学版)》]
EI
CAS
CSCD
Peking University Core Journals
2018, No. 1, pp. 306-311 (6 pages)
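Funding
National Natural Science Foundation of China (61503151)
Natural Science Foundation of Jilin Province (10100505)
Key Science and Technology Research Project of Jilin Province (20140204046GX)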
Keywords
computer application
C-means method
initial value sensitivity
neighborhood density