Abstract
Existing partition-based clustering algorithms cannot effectively cluster non-uniform data, in which clusters differ considerably in size and density. To address this problem, a clustering algorithm based on the coefficient of variation is proposed. First, the cause of the "uniform effect" is analyzed from the perspective of the clustering optimization objective. The uniform effect is the tendency of K-means-type partitional clustering algorithms to produce clusters of roughly equal size even when the true clusters are unequal. Second, in place of the squared error, the coefficient of variation is proposed as a measure of dispersion for non-uniform data, and a dissimilarity formula for non-uniform data is defined on that basis. Third, a clustering objective function is defined using this dissimilarity, and the clustering procedure is derived with a local optimization method. Experimental results on synthetic and real non-uniform datasets show that the proposed coefficient of variation clustering algorithm for non-uniform data (CVCN) improves clustering accuracy by 5% to 40% over K-means, Verify2 and ESSC.
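The abstract replaces the squared error used by K-means with a dispersion measure based on the coefficient of variation (CV). The Python sketch below illustrates only that contrast. It uses the standard definition CV = σ/|μ| (population standard deviation over the absolute mean). This record does not give the actual CVCN dissimilarity or objective function, so `mean_attribute_cv` is a hypothetical per-cluster aggregation chosen for illustration, not the authors' formula.

```python
import math
from typing import Sequence


def sse(values: Sequence[float]) -> float:
    """Squared-error dispersion used by K-means: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard CV = population standard deviation / |mean|.

    The CV is undefined when the mean is zero, so that case raises instead of dividing by zero.
    """
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return std / abs(mean)


def mean_attribute_cv(points: Sequence[Sequence[float]]) -> float:
    """HYPOTHETICAL cluster dispersion: the average of per-attribute CVs.

    This aggregation is an illustrative assumption, not the CVCN formula.
    """
    dims = len(points[0])
    return sum(
        coefficient_of_variation([p[d] for p in points]) for d in range(dims)
    ) / dims


if __name__ == "__main__":
    small = [1.0, 2.0, 3.0]
    large = [10.0, 20.0, 30.0]  # the same cluster scaled by 10
    print(sse(small), sse(large))  # 2.0 200.0
    print(round(coefficient_of_variation(small), 3),
          round(coefficient_of_variation(large), 3))  # 0.408 0.408
```

Scaling a cluster by 10 multiplies its squared error by 100 but leaves its CV unchanged. A scale-free measure of this kind is one plausible motivation for replacing the squared error. The way CVCN actually combines CVs into its dissimilarity and objective is specified in the full paper.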
Authors
YANG Tianpeng
XU Kunpeng
CHEN Lifei
(College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China; Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, Fujian, China)
Source
Journal of Shandong University (Engineering Science)
Indexing: CAS; Peking University Core Journal
2018, No. 3, pp. 140-146 (7 pages)
Funding
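National Natural Science Foundation of China (61175123)
Natural Science Foundation of Fujian Province (2015J01238)
Innovation Team Project of Fujian Normal University (IRTL1704)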
Keywords
clustering
partition-based clustering
non-uniform data
uniform effect
coefficient of variation
K-means