Abstract
Density peak clustering assigns each sample point by considering only its distance to its pointing point (the nearest point of higher density), which makes it unsuitable for manifold clustering (for example, the Circleblock and Lineblobs datasets). To address this problem, a density peak clustering algorithm optimized by K-nearest-neighbor similarity is proposed. After the density and pointing point of each point are computed, the K nearest neighbors of each point are found with a similarity function. The K-nearest-neighbor information is then used to judge whether each sample point's pointing point is correct, and a correct pointing point is searched for again for every point whose pointing is wrong, which effectively reduces assignment errors. Experiments on synthetic datasets and UCI datasets show that the new algorithm achieves higher accuracy.
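The first steps named in the abstract (density, pointing point, K nearest neighbors under a similarity function, and a check of each pointing point against that neighborhood) can be illustrated with a minimal sketch. The abstract does not give the similarity function or the correctness criterion, so everything specific here is an assumption made for illustration: the Gaussian density kernel with cutoff `dc`, the similarity `exp(-d)`, the rule "a pointing point outside the K nearest neighbors is suspect", and the function names themselves. The re-pointing step is not sketched because the abstract does not describe how the corrected pointing point is chosen.

```python
import numpy as np

def density_and_pointing(X, dc):
    """Local density and pointing point (nearest point of higher density)."""
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed Gaussian-kernel density; the self term exp(0) = 1 is removed.
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")   # indices by decreasing density
    pointing = np.full(len(X), -1)            # -1 marks the global density peak
    for rank in range(1, len(X)):
        i, higher = order[rank], order[:rank]
        pointing[i] = higher[np.argmin(d[i, higher])]
    return d, rho, pointing

def knn_by_similarity(d, k):
    """K nearest neighbors under an assumed similarity s = exp(-d)."""
    s = np.exp(-d)
    np.fill_diagonal(s, -np.inf)              # a point is not its own neighbor
    return np.argsort(-s, axis=1, kind="stable")[:, :k]

def suspicious_pointing(pointing, knn):
    """Assumed rule: flag a pointing point that is not among the K neighbors."""
    return np.array([p != -1 and p not in knn[i]
                     for i, p in enumerate(pointing)])
```

Under these assumptions, on data with two well-separated groups the rule flags the density peak of the lower-density group, because its nearest higher-density point lies in the other group and therefore outside its K nearest neighbors (for K smaller than the group size).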
Authors
ZHU Qingfeng (朱庆峰)
GE Hongwei (葛洪伟)
Affiliations: Ministry of Education Key Laboratory of Advanced Process Control for Light Industry (Jiangnan University), Wuxi, Jiangsu 214122, China; School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2019, No. 2, pp. 148-153, 252 (7 pages)
Keywords
clustering
density peaks
similarity
K nearest neighbor