Abstract
Density peak clustering (DPC) is a representative clustering method, but it performs poorly on complex data sets. This paper proposes a density peak clustering algorithm that exploits the nearest-neighbor information of data objects. First, the local density of each data object is computed from its K nearest neighbors, and a neighborhood density ratio is obtained from the ratios of density and distance between the object and its K nearest neighbors. This redefines the DPC density calculation and effectively removes the arbitrariness in choosing the DPC cutoff distance dc. Second, a new similarity measure between data objects is defined by combining influence space, shared K nearest neighbors, and the density ratio. Third, the FKNN-DPC allocation strategy is improved by taking into account the distance and density similarity of data objects together with their most similar nearest neighbors. Finally, experiments on UCI data sets show that the algorithm achieves good clustering results.
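The first step described in the abstract, replacing the cutoff distance dc with a density computed from K nearest neighbors, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function name `knn_density_peaks`, the density form `exp(-mean KNN distance)`, and the ratio form (an object's density divided by the mean density of its neighbors). Only the `delta` computation follows the standard DPC definition (distance to the nearest object of higher density).

```python
import numpy as np

def knn_density_peaks(X, k=5):
    """Hypothetical sketch: KNN-based density, a density ratio, and DPC delta.

    The density and ratio formulas are illustrative assumptions,
    not the definitions used in the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all objects.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Indices of the k nearest neighbors of each object, excluding itself.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    knn_idx = np.argsort(masked, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, knn_idx, axis=1)

    # Assumed local density: larger when the k neighbors are closer.
    # No cutoff distance dc is needed, only k.
    rho = np.exp(-knn_dist.mean(axis=1))

    # Assumed neighborhood density ratio: own density relative to
    # the mean density of the k nearest neighbors.
    ratio = rho / rho[knn_idx].mean(axis=1)

    # Standard DPC delta: distance to the nearest object with higher density;
    # an object with no denser object gets its largest distance.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    return rho, ratio, delta
```

In plain DPC, objects with both large density and large `delta` are candidate cluster centers. A sketch like this only shows how a K-nearest-neighbor density avoids choosing dc; the paper's similarity measure and its improved FKNN-DPC allocation strategy are not reproduced here.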
Authors
LIU Yu (刘昱)
HU Lihua (胡立华)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Source
Computer & Digital Engineering (《计算机与数字工程》)
2023, No. 6, pp. 1250-1255 (6 pages)
Keywords
cluster analysis
density peak
similarity measure
cluster expansion
density