Abstract
Although the density peaks clustering algorithm (DPC) is simple and efficient, its truncation distance must be set manually, which can make the clustering results inaccurate. To address this problem, this paper proposes an improved algorithm based on K-nearest neighbors. The algorithm introduces information entropy and clusters with an attribute-weighted distance formula, which resolves the influence of unequal attribute weights. During clustering, the nearest-neighbor density of each data point is computed, and the KNN algorithm is then used to determine the truncation distance automatically; the cluster centers are obtained on this basis and clustering is performed. Experiments show that the algorithm improves both accuracy and running efficiency to varying degrees.
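The entropy-based attribute weighting and KNN-based local density described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the entropy-weight formula, the exponential form of the density, and all function names are assumptions based on common formulations of these ideas.

```python
import numpy as np

def entropy_weights(X):
    """Attribute weights from information entropy (an assumed standard
    entropy-weight scheme): columns are min-max scaled, each attribute's
    entropy is computed, and lower-entropy (more informative) attributes
    receive higher weight. Weights sum to 1."""
    n = X.shape[0]
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                       # guard constant columns
    Z = (X - X.min(axis=0)) / span              # scale to [0, 1]
    P = Z / np.maximum(Z.sum(axis=0), 1e-12)    # column-wise proportions
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)  # treat 0*log(0) as 0
    e = -(P * logP).sum(axis=0) / np.log(n)     # entropy per attribute
    return (1.0 - e) / np.maximum((1.0 - e).sum(), 1e-12)

def knn_density_and_delta(X, k=5):
    """KNN-based local density rho and DPC relative distance delta,
    using the entropy-weighted Euclidean metric (a sketch of the idea,
    avoiding any manually chosen truncation distance)."""
    w = entropy_weights(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((w * diff ** 2).sum(axis=-1))   # weighted pairwise distances
    # density: exp of negative mean squared distance to the k nearest
    # neighbours (self excluded), so dense points get rho close to 1
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn ** 2).mean(axis=1))
    # delta: distance to the nearest point of higher density; the
    # globally densest point gets its maximum distance to any point
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return rho, delta
```

As in standard DPC, points with both large rho and large delta are then taken as cluster centers, and the remaining points are assigned to the cluster of their nearest higher-density neighbor.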
Authors
LUO Jun-feng, SUO Zhi-hai, GOU Qian
Network & Information Center, Xi'an Jiaotong University, Xi'an 710049, China
Source
Software (《软件》), 2020, No. 7, pp. 185-188 (4 pages)
Keywords
Clustering
Density peak
Local density
Clustering center
Information entropy
K-nearest neighbor
Truncation distance
Relative distance