Abstract
To cluster data with complex spatial distributions, as well as real-world data whose spatial distribution is unknown, an improved manifold distance is designed as a dissimilarity measure. By exploiting the global consistency among all data points, this measure can mine the spatial distribution information of data sets without class labels. Using this measure, a K-medoids algorithm based on the improved manifold distance is proposed. The new algorithm is compared with two other K-medoids algorithms, one based on the existing manifold distance and one based on the Euclidean distance. Experimental results on eight artificial data sets with different structures and on the USPS handwritten digit recognition problem show that, on test data sets of every structure, the new algorithm's clustering performance is better than or close to that of the other two K-medoids algorithms, and that it can cluster data of various distributions, whether simple or complex, convex or non-convex.
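The abstract does not define the improved manifold distance, so the sketch below implements only a commonly cited form of the *existing* manifold distance that the paper uses as a baseline, and it should not be read as the authors' method. Under this assumption, every pair of points is joined by an edge of length rho^d - 1, where d is their Euclidean distance and rho > 1 is a scaling factor, and the dissimilarity of two points is the shortest-path length over that complete graph. The clustering step is a simple alternating K-medoids on the precomputed dissimilarity matrix rather than the full PAM swap search; the names `manifold_distance`, `k_medoids`, `rho`, and the farthest-first initialisation are hypothetical choices for illustration.

```python
import math


def manifold_distance(points, rho=2.0):
    """Assumed 'existing' manifold distance, NOT the paper's improved measure.

    Each pair of points is joined by an edge of length rho**d - 1, where d is
    their Euclidean distance and rho > 1. The manifold distance is the
    shortest-path length over this complete graph (Floyd-Warshall, O(n^3)).
    Large d can overflow rho**d, so the data are assumed to be suitably scaled.
    """
    n = len(points)
    D = [[rho ** math.dist(p, q) - 1.0 for q in points] for p in points]
    for k in range(n):
        row_k = D[k]
        for i in range(n):
            row_i, d_ik = D[i], D[i][k]
            for j in range(n):
                # Relax the path i -> j through the intermediate point k.
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return D


def _assign(D, medoids):
    """Label each point with the index of its nearest medoid (first on ties)."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, k, max_iter=100):
    """Alternating K-medoids on a precomputed dissimilarity matrix D.

    Initialisation (a hypothetical choice, not from the paper): the first
    medoid minimises total dissimilarity; each further medoid is the point
    farthest from its nearest already-chosen medoid.
    """
    n = len(D)
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = _assign(D, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid of an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest summed dissimilarity
            # to the other members of its cluster.
            new_medoids.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(D, medoids)
```

Because `k_medoids` only consumes a dissimilarity matrix, the Euclidean baseline from the comparison can be reproduced by passing plain Euclidean distances instead, and any other measure can be swapped in the same way. For instance, on two parallel rows of ten evenly spaced points placed 20 units apart, with rho = 2, this sketch returns one cluster per row.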
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 9, pp. 2482-2485, 2657 (5 pages)