Abstract
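Cluster analysis is an important part of data mining, and metric learning is a key step in cluster analysis. Traditional clustering algorithms usually measure distance with the Euclidean distance, which considers only the pairwise distance between samples and ignores the global distribution structure of the data. To take this global structural information into account, this paper proposes a new global metric, the effective distance metric, whose main idea is to compute the effective distance between samples by sparse reconstruction. The effective distance is then applied to three classical clustering algorithms, K-means, K-medoids and FCM (fuzzy C-means), yielding three effective-distance-based clustering algorithms: EK-means, EK-medoids and EFCM. A comparison with the traditional algorithms on UCI benchmark datasets shows that the effective-distance-based algorithms significantly improve clustering performance.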
Distance metric learning is a key step in clustering analysis, which is an important sub-domain of data mining. The Euclidean distance is a commonly used local distance metric in clustering algorithms, but it considers only the distance between two samples. This paper proposes a new global distance metric, called the effective distance metric. In the new method, the similarity between two samples is evaluated using not only the distance between these two samples, but also the distances between one specific sample and all the other related ones. Sparse reconstruction coefficients are employed to reflect this global relationship among samples. The paper then develops three effective-distance-based clustering algorithms, EK-means, EK-medoids and EFCM, by applying the effective distance to three classical clustering algorithms, namely K-means, K-medoids and FCM (fuzzy C-means), respectively. Experimental results on UCI benchmark datasets demonstrate the efficacy of the proposed methods.
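The abstract gives the idea but not the formulas, so the following is only a minimal sketch of how a sparse-reconstruction-based distance could be plugged into a K-medoids-style loop. Several things here are assumptions rather than the paper's stated method: the L1-penalised reconstruction is solved with plain ISTA, the penalty `lam` is arbitrary, the distance takes the form `1 - log p` (borrowed from the network-science notion of effective distance), the matrix is symmetrised for convenience, and the function names `sparse_codes`, `effective_distance` and `k_medoids` are hypothetical.

```python
import numpy as np

def sparse_codes(X, lam=0.1, n_iter=500):
    """Reconstruct each sample from all other samples under an L1 penalty.

    Minimises 0.5 * ||A w - x_i||^2 + lam * ||w||_1 with ISTA (an assumed solver).
    Returns W where W[i, j] is the weight of sample j in reconstructing sample i.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)      # every sample except i
        A = X[idx].T                          # dictionary: columns are the other samples
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
        w = np.zeros(n - 1)
        for _ in range(n_iter):
            z = w - step * (A.T @ (A @ w - X[i]))                     # gradient step
            w = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
        W[i, idx] = w
    return W

def effective_distance(X, lam=0.1, eps=1e-12):
    """Turn sparse coefficients into a pairwise distance (assumed form: 1 - log p)."""
    W = np.abs(sparse_codes(X, lam))
    P = W / (W.sum(axis=1, keepdims=True) + eps)  # share of j in reconstructing i
    D = 1.0 - np.log(P + eps)                     # large when j contributes nothing
    D = 0.5 * (D + D.T)                           # symmetrised: a choice made here
    np.fill_diagonal(D, 0.0)
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    """Plain alternating K-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to the nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest total distance to the others
                costs = D[np.ix_(members, members)].sum(axis=0)
                new[c] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

# Toy data: two Gaussian blobs in 3-D (synthetic, not a UCI dataset).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, (10, 3)) + [2.0, 0.0, 0.0],
    rng.normal(0.0, 0.3, (10, 3)) + [0.0, 2.0, 0.0],
])
D = effective_distance(X)
labels, medoids = k_medoids(D, k=2)
```

Because the loop only reads the matrix `D`, the same structure runs unchanged with a Euclidean distance matrix, which is one way to set up the kind of baseline comparison the abstract describes.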
Authors
光俊叶
刘明霞
张道强
GUANG Junye; LIU Mingxia; ZHANG Daoqiang (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Information Science and Technology, Taishan University, Taian, Shandong 271021, China)
Source
《计算机科学与探索》
CSCD
PKU Core (Peking University core journal list)
2017, No. 3, pp. 406-413 (8 pages)
Journal of Frontiers of Computer Science and Technology
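Funding
National Natural Science Foundation of China, Nos. 61422204 and 61473149
Open Fund of the Graduate Innovation Laboratory, Nanjing University of Aeronautics and Astronautics, No. kfjj20151605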
Keywords
clustering (聚类)
distance metric (距离度量)
metric learning (度量学习)
effective distance (有效距离)