摘要
谱聚类算法受到度量中尺度因子的影响,同时传统谱聚类算法通过欧氏距离度量样本间相似性也不准确。针对上述问题,提出一种基于传递距离的谱聚类算法。算法首先通过改进传统谱聚类中的度量方式,用基于传递距离的度量方式度量样本间相似性,并构建传递矩阵,接着用传递矩阵做相似度变换构建拉普拉斯矩阵,最终通过求特征值和特征向量完成聚类。基于传递距离的谱聚类算法在人工数据集及UCI数据集上均取得了良好的聚类结果,具有较好的鲁棒性和有效性。
Spectral clustering algorithms are influenced by the mesoscale factors of metrics,and similarity measured by Euclidean distance is not always accurate.In view of this situation,a spectral clustering algorithm based on transitive distance is proposed.The main idea contains three steps.First,a minimum spanning tree is constructed to do a similarity transformation,as a result a transfer matrix is generated.Second,we construct a Laplacian matrix by the transfer matrix of the first step.Data is projected into the eigen-space of this Laplacian matrix.Lastly,the clustering in the space of the second step is done.The experimental results on artificial data sets and UCI data sets show that the spectral clustering algorithm based on the transitive distance has good robustness and effectiveness.
作者
戴天辰
顾正弘
DAI Tian-chen;GU Zheng-hong(Yangzhou Branch,China Telecom,Yangzhou 225002,China;College of Information Engineering,Yangzhou University,Yangzhou 225127,China)
出处
《计算机与现代化》
2018年第12期61-66,共6页
Computer and Modernization
关键词
谱聚类
尺度因子
传递距离
传递矩阵
spectral clustering
mesoscale factors
transitive distance
transitive matrix