Abstract
To address the cumbersome workflow and low accuracy of traditional data analysis methods when clustering high-dimensional data, an embedded enhanced clustering algorithm based on curvilinear distance analysis (ECE-CDA) is proposed. ECE-CDA computes the pairwise curvilinear distances between data points in the high-dimensional space and, guided by clustering, maps them into a low-dimensional space; a weight function is constructed to preserve the local topological structure. The algorithm simplifies the data analysis process by performing dimensionality reduction and clustering simultaneously, and it can serve as a general, high-accuracy framework. Experiments on 12 public data sets show that ECE-CDA reduces dimensionality effectively and greatly improves clustering accuracy.
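The first step named in the abstract, computing pairwise curvilinear distances, can be illustrated with a minimal sketch. It assumes the common approximation in which the curvilinear distance is the shortest-path length through a symmetric k-nearest-neighbor graph with Euclidean edge weights. The function name `curvilinear_distances` and the parameter `k` are hypothetical and not taken from the paper. The clustering-guided projection and the weight function of ECE-CDA are not reproduced here.

```python
import heapq
import math


def curvilinear_distances(points, k):
    """Approximate pairwise curvilinear distances (illustrative sketch).

    Assumption: the curvilinear distance is approximated by the shortest
    path in a symmetric k-NN graph whose edges carry Euclidean lengths.
    """
    n = len(points)
    # Pairwise Euclidean distances in the original space.
    euclid = [[math.dist(p, q) for q in points] for p in points]

    # Symmetric k-NN graph: i and j are linked if either is among
    # the other's k nearest neighbors.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest = sorted(others, key=lambda j: euclid[i][j])[:k]
        for j in nearest:
            graph[i][j] = euclid[i][j]
            graph[j][i] = euclid[i][j]

    # Dijkstra from every point; unreachable pairs stay at infinity.
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale queue entry
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        result.append(dist)
    return result


# Eleven points on a unit half circle: the two end points are 2.0 apart
# in a straight line, but much farther apart along the curve.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10))
          for t in range(11)]
D = curvilinear_distances(points, k=2)
print(math.dist(points[0], points[10]), D[0][10])
```

For the half-circle example, the graph distance between the two end points is well above the straight-line distance of 2.0 and stays just below the arc length π, which is the behavior that makes curvilinear distances useful for data lying on a curved manifold.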
Authors
Wu Yanping (吴艳萍)
Wang Hongjun (王红军)
Li Tianrui (李天瑞)
Deng Ping (邓萍)
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2021, No. 10, pp. 321-329 (9 pages)
Keywords
Curvilinear distance analysis
Dimensionality reduction
Clustering
Stochastic gradient descent