
Analysis of a Multi-dimensional Data De-duplication Clustering Algorithm Based on Sparse Autoencoding
Abstract: As science and technology continue to develop, both the volume and the variety of data keep growing. High-dimensional data sets with many duplicate records make it difficult to extract useful information. To address this problem, this paper proposes a multi-dimensional data clustering algorithm based on an improved sparse autoencoder. The algorithm has two main parts: data processing and clustering analysis. In data processing, the greedy layer-wise principle of S-SAE is first used to reduce the high-dimensional data set to groups of 6 dimensions each. A mapping-value matching mechanism then removes duplicate data from the reduced data set, and the cleaned values are replaced with 0. The processed data are next fed into the K-Means++ clustering algorithm for clustering analysis. Finally, a TS-SAE-K-Means++ multi-dimensional data clustering model is built, and its optimal parameter settings are derived through optimization analysis. Simulation comparisons against different baseline algorithm combinations show that TS-SAE-K-Means++ outperforms the other combinations on both the clustering silhouette coefficient S and the model F1 score. This indicates that the proposed algorithm has certain advantages in extracting useful information from high-dimensional data.
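The cleaning and clustering stages described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' TS-SAE-K-Means++ implementation. It assumes the S-SAE dimensionality reduction has already produced the input vectors (the paper reduces each group to 6 dimensions), and it interprets the "mapping-value matching mechanism" as exact matching on a rounded tuple key, which is our assumption rather than the paper's definition. The helper names `dedupe_by_mapping`, `kmeans_pp_init`, `kmeans` and `silhouette` are hypothetical. `silhouette` computes the standard silhouette coefficient S; the F1 score is omitted because it requires ground-truth labels.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dedupe_by_mapping(rows: Sequence[Sequence[float]], ndigits: int = 6) -> List[Vector]:
    """Zero out every row whose mapping key matches an earlier row.

    Hypothetical interpretation: a row's "mapping value" is its tuple of values
    rounded to `ndigits`; the first occurrence is kept and each later match is
    replaced by an all-zero row, as the abstract describes.
    """
    seen = set()
    cleaned: List[Vector] = []
    for row in rows:
        key = tuple(round(x, ndigits) for x in row)
        if key in seen:
            cleaned.append(tuple(0.0 for _ in row))
        else:
            seen.add(key)
            cleaned.append(tuple(float(x) for x in row))
    return cleaned


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to p (squared Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans_pp_init(points: List[Vector], k: int, rng: random.Random) -> List[Vector]:
    """K-Means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0.0:  # every point already coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # zero-weight points can never be selected here
                centers.append(p)
                break
    return centers


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0):
    """Lloyd iterations started from K-Means++ centers; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [nearest(p, centers) for p in points]
    for _ in range(iters):
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        centers = new_centers
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centers


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient S, averaging (b_i - a_i) / max(a_i, b_i) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            continue  # singleton clusters contribute s_i = 0 by convention
        mean_d = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds)
        a = mean_d[own]
        b = min(v for c, v in mean_d.items() if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


if __name__ == "__main__":
    raw = [(0.0, 0.1), (0.2, 0.0), (0.0, 0.1), (5.0, 5.1), (5.2, 4.9), (5.0, 5.1)]
    data = dedupe_by_mapping(raw)  # rows 2 and 5 repeat earlier rows, so they become zeros
    labels, _ = kmeans(data, k=2, seed=1)
    print(data, labels, round(silhouette(data, labels), 3))
```

K-Means++ changes only the seeding; the loop after it is ordinary Lloyd K-Means. Because the abstract keeps zeroed duplicates in the data set, those rows take part in clustering and will gravitate toward whichever cluster lies nearest the origin; dropping them instead would be a different design that the abstract does not describe.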
Authors: XUE Li-xiang; GAO Li-jie; LI Zhan-bo (College of Information Engineering, Zhengzhou University of Science and Technology, Zhengzhou, Henan 450064, China; Network Management Center, Zhengzhou University, Zhengzhou, Henan 450001, China)
Source: Computer Simulation (计算机仿真), 2024, No. 3, pp. 542-547 (6 pages)
Keywords: improved sparse autoencoder; clustering algorithm; rating metrics
Related literature: references 7; secondary references 45; co-cited literature 59

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部