Deep Subspace Clustering Algorithm with Data Augmentation and Adaptive Self-Paced Learning


Abstract: Deep subspace clustering outperforms traditional clustering by jointly performing self-expressive feature learning and cluster assignment. Although many deep subspace clustering algorithms have been proposed for various applications, most of them fail to learn accurate clustering-oriented features. To address this imprecise feature representation and the resulting loss in clustering performance, an improved deep subspace clustering algorithm is proposed. The original samples are augmented by random shifts and rotations, and the augmented samples are used to train and optimize the autoencoder in alternation with updates of the samples' cluster assignments, so that more robust feature representations are learned. In the fine-tuning stage, the target for each augmented sample in the loss function is the cluster center to which the original sample is assigned. Because these targets can be computed incorrectly, samples with wrong targets may mislead the training of the autoencoder network. An adaptive self-paced learning algorithm that requires no additional hyperparameters is therefore used to select the most confident samples in each iteration and improve generalization. Experiments on the MNIST, USPS, and COIL100 datasets show that the algorithm reaches accuracies of 0.9318, 0.8934, and 0.7236, respectively. Ablation experiments and sensitivity analysis also confirm the effectiveness of the algorithm.
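The two ingredients named in the abstract, random shift and rotation augmentation and self-paced sample selection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the default ranges `max_shift` and `max_angle`, the nearest-neighbor rotation, and especially the threshold rule (mean loss plus a progress-scaled standard deviation) are assumptions chosen for illustration; the abstract only states that the selection needs no extra hyperparameters.

```python
import numpy as np


def shift_zero_pad(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        img[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D image about its center using nearest-neighbor sampling."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    y0, x0 = ys - cy, xs - cx
    # Inverse mapping: for every output pixel, find the source pixel.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def augment(img, rng, max_shift=2, max_angle=10.0):
    """Apply one random shift and one random rotation (ranges are hypothetical)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    angle = rng.uniform(-max_angle, max_angle)
    return rotate_nearest(shift_zero_pad(img, int(dy), int(dx)), angle)


def self_paced_weights(losses, step, total_steps):
    """Return 0/1 weights that keep only low-loss ("confident") samples.

    Assumed rule: the threshold is derived from the loss statistics of the
    current iteration and relaxes as training progresses, so no separate
    pace hyperparameter has to be tuned.
    """
    threshold = losses.mean() + (step / total_steps) * losses.std()
    return (losses <= threshold).astype(np.float64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))
    augmented = augment(image, rng)
    # Hypothetical per-sample losses, e.g. squared distance to the target center.
    losses = np.array([0.1, 0.2, 0.3, 5.0])
    print(augmented.shape, self_paced_weights(losses, step=0, total_steps=10))
```

In a training loop, the weights would multiply the per-sample clustering loss so that samples whose targets are likely wrong contribute nothing to the gradient in that iteration.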

Authors: JIANG Yuyan; TAO Chengfeng; LI Ping (School of Management Science and Engineering, Anhui University of Technology, Maanshan 243032, Anhui, China; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)

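Affiliations: School of Management Science and Engineering, Anhui University of Technology; School of Computer Science, Nanjing University of Posts and Telecommunications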

Source: Computer Engineering (计算机工程), 2023, No. 8, pp. 96-103, 110 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (62006126).

Keywords: deep learning; subspace clustering; data augmentation; adaptive self-paced learning; encoder

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]


