Abstract
Deep clustering is a clustering approach that uses deep learning for data representation learning. Building on conventional clustering, it applies deep learning to learn the intrinsic structure and features of data, which makes clustering of large-scale data more effective. It is widely used in areas such as recommender systems and anomaly detection. However, current deep clustering methods suffer from two problems: 1) traditional deep clustering networks do not fully exploit the prior distribution information of graph nodes; and 2) contrastive-learning-based deep clustering networks treat every sample equally, which reduces the model's discriminability. To address these problems, this paper proposes ACGP, a clustering network that combines pseudo-labels with dynamically updated weights. The method clusters the original graph nodes to obtain pseudo-labels and applies them to the cross-view similarity matrix to generate positive and negative sample pairs, so that the model learns to distinguish positives from negatives correctly. Each sample pair then computes an adaptive weight from its own similarity value, and this weight rescales the pair's loss gradient. In addition, intra-class and inter-class thresholds are introduced into the loss function to find the optimal similarity value for each sample pair. Node clustering experiments on six real-world datasets demonstrate the superiority and effectiveness of the proposed method.
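The pair construction and weighting described in the abstract can be illustrated with a small sketch. It is not the authors' ACGP implementation: the pseudo-labels are assumed to be precomputed (for example by k-means on the original node features), and the adaptive weights and intra-/inter-class thresholds are modeled on the Circle Loss formulation (Sun et al., 2020), swapped in here as a stand-in. The function names and the parameters `m` (relaxation margin) and `gamma` (scale) are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(z1: Sequence[Sequence[float]],
                             z2: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cross-view similarity: S[i][j] = cos(z1[i], z2[j])."""
    def norm(v: Sequence[float]) -> float:
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
    return [[sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v)) for v in z2]
            for u in z1]


def pseudo_label_pairs(labels: Sequence[int]) -> List[List[bool]]:
    """Positive mask: True where nodes i and j share a pseudo-label."""
    return [[li == lj for lj in labels] for li in labels]


def _logsumexp(xs: List[float]) -> float:
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def _softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_pair_loss(sim: List[List[float]], pos_mask: List[List[bool]],
                       m: float = 0.25, gamma: float = 32.0) -> float:
    """Circle-Loss-style loss (assumption, not the paper's exact loss).

    Each pair gets a weight from its own similarity: positives far below the
    optimum O_p and negatives far above O_n get larger weights. Delta_p and
    Delta_n act as the intra-class and inter-class thresholds.
    """
    o_p, o_n = 1.0 + m, -m            # optimal similarity targets
    delta_p, delta_n = 1.0 - m, m     # intra-/inter-class thresholds
    total, n = 0.0, len(sim)
    for i in range(n):
        pos_terms, neg_terms = [], []
        for j in range(n):
            s = sim[i][j]
            if pos_mask[i][j]:
                alpha = max(o_p - s, 0.0)   # adaptive weight (constant w.r.t. s)
                pos_terms.append(-gamma * alpha * (s - delta_p))
            else:
                alpha = max(s - o_n, 0.0)
                neg_terms.append(gamma * alpha * (s - delta_n))
        if not neg_terms:  # anchor has no negatives; skip it
            continue
        # log(1 + sum exp(neg) * sum exp(pos)), computed stably
        total += _softplus(_logsumexp(neg_terms) + _logsumexp(pos_terms))
    return total / n
```

Usage: with two views whose embeddings agree with the pseudo-labels, the loss is small; with labels that contradict the embedding geometry, it is large. In the original Circle Loss the weights are treated as constants (stop-gradient), so each pair's gradient is simply scaled by its weight, which matches the abstract's description of weights updating the pair's loss gradient.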
Authors
ZHANG Xinyu (张鑫煜), XU Huiying (徐慧英), CHEN Yuhang (陈宇杭), ZHU Xinzhong (朱信忠)
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
Source
Journal of Zhejiang Normal University: Natural Sciences (《浙江师范大学学报(自然科学版)》)
Indexed in: CAS
2024, No. 4, pp. 404-412 (9 pages)
Funding
National Natural Science Foundation of China (62376252, 61976196)
Key Program of the Zhejiang Provincial Natural Science Foundation (LZ22F030003)
Keywords
self-supervised learning
deep clustering
contrastive learning
clustering pseudo-labels
adaptive weights