Abstract
Graph clustering based on hard sample mining has recently become a research hotspot. Existing algorithms have three main problems: contrastive methods and sample-pair weighting strategies lack a good fusion mechanism; positive-sample selection ignores "false negative" samples within a view; and the contribution of graph-level information to clustering is overlooked. To address these problems, this paper proposes a graph clustering algorithm that combines hard sample sampling with contrastive enhancement. First, an autoencoder learns node embeddings, and a self-weighted contrastive loss for representation learning is designed from the computed pseudo-labels, similarities, and confidence values, unifying node contrast across views with hard sample pair weighting. By adjusting the weights of sample pairs in different confidence regions, the loss drives the model to focus on different types of hard samples and learn discriminative features, which improves the intra-cluster consistency and inter-cluster separation of the representations and strengthens the model's ability to discriminate between samples. Second, a clustering network projects the graph-level representations, and a cluster contrastive loss maximizes the consistency of cluster representations across views. Finally, the two contrastive losses are combined and optimized iteratively through a self-supervised training mechanism to complete the clustering task. Compared with 9 baseline clustering algorithms on 5 real-world datasets, the algorithm achieves the best results on 4 authoritative metrics. Ablation experiments demonstrate the effectiveness and transferability of the two contrastive modules.
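The abstract does not give the form of the self-weighted contrastive loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical weighting rule in which pairs of high-confidence samples are weighted by how strongly their similarity disagrees with their pseudo-labels, while all other pairs keep weight 1. The names `pair_weight` and `weighted_contrastive_loss` and the parameters `tau`, `beta`, and `temperature` are illustrative assumptions.

```python
import math


def pair_weight(similarity, same_cluster, both_confident, beta=2.0):
    """Hypothetical weight for one sample pair; similarity is expected in [0, 1]."""
    if not both_confident:
        return 1.0  # low-confidence pairs keep the plain contrastive weight
    target = 1.0 if same_cluster else 0.0
    # Hard pairs disagree with their pseudo-labels: dissimilar "positives"
    # or similar "negatives" get weights near 1, easy pairs get weights near 0.
    return abs(target - similarity) ** beta


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(view1, view2, pseudo, confidence,
                              tau=0.9, beta=2.0, temperature=0.5, eps=1e-12):
    """Cross-view contrastive loss with per-pair weights (illustrative only)."""
    n = len(pseudo)
    loss = 0.0
    for i in range(n):
        pos, total = 0.0, 0.0
        for j in range(n):
            cos = cosine(view1[i], view2[j])
            both = confidence[i] >= tau and confidence[j] >= tau
            same = pseudo[i] == pseudo[j]
            # Rescale cosine similarity from [-1, 1] to [0, 1] before weighting.
            w = pair_weight((cos + 1.0) / 2.0, same, both, beta)
            score = w * math.exp(cos / temperature)
            total += score
            # Positives: the same node in the other view, plus confident
            # pairs that share a pseudo-label.
            if i == j or (both and same):
                pos += score
        loss -= math.log((pos + eps) / (total + eps))
    return loss / n
```

Under these assumptions, a confident same-cluster pair with low similarity receives a large weight, while a confident pair that already agrees with its pseudo-label contributes little, which is one way a loss can be steered toward hard samples.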
Authors
朱玄烨
孔兵
陈红梅
包崇明
周丽华
Zhu Xuanye; Kong Bing; Chen Hongmei; Bao Chongming; Zhou Lihua (School of Information Science & Engineering, Yunnan University, Kunming 650504, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2024, Issue 6, pp. 1769-1777 (9 pages)
Application Research of Computers
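Funding
National Natural Science Foundation of China (62062066, 61762090, 61966036, 62276227)
Key Project of the 2022 Yunnan Provincial Basic Research Program (202201AS070015)
Yunnan Provincial Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (202205AC160033)
Yunnan Key Laboratory of Intelligent Systems and Computing (202205AG070003)
Keywords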
graph representation learning
attributed graph clustering
contrastive learning
hard sample mining