Abstract
As a fundamental unsupervised learning task, clustering aims to partition unlabeled, mixed image data into semantically similar classes. Some recent approaches introduce data augmentation and apply contrastive learning to learn feature representations and cluster assignments, focusing on the model's ability to discriminate between different semantic classes; this can cause feature embeddings of samples from the same semantic class to be pulled apart. To address this problem, a contrastive clustering method with consistent structural relations (CCR) is proposed. It performs contrastive learning at both the instance level and the cluster level, and adds a consistency constraint at the relation level, so that the model learns more "positive pair" information from structural relations, reducing the impact of separated cluster embeddings. Experimental results show that CCR outperforms recent unsupervised clustering methods on image benchmark datasets: under the same experimental settings, its average accuracy improves on the best prior method by 1.7% on the CIFAR-10 and STL-10 datasets, and by 1.9% on the CIFAR-100 dataset.
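The abstract gives no formulas, but the instance-level contrastive learning it mentions is commonly instantiated as the NT-Xent loss over two augmented views of each image. The sketch below illustrates that general objective in NumPy; the function name, temperature value, and exact formulation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Instance-level contrastive (NT-Xent) loss for a batch of paired
    embeddings z_a, z_b (two augmented views of the same images).
    Hypothetical sketch; CCR's exact losses may differ."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize -> cosine sim
    sim = z @ z.T / temperature                       # (2n, 2n) similarity matrix
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # the positive of sample i is its other view, at index (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

A matching pair of views (small perturbation) should yield a lower loss than a mismatched random pairing, which is what drives same-instance embeddings together.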
Authors
XU Jie; WANG Lisong (College of Computer Science and Technology / College of Artificial Intelligence / College of Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
Computer Science (《计算机科学》), 2023, No. 9, pp. 123-129 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Key Project of the Foundation Strengthening Program (2019JCJQZD33800).
Keywords
Unsupervised learning
Clustering
Contrastive learning
Data augmentation
Over-clustering