Abstract
Consistency-based semi-supervised learning methods typically rely on simple data augmentation to enforce consistent predictions between original inputs and perturbed inputs. When the proportion of labeled data is low, the effectiveness of this approach is hard to guarantee. One way to address this problem is to extend advanced data augmentation methods from supervised learning to the semi-supervised setting. Building on the consistency-based semi-supervised learning method MixMatch, this paper proposes AutoMixMatch, a semi-supervised learning method based on automated mixed-sample data augmentation. AutoMixMatch uses a modified automated data augmentation technique in the data augmentation phase and introduces a mixed-sample algorithm in the sample mixing phase to make better use of unlabeled samples. The method is evaluated through image classification experiments. On image classification benchmark datasets, it outperforms several mainstream semi-supervised classification methods under all three labeled-sample proportions tested, which validates its effectiveness. In addition, the method performs better when labeled data make up only a very small fraction of the training data (0.05%): on the SVHN dataset, its classification error rate is 30.17% lower than that of MixMatch.
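The abstract does not spell out AutoMixMatch's own mixed-sample algorithm, so the sketch below only illustrates two steps of the baseline it builds on, MixMatch: label guessing for unlabeled images (average predictions over K augmented copies, then sharpen with temperature T) and MixMatch's MixUp variant, where λ is drawn from Beta(α, α) and replaced by max(λ, 1 − λ) so each mixed sample stays closer to its first input. The function names and the NumPy-based setup are illustrative assumptions; the defaults T = 0.5 and α = 0.75 follow values reported for MixMatch, not values taken from this paper.

```python
import numpy as np


def sharpen(p: np.ndarray, T: float = 0.5) -> np.ndarray:
    """Row-wise temperature sharpening: p_i^(1/T) / sum_j p_j^(1/T).

    Hypothetical helper illustrating MixMatch's sharpening step.
    """
    p = p ** (1.0 / T)
    return p / p.sum(axis=1, keepdims=True)


def guess_labels(probs_per_aug: np.ndarray, T: float = 0.5) -> np.ndarray:
    """Label guessing for unlabeled data (hypothetical helper).

    probs_per_aug has shape (K, batch, classes): the model's predicted
    class distributions for K augmented copies of each unlabeled image.
    The K predictions are averaged, then sharpened.
    """
    return sharpen(probs_per_aug.mean(axis=0), T)


def mixmatch_mixup(x1, y1, x2, y2, alpha: float = 0.75, rng=None):
    """MixMatch-style MixUp (hypothetical helper, baseline only).

    One lambda is drawn per example from Beta(alpha, alpha); taking
    max(lambda, 1 - lambda) keeps each mixed sample closer to x1.
    x arrays have shape (batch, ...); y arrays have shape (batch, classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha, size=(x1.shape[0],))
    lam = np.maximum(lam, 1.0 - lam)
    # Reshape lambda so it broadcasts over image and label dimensions.
    lam_x = lam.reshape(-1, *([1] * (x1.ndim - 1)))
    lam_y = lam.reshape(-1, 1)
    x = lam_x * x1 + (1.0 - lam_x) * x2
    y = lam_y * y1 + (1.0 - lam_y) * y2
    return x, y
```

For example, sharpening the distribution [0.6, 0.4] with T = 0.5 squares each entry and renormalizes, giving roughly [0.69, 0.31], which pushes the guessed label toward a lower-entropy target. Because λ' ≥ 0.5, every mixed label keeps at least half of its weight on the first input's label.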
Authors
XU Hua-jie (许华杰)
CHEN Yu (陈育)
YANG Yang (杨洋)
QIN Yuan-zhuo (秦远卓)
Affiliations: College of Computer and Electronic Information, Guangxi University, Nanning 530004, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China; College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
Source
Computer Science (《计算机科学》), 2022, No. 3, pp. 288-293 (6 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Science and Technology Program of Guangxi Zhuang Autonomous Region (2017AB15008)
Science and Technology Program of Chongzuo City (FB2018001)
Keywords
Semi-supervised learning
Consistency
Image classification
Automated data augmentation
Mixed sample