Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

Cited by: 1

Abstract: Deep learning models with convolutional structure generalize poorly in few-shot learning scenarios. To address this problem, a performance optimization method for convolution-structured pre-trained models based on derivative-free few-shot learning is proposed, with AlexNet and ResNet as examples. First, the sample data are modulated by causal intervention to generate sequential data from non-time-series data, and the pre-trained model is pruned in a targeted manner using co-integration tests, from the perspective of the stationarity of the data distribution. Then, based on the Capital Asset Pricing Model (CAPM) and optimal transport theory, forward learning without gradient propagation is carried out on the intermediate outputs of the pre-trained model and a new structure is constructed, generating representation vectors with clear inter-class distinguishability in the distribution space. Finally, the generated effective features are adaptively weighted by a self-attention mechanism and aggregated in the fully connected layer to generate weakly correlated embedding vectors. Experimental results show that, on 100 classes of images from the ImageNet 2012 dataset, the proposed method raises the Top-1 accuracy of the pre-trained AlexNet model from 58.82% to 68.50% and that of the pre-trained ResNet model from 78.51% to 85.72%. The proposed method can therefore effectively improve the performance of convolution-structured pre-trained models using few-shot training data.
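The abstract names two standard quantities, the CAPM beta and the Wasserstein distance from optimal transport, without stating how they are applied to the model's intermediate outputs. The sketch below only illustrates their textbook definitions with NumPy; it is not the authors' method. The function names `capm_beta` and `wasserstein_1d` are hypothetical, and treating one channel's activation sequence as the "asset" and a reference sequence as the "market" is purely an assumption made for illustration.

```python
import numpy as np


def capm_beta(asset, market):
    """Textbook CAPM beta: cov(asset, market) / var(market).

    Assumption for illustration only: `asset` could be one feature
    channel's activation sequence and `market` a reference sequence
    (e.g. the mean over channels). The abstract does not specify this.
    """
    asset = np.asarray(asset, dtype=float)
    market = np.asarray(market, dtype=float)
    if asset.shape != market.shape or asset.ndim != 1:
        raise ValueError("asset and market must be 1-D arrays of equal length")
    # Off-diagonal entry of the 2x2 sample covariance matrix.
    cov = np.cov(asset, market, ddof=1)[0, 1]
    return float(cov / np.var(market, ddof=1))


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    For equal sample sizes, the optimal transport plan in one dimension
    matches sorted values, so the distance is the mean absolute
    difference of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    return float(np.mean(np.abs(x - y)))
```

For example, `capm_beta([2, 4, 6, 8], [1, 2, 3, 4])` gives a beta of 2, since the first series moves exactly twice as much as the second, and `wasserstein_1d([0, 1, 2], [1, 2, 3])` gives 1, since every sorted point is shifted by one unit. A larger distance between the per-class distributions of a representation would correspond to the "inter-class distinguishability in the distribution space" mentioned in the abstract.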

Authors: LI Yaming; XING Kai; DENG Hongwu; WANG Zhiyong; HU Xuan (School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China)

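Affiliations: School of Computer Science and Technology, University of Science and Technology of China; Suzhou Institute for Advanced Research, University of Science and Technology of China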

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2022, No. 2, pp. 365-374 (10 pages)

Keywords: Capital Asset Pricing Model (CAPM); Wasserstein distance; derivative-free learning; self-attention mechanism; pre-trained model

Classification number: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]