Abstract
Transformer models are widely used in image classification, but on small datasets they are hampered by the limited amount of training data and the large number of model parameters, which leads to low classification accuracy and slow convergence. This paper proposes a progressive hybrid Transformer model with hourglass attention. First, global feature relationships are modeled by an hourglass self-attention with down-up sampling, in which up-sampling compensates for the information lost during down-sampling, while a learnable temperature parameter and a negative diagonal mask sharpen the attention score distribution and prevent the over-smoothing caused by stacking too many layers. Second, a progressive down-sampling module is designed to obtain fine-grained multi-scale feature maps and effectively capture low-dimensional feature information. Finally, a hybrid architecture is adopted: the designed hourglass attention is used in the top stages, pooling layers replace the attention modules in the bottom stages, and layer normalization with depthwise convolution is introduced to strengthen the locality of the network. Experiments on the T-ImageNet, CIFAR10, CIFAR100, and SVHN datasets show that the proposed method reaches a classification accuracy of 97.42% with 3.41 GFLOPs of computation and 25M parameters. Compared with the baseline algorithms, the proposed method achieves a clear improvement in classification accuracy and a clear reduction in computation and parameter count, improving the performance of Transformer models on small datasets.
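The hourglass attention described in the abstract can be illustrated with a short, self-contained sketch. This is only a minimal illustration under assumed details, not the authors' implementation: the module name HourglassAttention, the token down-sampling via average pooling with pool_ratio, the nearest-neighbour up-sampling, and the -1e4 mask value are hypothetical choices, made to show how a learnable temperature and a negative diagonal mask can sharpen the attention score distribution, and how up-sampling plus a residual connection can compensate for information lost during down-sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HourglassAttention(nn.Module):
    """Illustrative hourglass-style self-attention: tokens are down-sampled,
    attended globally with a learnable temperature and a negative diagonal
    mask, then up-sampled and fused with the input. Names, ratios, and the
    sampling operators are assumptions, not the paper's exact design."""

    def __init__(self, dim, num_heads=8, pool_ratio=2, mask_value=-1e4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.pool_ratio = pool_ratio
        self.mask_value = mask_value
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Per-head learnable temperature used to sharpen the softmax scores.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x):                              # x: (B, N, C)
        B, N, C = x.shape

        # Down-sampling: shrink the token dimension (the hourglass "waist").
        z = F.avg_pool1d(x.transpose(1, 2), self.pool_ratio).transpose(1, 2)
        M = z.shape[1]

        q, k, v = self.qkv(z).chunk(3, dim=-1)
        q = q.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)

        # Scores scaled by a learnable temperature in addition to 1/sqrt(d).
        attn = (q @ k.transpose(-2, -1)) * self.temperature / self.head_dim ** 0.5

        # Negative diagonal mask: suppress each token's attention to itself,
        # which sharpens the score distribution and counters over-smoothing.
        attn = attn + self.mask_value * torch.eye(M, device=x.device)
        attn = attn.softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, M, C)

        # Up-sampling: restore the original token length, then add the input
        # back to compensate for information lost by the down-sampling step.
        out = F.interpolate(out.transpose(1, 2), size=N, mode="nearest").transpose(1, 2)
        return x + self.proj(out)
```

Calling the module on a dummy batch, e.g. HourglassAttention(dim=256)(torch.randn(2, 196, 256)), returns a tensor of the same shape, so such a block could stand in for standard self-attention in the top stages of a hybrid model.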
Authors
PENG Yanfei
CUI Yun
CHEN Kun
LI Yongxin
PENG Yanfei; CUI Yun; CHEN Kun; LI Yongxin (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China)
Source
《液晶与显示》
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 9, pp. 1223-1232 (10 pages)
Chinese Journal of Liquid Crystals and Displays
Funding
National Natural Science Foundation of China (No. 61772249)
Basic Scientific Research Project of Colleges and Universities of Liaoning Province (No. LJKZ0358).