Abstract
Transformer models are widely used in image classification, but on small datasets they are hampered by the limited amount of training data and the large number of model parameters, which leads to low classification accuracy and slow convergence. This paper proposes a progressive hybrid Transformer model with hourglass attention. First, global feature relationships are modeled by an hourglass self-attention with down-up sampling, in which up-sampling compensates for the information lost during down-sampling, while a learnable temperature parameter and a negative diagonal mask sharpen the attention score distribution and prevent the over-smoothing caused by stacking too many layers. Second, a progressive down-sampling module is designed to obtain fine-grained multi-scale feature maps and effectively capture low-dimensional feature information. Finally, a hybrid architecture is adopted: the designed hourglass attention is used in the top stages, pooling layers replace the attention modules in the bottom stages, and layer normalization with depthwise convolution is introduced to strengthen the locality of the network. Experiments on the T-ImageNet, CIFAR10, CIFAR100, and SVHN datasets show that the proposed method reaches a classification accuracy of 97.42% with 3.41 GFLOPs of computation and 25M parameters. Compared with the baseline algorithms, the proposed method achieves a clear improvement in classification accuracy and a clear reduction in computation and parameter count, improving the performance of Transformer models on small datasets.
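The hourglass attention described in the abstract can be illustrated with a short, self-contained sketch. This is only a minimal illustration under assumed details, not the authors' implementation: the module name HourglassAttention, the token down-sampling via average pooling with pool_ratio, the nearest-neighbour up-sampling, and the -1e4 mask value are hypothetical choices, made to show how a learnable temperature and a negative diagonal mask can sharpen the attention score distribution, and how up-sampling plus a residual connection can compensate for information lost during down-sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HourglassAttention(nn.Module):
    """Illustrative hourglass-style self-attention: tokens are down-sampled,
    attended globally with a learnable temperature and a negative diagonal
    mask, then up-sampled and fused with the input. Names, ratios, and the
    sampling operators are assumptions, not the paper's exact design."""

    def __init__(self, dim, num_heads=8, pool_ratio=2, mask_value=-1e4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.pool_ratio = pool_ratio
        self.mask_value = mask_value
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Per-head learnable temperature used to sharpen the softmax scores.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x):                              # x: (B, N, C)
        B, N, C = x.shape

        # Down-sampling: shrink the token dimension (the hourglass "waist").
        z = F.avg_pool1d(x.transpose(1, 2), self.pool_ratio).transpose(1, 2)
        M = z.shape[1]

        q, k, v = self.qkv(z).chunk(3, dim=-1)
        q = q.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, M, self.num_heads, self.head_dim).transpose(1, 2)

        # Scores scaled by a learnable temperature in addition to 1/sqrt(d).
        attn = (q @ k.transpose(-2, -1)) * self.temperature / self.head_dim ** 0.5

        # Negative diagonal mask: suppress each token's attention to itself,
        # which sharpens the score distribution and counters over-smoothing.
        attn = attn + self.mask_value * torch.eye(M, device=x.device)
        attn = attn.softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, M, C)

        # Up-sampling: restore the original token length, then add the input
        # back to compensate for information lost by the down-sampling step.
        out = F.interpolate(out.transpose(1, 2), size=N, mode="nearest").transpose(1, 2)
        return x + self.proj(out)
```

Calling the module on a dummy batch, e.g. HourglassAttention(dim=256)(torch.randn(2, 196, 256)), returns a tensor of the same shape, so such a block could stand in for standard self-attention in the top stages of a hybrid model.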
Authors
PENG Yanfei
CUI Yun
CHEN Kun
LI Yongxin
PENG Yanfei; CUI Yun; CHEN Kun; LI Yongxin (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China)
Source
《液晶与显示》
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 9, pp. 1223-1232 (10 pages)
Chinese Journal of Liquid Crystals and Displays
Funding
National Natural Science Foundation of China (No. 61772249)
Basic Scientific Research Project of Colleges and Universities of Liaoning Province (No. LJKZ0358).