Abstract
Conventional convolutional neural networks for image classification are limited by small, fixed receptive fields and therefore ignore image feature information outside those fields. Transformer-based models, with their flexible multi-head self-attention, must rely on very large amounts of training data to reduce the risk of overfitting, which makes their parameter counts and computational complexity excessive. To address these problems, this paper proposes a multi-stage image classification model called CSNet. In the shallow stages of the model, the idea of large-kernel convolution decomposition is used to enlarge the receptive field of the convolutional layers and learn feature information over a wider range. In the deep stages, an efficient self-attention mechanism incorporates properties of the convolution operation into self-attention, effectively mitigating the local computational redundancy and heavy data dependence of the original self-attention mechanism. CSNet reaches classification accuracies of 98.9% on CIFAR-10 and 82.6% on ImageNet-1K; experiments show that CSNet outperforms ResNet and Vision Transformer.
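The large-kernel decomposition mentioned for the shallow stages rests on standard receptive-field arithmetic: stacking several small convolutions reproduces the effective receptive field of one large kernel at lower parameter cost. The following is a minimal sketch of that arithmetic only, not CSNet's actual implementation; the function name and the specific kernel configurations are illustrative assumptions.

```python
def receptive_field(kernel_sizes, strides):
    """Effective receptive field of a stack of conv layers.

    At each layer the field grows by (k - 1) times the product of
    all preceding strides (the 'jump' between adjacent outputs).
    """
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

# One 7x7 conv vs. its decomposition into three stacked 3x3 convs:
# both cover a 7x7 region, but 3 * 3*3 = 27 weights per channel
# replace 7*7 = 49.
print(receptive_field([7], [1]))            # -> 7
print(receptive_field([3, 3, 3], [1, 1, 1]))  # -> 7
```

This is why decomposed small kernels can "expand the receptive field" of a shallow stage without the parameter growth of a single large kernel.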
Authors
TIAN Xinchi; WANG Yagang; YIN Zhong; CHEN Hao (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《小型微型计算机系统》 (Journal of Chinese Computer Systems)
CSCD
Peking University Core Journal (北大核心)
2024, No. 3, pp. 684-691 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61074087).