
ConvFormer: Vision Backbone Network Based on Transformer (Cited by: 2)

Abstract: Mainstream Transformer-based networks compute self-attention only within individual input patches and ignore the information interaction between different patches, and their single input scale blurs local feature details. To address these problems, this paper proposes ConvFormer, a Transformer-based backbone network for vision tasks. ConvFormer aggregates semantic information across multi-scale patches through the designed channel-shuffle and multi-scale attention (CSMS) module and dynamic relative position coding (DRPC) module, and introduces depthwise convolution into the feed-forward network to improve the network's local modeling capability. Image classification, object detection, and semantic segmentation experiments are conducted on the public datasets ImageNet-1K, COCO 2017, and ADE20K, respectively. Compared with RegNetY-4G, Swin-Tiny, and ResNet50, the best networks of comparable size for these vision tasks, ConvFormer-Tiny improves accuracy by 0.3%, 1.4%, and 0.5%, respectively.
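
The abstract describes two architectural ingredients: mixing information across patch features processed at multiple scales with a channel shuffle (the CSMS module), and a depthwise convolution inside the feed-forward network for local modeling. The paper's code is not given here, so the PyTorch sketch below is only a minimal illustration of those two ideas under assumed tensor shapes; the names channel_shuffle, ConvFFN, and MultiScaleShuffleMixer are illustrative rather than the authors' implementation, and the sketch omits the actual self-attention computation and the DRPC module.

import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Interleave channels across groups (ShuffleNet-style); x has shape (B, C, H, W).
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


class ConvFFN(nn.Module):
    # Feed-forward block with a 3x3 depthwise convolution between the two
    # point-wise projections, so each token also mixes with its spatial neighbours.
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, hidden_dim, kernel_size=1)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3,
                                padding=1, groups=hidden_dim)  # depthwise
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(hidden_dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.dwconv(self.fc1(x))))


class MultiScaleShuffleMixer(nn.Module):
    # Hypothetical stand-in for the multi-scale + channel-shuffle idea: split the
    # channels into groups, pool each group to a different scale, upsample back,
    # then shuffle channels so the scale groups exchange information.
    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        assert dim % len(scales) == 0
        self.scales = scales
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        outs = []
        for chunk, s in zip(torch.chunk(x, len(self.scales), dim=1), self.scales):
            if s > 1:
                chunk = F.avg_pool2d(chunk, kernel_size=s)
                chunk = F.interpolate(chunk, size=(h, w), mode="nearest")
            outs.append(chunk)
        x = channel_shuffle(torch.cat(outs, dim=1), groups=len(self.scales))
        return self.proj(x)


if __name__ == "__main__":
    x = torch.randn(2, 96, 56, 56)        # a batch of patch-embedded feature maps
    x = MultiScaleShuffleMixer(96)(x)     # multi-scale mixing + channel shuffle
    x = ConvFFN(96, 4 * 96)(x)            # feed-forward with depthwise convolution
    print(x.shape)                        # torch.Size([2, 96, 56, 56])

The channel shuffle is what lets groups pooled at different scales exchange information before the point-wise projection, and the 3x3 depthwise convolution inside the feed-forward block restores local spatial context at little parameter cost.
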
Authors: HU Jie, CHANG Min-jie, XU Bo-yuan, XU Wen-cai (School of Automotive Engineering, Wuhan University of Technology, Wuhan, Hubei 430070, China; Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan, Hubei 430070, China; Hubei Collaborative Innovation Center for Automotive Components Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China; Hubei Research Center for New Energy & Intelligent Connected Vehicle, Wuhan University of Technology, Wuhan, Hubei 430070, China)
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 1, pp. 46-57 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: Major Science and Technology Project of Hubei Province (No. 2020AAA001, No. 2022AAA001).
Keywords: machine vision; self-attention; backbone network; Transformer