
Vision Transformer Based on Reconfigurable Gaussian Self-attention
Abstract In the local self-attention of current vision Transformers, existing strategies cannot establish information flow between all windows, resulting in insufficient context-modeling ability. To address this problem, a new local self-attention mechanism, shuffled and Gaussian window multi-head self-attention (SGW-MSA), is proposed based on a Gaussian weight recombination (GWR) strategy. SGW-MSA fuses three different local self-attention mechanisms and reconstructs the feature map through the GWR strategy; image features are then extracted from the reconstructed feature map, establishing interaction among all windows to capture richer contextual information. The overall SGWin Transformer architecture is designed on the basis of SGW-MSA. Experimental results show that the proposed algorithm improves accuracy over Swin Transformer by 5.1% on the mini-ImageNet image classification dataset and by 5.2% on CIFAR10, and improves mAP over Swin Transformer by 5.5% and 5.1% on the MS COCO dataset with the Mask R-CNN and Cascade R-CNN object detection frameworks, respectively. Compared with other local self-attention based models, it is strongly competitive at a similar parameter count.
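
The abstract describes SGW-MSA only at a high level. As a rough illustration of the pattern it outlines, here is a minimal PyTorch sketch, not the authors' implementation: three window-based local self-attention branches run over differently rearranged views of the feature map, and their outputs are recombined into a single feature map by normalized mixing weights. The cyclic shifts, window size, and scalar softmax weights are illustrative assumptions; the paper's actual shuffled windows and Gaussian weight recombination are not specified in the abstract.

```python
import torch
import torch.nn as nn


def window_partition(x, ws):
    """(B, H, W, C) -> (B * num_windows, ws*ws, C), Swin-style."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_reverse(win, ws, B, H, W, C):
    """Inverse of window_partition."""
    x = win.view(B, H // ws, W // ws, ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class SGWBlockSketch(nn.Module):
    """Three local window-attention branches fused by normalized mixing weights."""

    def __init__(self, dim, ws=7, heads=4):
        super().__init__()
        self.ws = ws
        self.branches = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)]
        )
        # One learnable scalar per branch, softmax-normalized: a stand-in for
        # the paper's Gaussian weight recombination, whose exact form the
        # abstract does not give.
        self.mix = nn.Parameter(torch.zeros(3))

    def _shift(self, x, k):
        # Branch-specific cyclic shift so each branch's windows cover a
        # different neighbourhood (a placeholder for the shuffled windows).
        s = k * (self.ws // 2)
        return torch.roll(x, shifts=(s, s), dims=(1, 2))

    def forward(self, x):  # x: (B, H, W, C), with H and W divisible by ws
        B, H, W, C = x.shape
        outs = []
        for k, attn in enumerate(self.branches):
            win = window_partition(self._shift(x, k), self.ws)
            out, _ = attn(win, win, win)           # local self-attention per window
            out = window_reverse(out, self.ws, B, H, W, C)
            outs.append(self._shift(out, -k))      # undo the branch shift
        w = torch.softmax(self.mix, dim=0)
        # Recombine the three branch feature maps into one feature map, so
        # every position mixes information from three window layouts.
        return w[0] * outs[0] + w[1] * outs[1] + w[2] * outs[2]
```

For example, `SGWBlockSketch(dim=96, ws=7, heads=4)` applied to a `(1, 14, 14, 96)` tensor returns a tensor of the same shape in which each position aggregates attention results from all three window arrangements, which is the cross-window information flow the abstract attributes to SGW-MSA.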
Authors ZHAO Liang (赵亮); ZHOU Ji-Kai (周继开) (College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055; Shaanxi Provincial Key Laboratory of Geotechnical and Underground Space Engineering, Xi'an 710055)
Source Acta Automatica Sinica (《自动化学报》), 2023, Issue 9, pp. 1976-1988 (13 pages). Indexed in EI, CAS, CSCD, and the PKU Core list.
Funding Supported by the National Natural Science Foundation of China (51209167, 12002251), the Natural Science Foundation of Shaanxi Province (2019JM-474), the Open Fund of the Shaanxi Provincial Key Laboratory of Geotechnical and Underground Space Engineering (YT202004), and the Shaanxi Provincial Department of Education Special Program for Serving Local Development (22JC043).
Keywords Transformer; local self-attention; Gaussian weight recombination (GWR); image classification; object detection