Abstract
In the local self-attention of current vision Transformers, existing window strategies cannot establish information flow among all windows, which limits their ability to model context. To address this problem, we propose SGW-MSA (shuffled and Gaussian window multi-head self-attention), a new local self-attention mechanism based on a Gaussian weight recombination (GWR) strategy. SGW-MSA fuses three different kinds of local self-attention, reconstructs the feature map with the GWR strategy, and extracts image features from the reconstructed map. This establishes interactions among all windows and captures richer contextual information. Building on SGW-MSA, we design the overall SGWin Transformer architecture. Experimental results show the following gains over Swin Transformer: classification accuracy is 5.1% higher on the mini-ImageNet dataset and 5.2% higher on CIFAR-10, and mAP on MS COCO is 5.5% higher with the Mask R-CNN detection framework and 5.1% higher with Cascade R-CNN. Compared with other local self-attention models of similar parameter count, the model remains strongly competitive.
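The abstract does not specify how SGW-MSA builds its shuffled windows or how the GWR step recombines the feature map. As a hedged illustration of why rearranging windows can carry information across them, the sketch below assumes a Shuffle-Transformer-style spatial shuffle, which is not necessarily the shuffle used in SGW-MSA. It compares that shuffle with a regular Swin-style partition using token coordinates only. All function names are hypothetical, the feature-map side is assumed to be a multiple of the window size, and GWR itself is not modeled.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # (row, col) of a token on the feature map


def regular_partition(h: int, w: int, win: int) -> List[List[Coord]]:
    """Regular (non-shifted, Swin-style) partition into contiguous win x win blocks.
    Assumes h and w are multiples of win."""
    gy, gx = h // win, w // win  # number of windows along each axis
    return [
        [(by * win + i, bx * win + j) for i in range(win) for j in range(win)]
        for by in range(gy)
        for bx in range(gx)
    ]


def shuffled_partition(h: int, w: int, win: int) -> List[List[Coord]]:
    """Spatially shuffled partition.
    ASSUMPTION: Shuffle-Transformer-style shuffle, not necessarily the one in SGW-MSA.
    Each window collects tokens spaced gy rows and gx columns apart, so its
    tokens come from different regular windows."""
    gy, gx = h // win, w // win
    return [
        [(i * gy + oy, j * gx + ox) for i in range(win) for j in range(win)]
        for oy in range(gy)
        for ox in range(gx)
    ]


def regular_window_id(token: Coord, win: int, gx: int) -> int:
    """Index of the regular window that contains a token (gx = windows per row)."""
    r, c = token
    return (r // win) * gx + (c // win)


def covered_regular_windows(window: List[Coord], win: int, gx: int) -> Set[int]:
    """Set of regular windows whose tokens appear inside one given window."""
    return {regular_window_id(t, win, gx) for t in window}


if __name__ == "__main__":
    for h, win in [(8, 4), (56, 7)]:
        gx = h // win
        first = shuffled_partition(h, h, win)[0]
        n_cov = len(covered_regular_windows(first, win, gx))
        print(f"{h}x{h} map, {win}x{win} windows: shuffled window 0 "
              f"reaches {n_cov} of {gx * gx} regular windows")
```

On an 8×8 map with 4×4 windows, every shuffled window mixes tokens from all four regular windows. On the 56×56 map with 7×7 windows used in Swin's first stage, a shuffled window reaches only 49 of the 64 regular windows, because the sampling stride (8) exceeds the window size (7). This shows concretely how a single fixed rearrangement can fall short of connecting all windows. The abstract does not detail how GWR closes that gap.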
Authors
ZHAO Liang; ZHOU Ji-Kai
College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055; Shaanxi Provincial Key Laboratory of Geotechnical and Underground Space Engineering, Xi'an 710055
Source
Acta Automatica Sinica
EI
CAS
CSCD
PKU Core Journal
2023, No. 9, pp. 1976-1988 (13 pages)
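Funding
National Natural Science Foundation of China (51209167, 12002251)
Natural Science Foundation of Shaanxi Province (2019JM-474)
Open Fund of Shaanxi Provincial Key Laboratory of Geotechnical and Underground Space Engineering (YT202004)
Special Program for Serving Local Development of the Shaanxi Provincial Department of Education (22JC043)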