Journal articles
4 articles found
1. Visual attention network (cited by 29)
Authors: Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. Computational Visual Media (SCIE, EI, CSCD), 2023, Issue 4, pp. 733–752.
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves comparable results with similar-size convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets new state-of-the-art performance (58.2% PQ) for panoptic segmentation. Besides, VAN-B2 surpasses Swin-T by 4% mIoU (50.1% vs. 46.1%) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8% vs. 46.2%) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
Keywords: vision backbone, deep learning, ConvNets, attention
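The LKA module in the abstract above replaces full self-attention with stacked convolutions whose output gates the input elementwise. A minimal NumPy sketch of that decomposition (depthwise conv, dilated depthwise conv, then a pointwise channel mix; the 5×5/7×7 kernel sizes and dilation 3 follow the released code, and `depthwise_conv2d` is an illustrative helper, not the authors' implementation):

```python
import numpy as np

def depthwise_conv2d(x, k, dilation=1):
    """Per-channel 'same' convolution. x: (C, H, W), k: (C, kh, kw)."""
    C, H, W = x.shape
    kh, kw = k.shape[1:]
    ph = dilation * (kh - 1) // 2
    pw = dilation * (kw - 1) // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            # each kernel tap scales a shifted view of the padded input
            out += k[:, i:i+1, j:j+1] * xp[:, i*dilation:i*dilation+H,
                                              j*dilation:j*dilation+W]
    return out

def lka(x, k5, k7, w1):
    """Large-kernel-attention sketch: conv stack produces an attention
    map that multiplies the input elementwise."""
    a = depthwise_conv2d(x, k5)               # 5x5 depthwise conv
    a = depthwise_conv2d(a, k7, dilation=3)   # 7x7 depthwise dilated conv
    a = np.einsum('oc,chw->ohw', w1, a)       # 1x1 conv mixes channels
    return x * a                              # gate the input
```

The elementwise product at the end is what makes this "attention": the weights are computed from the input itself, giving spatial and channel adaptability at linear cost.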
2. Attention mechanisms in computer vision: A survey (cited by 108)
Authors: Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu. Computational Visual Media (SCIE, EI, CSCD), 2022, Issue 3, pp. 331–368.
Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
Keywords: attention, Transformer, computer vision, deep learning, salience
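The survey's framing of attention as "dynamic weight adjustment based on features of the input" is easiest to see in channel attention. A minimal squeeze-and-excitation-style sketch in NumPy (the two-layer gating MLP and its weight shapes are generic illustrative choices, not any specific method from the survey):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight the channels of x (C, H, W) by input-dependent scalars."""
    s = x.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    z = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excite: tiny MLP -> gates in (0, 1)
    return x * z[:, None, None]                # dynamic per-channel weighting
```

Because the gates `z` are recomputed from every input, the same weights `w1`, `w2` emphasize different channels for different images, which is exactly the dynamic-weighting view the survey takes.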
3. PCT: Point cloud transformer (cited by 111)
Authors: Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu. Computational Visual Media (EI, CSCD), 2021, Issue 2, pp. 187–199.
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
Keywords: 3D computer vision, deep learning, point cloud processing, Transformer
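The input embedding mentioned in the PCT abstract relies on farthest point sampling to pick well-spread anchor points before the nearest-neighbor grouping. A small NumPy sketch of the standard greedy algorithm (the random first pick and the simple O(N·M) loop are generic choices for illustration, not PCT's actual implementation):

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Greedily select m indices from points (N, d); each new pick
    maximizes its distance to the already-selected set."""
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]   # arbitrary first point
    dist = np.full(n, np.inf)         # distance to nearest chosen point so far
    for _ in range(m - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)    # update nearest-chosen distances
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)
```

Unlike uniform random sampling, this covers every well-separated cluster of the cloud, which is why it is the usual choice for picking the centers that local neighborhoods are built around.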
4. Can attention enable MLPs to catch up with CNNs? (cited by 1)
Authors: Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Dun Liang, Ralph R. Martin, Shi-Min Hu. Computational Visual Media (EI, CSCD), 2021, Issue 3, pp. 283–288.
In the first week of May 2021, researchers from four different institutions, Google, Tsinghua University, Oxford University, and Facebook, shared their latest work [1–4] on arXiv.org at almost the same time, each proposing new learning architectures consisting mainly of linear layers and claiming them to be comparable or superior to convolution-based models.
Keywords: enable, Facebook, Google