
Blueprint separable convolution Transformer network for lightweight image super-resolution
Abstract  Objective: Image super-resolution reconstruction aims to restore, from a low-resolution image, a high-resolution image with richer detail. In recent years, Transformer-based deep neural networks have achieved remarkable performance in image super-resolution, but such networks tend to have huge parameter counts and high computational costs. To address this problem, a lightweight image super-resolution network is designed. Method: A blueprint separable convolution Transformer network (BSTN) for lightweight image super-resolution is proposed. Based on blueprint separable convolution (BSConv), a blueprint feed-forward neural network and a blueprint multi-head self-attention module are designed. A shift channel attention block (SCAB), consisting of shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network, is then designed to strengthen important channel information. Finally, a blueprint multi-head self-attention block (BMSAB) is designed, which realizes self-attention at low computational cost by combining blueprint multi-head self-attention with the blueprint feed-forward neural network. Result: The proposed method is compared with 10 advanced lightweight super-resolution methods on four datasets. Quantitatively, it leads by varying margins on the different datasets while keeping both the parameter count and the number of floating-point operations low; at magnification factors of 2, 3, and 4, its peak signal-to-noise ratio (PSNR) on Set5 surpasses state-of-the-art (SOTA) methods by 0.11 dB, 0.16 dB, and 0.17 dB, respectively. Qualitatively, the reconstructed images are clear, with small blurred regions and rich detail. Conclusion: The proposed BSTN reaches an advanced level with few parameters and floating-point operations and produces high-quality super-resolution reconstructions.

Objective  Image super-resolution aims to enhance the resolution and quality of low-resolution images, making them more visually appealing and suitable for human or machine recognition. By utilizing a series of degraded low-resolution images with coarse details, the objective is to reconstruct high-resolution images with finer details. The applications of super-resolution algorithms are vast, encompassing areas such as object detection, medical pathological analysis, remote sensing satellite imagery, and security monitoring. The promising prospects of these applications have led researchers to increasingly recognize the importance of image super-resolution algorithms. With the advancement of deep learning in computer vision, deep methods have been successfully applied to image super-resolution, leading to significant achievements. However, the substantial number of parameters and the computational requirements of super-resolution models result in slow running speeds, limiting their practicality in real-world deployment, particularly on mobile and edge devices. To address this issue, several lightweight super-resolution models have been proposed. Among these models, Transformer-based approaches stand out because they provide rich detail in reconstructed images. However, this type of model still suffers from computational redundancy and large model size. To overcome these challenges, this study presents a novel lightweight super-resolution network based on the Transformer architecture.

Method  A blueprint separable convolution Transformer network (BSTN) is proposed for lightweight image super-resolution. BSTN is divided into three parts: shallow feature extraction, deep feature extraction, and image reconstruction. In the shallow feature extraction stage, a 3 × 3 standard convolution is employed to extract low-level features from the input image. This initial step captures basic image information, which is also transmitted directly to the tail of the network through a long skip connection to provide residual information. The deep feature extraction component is composed of four successive residual attention Transformer groups (RATGs). The key elements within this stage are the shift channel attention block (SCAB) and the blueprint multi-head self-attention block (BMSAB). SCAB and BMSAB are combined to form the hybrid attention Transformer block (HATB). Two HATBs are connected with a residual connection and followed by a standard convolution to construct an RATG. The blueprint feed-forward neural network is first designed to effectively suppress low-information features and retain only relevant, useful information. It is then introduced into the two aforementioned attention blocks to efficiently extract the significant deep features for super-resolution. SCAB consists of three major components: shift convolution, contrast-aware channel attention, and a blueprint feed-forward neural network. Shift convolution reduces the number of network parameters and performs spatial information aggregation, enabling effective information fusion across different regions of the image. The contrast-aware channel attention mechanism focuses on important channel information, enhancing the representation of crucial features. BMSAB consists of a blueprint multi-head self-attention and a blueprint feed-forward neural network; it computes self-attention with reduced computational complexity while suppressing low-information features through the blueprint feed-forward neural network. Finally, the shallow features extracted in the earlier stage and the deep features obtained from the RATGs are added together. The combined features are then processed with pixel shuffle, a technique that rearranges features to increase their spatial resolution. This final step generates the reconstructed high-resolution image with improved quality and detail. Through this architecture and its components, the proposed lightweight network achieves effective feature extraction, self-attention computation, and image reconstruction, addressing the parameter redundancy and large model size commonly encountered in Transformer-based super-resolution models. Our method is implemented in PyTorch on an NVIDIA RTX 3090 GPU. The training datasets are DIV2K and Flickr2K, which consist of 800 and 1 000 images, respectively. The batch size is set to 32, and the patch size of the training data is 48 × 48 pixels. The initial learning rate is set to 5 × 10⁻⁴ and updated by an Adam optimizer with a cosine descent strategy, and the total number of iterations is 10⁶.

Result  The proposed method is compared with 11 state-of-the-art approaches on four datasets. According to the quantitative results, it achieves varying degrees of improvement across magnifications and datasets while keeping parameter size and floating-point operations at low levels. When the magnification factor is 2, the peak signal-to-noise ratio (PSNR) of this model ranks first on Set5, Set14, BSD100, and Urban100; it performs particularly well on Set5 and Set14, surpassing the second-best model by 0.11 dB and 0.08 dB, respectively. When the magnification factor is 3, the PSNR again ranks first, surpassing the second-best model by 0.16 dB on Set5 and 0.06 dB on Urban100. When the magnification factor is 4, it still ranks first and outperforms the second-place models by 0.17 dB, 0.05 dB, and 0.04 dB on Set5, BSD100, and Urban100, respectively. According to the qualitative results, the images reconstructed by the proposed method are clear, with small blurred areas and rich detail.

Conclusion  A large number of comparative experiments and ablation studies demonstrate that the proposed BSTN not only achieves state-of-the-art super-resolution results with excellent quantitative and visual performance but also has fewer parameters and floating-point operations. In particular, the proposed blueprint multi-head self-attention can effectively perform self-attention in Transformer blocks through a concise structure. The proposed blueprint feed-forward neural network can focus on helpful information and filter out useless information for super-resolution, resulting in high efficiency and low cost, and it can be seamlessly integrated into other modules. Although our method performs well, its advantage as a lightweight model is not yet pronounced and should be further enhanced.
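The core building block named in the abstract, blueprint separable convolution (BSConv), comes from Haase and Amthor (CVPR 2020) and factorizes a standard convolution into a 1 × 1 pointwise convolution followed by a depthwise convolution. Below is a minimal PyTorch sketch of the unconstrained variant (BSConv-U); the class and parameter names are illustrative, not taken from the BSTN implementation.

```python
import torch
import torch.nn as nn

class BSConvU(nn.Module):
    """Blueprint separable convolution (BSConv-U, Haase & Amthor, CVPR 2020):
    a 1x1 pointwise convolution followed by a depthwise convolution.
    Names are illustrative; this is not the authors' BSTN code."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # 1x1 pointwise conv mixes channels first ...
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        # ... then one k x k "blueprint" kernel is applied per output channel.
        self.dw = nn.Conv2d(out_ch, out_ch, kernel_size=kernel_size,
                            stride=stride, padding=padding,
                            groups=out_ch, bias=True)

    def forward(self, x):
        return self.dw(self.pw(x))

x = torch.randn(1, 64, 48, 48)
print(BSConvU(64, 64)(x).shape)  # torch.Size([1, 64, 48, 48])
```

Compared with a standard k × k convolution, this factorization costs roughly in_ch·out_ch + k²·out_ch parameters instead of k²·in_ch·out_ch, which is where the lightweight budget of BSTN comes from.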
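The abstract describes SCAB as shift convolution plus contrast-aware channel attention plus a blueprint feed-forward network. The sketch below illustrates plausible forms of the first two ingredients, assuming the zero-FLOP four-direction channel shift commonly used for shift convolutions and the mean-plus-standard-deviation "contrast" pooling of IMDN's contrast-aware channel attention (Hui et al., 2019); the group sizes, reduction factor, and all names are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def spatial_shift(x):
    # Zero-FLOP spatial mixing: four channel groups are shifted one pixel in
    # four directions (zero-padded); the remaining channels pass through.
    # The grouping and one-pixel shift are illustrative choices.
    b, c, h, w = x.shape
    g = c // 5
    out = torch.zeros_like(x)
    out[:, 0*g:1*g, :, 1:] = x[:, 0*g:1*g, :, :-1]   # shift right
    out[:, 1*g:2*g, :, :-1] = x[:, 1*g:2*g, :, 1:]   # shift left
    out[:, 2*g:3*g, 1:, :] = x[:, 2*g:3*g, :-1, :]   # shift down
    out[:, 3*g:4*g, :-1, :] = x[:, 3*g:4*g, 1:, :]   # shift up
    out[:, 4*g:] = x[:, 4*g:]                        # identity group
    return out

class CCALayer(nn.Module):
    """Contrast-aware channel attention in the style of IMDN: the per-channel
    mean plus standard deviation serves as the pooled 'contrast' statistic."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = (x - mean).pow(2).mean(dim=(2, 3), keepdim=True).sqrt()
        return x * self.body(mean + std)

x = torch.randn(1, 64, 48, 48)
print(CCALayer(64)(spatial_shift(x)).shape)  # torch.Size([1, 64, 48, 48])
```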
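The reconstruction stage adds the shallow and deep features and applies pixel shuffle. A minimal sketch of such a tail, assuming a single convolution that expands channels by scale² before nn.PixelShuffle (the channel width and RGB output are illustrative assumptions):

```python
import torch
import torch.nn as nn

def upsampler(channels, scale, out_ch=3):
    """Pixel-shuffle reconstruction tail: a conv expands channels by scale^2,
    then nn.PixelShuffle rearranges them into a scale-times-larger image."""
    return nn.Sequential(
        nn.Conv2d(channels, out_ch * scale ** 2, 3, padding=1),
        nn.PixelShuffle(scale),
    )

feat = torch.randn(1, 64, 48, 48)   # shallow + deep features, already summed
sr = upsampler(64, scale=4)(feat)   # -> torch.Size([1, 3, 192, 192])
print(sr.shape)
```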
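The training recipe stated in the abstract (batch size 32, 48 × 48 patches, Adam, cosine-annealed learning rate starting at 5 × 10⁻⁴, 10⁶ iterations) maps onto standard PyTorch utilities roughly as follows; the stand-in model and the L1 loss are assumptions for illustration, the latter being a common choice in lightweight super-resolution.

```python
import torch
import torch.nn as nn

# Dummy stand-in for BSTN; only the optimizer/schedule settings follow the text.
model = nn.Conv2d(3, 3, 3, padding=1)
criterion = nn.L1Loss()  # assumption: L1 loss, typical for lightweight SR
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000_000)

lr_patch = torch.randn(32, 3, 48, 48)   # batch of low-resolution training patches
hr_patch = torch.randn(32, 3, 48, 48)   # matching targets (x1 scale for the demo)

optimizer.zero_grad()
loss = criterion(model(lr_patch), hr_patch)
loss.backward()
optimizer.step()
scheduler.step()                        # cosine descent of the learning rate
```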
Authors: Bi Xiuping; Chen Shi; Zhang Lefei (National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China; Hubei Luojia Laboratory, Wuhan 430079, China)
Source: Journal of Image and Graphics (中国图象图形学报), CSCD, Peking University core journal, 2024, No. 4, pp. 875-889 (15 pages)
Funding: National Natural Science Foundation of China (62122060); Hubei Luojia Laboratory Special Fund (220100014)
Keywords: image super-resolution; lightweight model; Transformer; deep learning; attention mechanism