
Image super-resolution with channel-attention-embedded Transformer
Abstract

Objective  Research on single-image super-resolution reconstruction based on deep learning has made great progress in recent years. However, to improve reconstruction performance, previous studies have mostly built complex networks with large numbers of parameters. How to effectively reduce model complexity while further improving reconstruction performance, so as to meet the needs of low-cost and real-time applications, has become an important research question. State-of-the-art lightweight super-resolution methods are mainly based on convolutional neural networks, and few have been designed around the Transformer, which shows excellent performance in image restoration tasks. To address these problems, we propose a lightweight super-resolution network called image super-resolution with channel-attention-embedded Transformer (CAET), which achieves excellent super-resolution performance with a small number of parameters.

Method  CAET involves four stages: shallow feature extraction, hierarchical feature extraction, multi-level feature fusion, and image reconstruction. The hierarchical feature extraction stage is performed by a basic building block called the channel-attention-embedded Transformer block (CAETB), which adaptively embeds channel attention (CA) into Transformer and convolutional features. This design not only exploits the respective strengths of convolution and the Transformer in image feature extraction but also adaptively enhances and fuses the corresponding features, improving the learning ability and super-resolution performance of the network. Convolutional layers provide stable optimization and extraction results during early visual feature processing, and their spatially invariant filters strengthen the translation equivariance of the network; stacking convolutional layers also effectively enlarges its receptive field. Therefore, three cascaded convolutional layers, activated by the LeakyReLU function, are placed at the front of each CAETB to receive the features output by the previous module. The features extracted by these convolution layers are embedded with channel attention; to adjust the channel attention parameters effectively, a linear weighting scheme combines channel attention with features from different levels. The resulting features are then fed into Swin Transformer layers for further deep feature extraction. Because increasing the network depth leads to performance saturation, the number of CAETBs is set to 4 to balance model complexity against super-resolution performance. Since hierarchical information from different stages helps in reconstructing the final result, CAET combines all the low- and high-level information from the hierarchical feature extraction and multi-level feature fusion stages. In the image reconstruction phase, a convolution layer and a pixel shuffle layer upsample the features to the dimensions of the high-resolution image. For training, we use 800 images from the DIV2K dataset, augmenting all training images with random vertical and horizontal flips to increase data diversity. For each mini-batch, we randomly crop 64×64-pixel patches as the low-resolution (LR) inputs. The network is optimized with the Adam algorithm under the L1 loss.

Result  We conduct experiments on five public datasets, namely Set5, Set14, the Berkeley segmentation dataset (BSD100), Urban100, and Manga109, and compare the proposed method with six state-of-the-art models, including the super-resolution convolutional neural network (SRCNN), the cascading residual network (CARN), the information multi-distillation network (IMDN), super-resolution with lattice block (LatticeNet), and image restoration using Swin Transformer (SwinIR). Performance is measured by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); because human vision is highly sensitive to image brightness, both metrics are computed on the Y channel. The results show that the proposed method achieves the highest PSNR and SSIM values and recovers more detailed information and more accurate textures than the compared methods at the ×2, ×3, and ×4 scale factors. At the ×4 scale factor, the PSNR of the proposed method improves on SwinIR by 0.09 dB on the Urban100 dataset and by 0.30 dB on the Manga109 dataset, with a clear improvement in subjective visual quality. In terms of model complexity, CAET outperforms SwinIR, which also uses the Transformer as its backbone, while requiring fewer parameters and fewer multiply-accumulate operations. Although CAET uses more parameters and multiply-accumulate operations than IMDN and LatticeNet, it achieves significantly higher PSNR and SSIM.

Conclusion  The proposed CAET effectively improves image super-resolution performance by fusing convolution and Transformer features and by adaptively embedding channel attention to enhance them, while keeping the whole network lightweight. Experimental results on several public datasets verify the effectiveness of the method.
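To make the CAETB design described above concrete, the following is a minimal PyTorch sketch assembled only from the abstract: three cascaded 3×3 convolutions with LeakyReLU, a squeeze-and-excitation-style channel attention module, a learnable linear weighting that fuses the attended features with the block input, and a generic Transformer encoder standing in for the Swin Transformer layers. All module names, channel counts, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative PyTorch sketch of a channel-attention-embedded Transformer
# block (CAETB). Everything below is an assumption-laden reconstruction
# from the abstract, not the authors' released code.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (a common CA design)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global spatial average per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # re-weight channels

class CAETB(nn.Module):
    """Three cascaded convolutions + CA, linearly fused with the block input,
    then refined by a Transformer stage (a generic encoder stands in here for
    the windowed Swin Transformer layers used in the paper)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2, True),
        )
        self.ca = ChannelAttention(channels)
        # Learnable scalars for the linear weighting of the two feature paths.
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.beta = nn.Parameter(torch.tensor(1.0))
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        local = self.ca(self.convs(x))              # CA-enhanced conv features
        fused = self.alpha * local + self.beta * x  # adaptive linear fusion
        b, c, h, w = fused.shape
        tokens = fused.flatten(2).transpose(1, 2)   # (B, H*W, C) token layout
        tokens = self.transformer(tokens)           # global attention stand-in
        return tokens.transpose(1, 2).reshape(b, c, h, w) + x  # residual
```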
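The reconstruction head and training setup can be sketched in the same spirit: a convolution plus pixel shuffle upsamples features to the high-resolution size, and training uses 64×64 LR patches, random flips, the Adam optimizer, and the L1 loss, as stated in the abstract. The function names below and the learning rate (left at Adam's default, since the abstract does not give one) are assumptions.

```python
# Illustrative sketch (not the authors' code) of the reconstruction head and
# one training step, assuming PyTorch. Hyperparameters taken from the
# abstract: x4 scale, 64x64 LR patches, random flips, Adam, L1 loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleHead(nn.Module):
    """Convolution + pixel shuffle mapping deep features to an HR image."""
    def __init__(self, channels: int = 64, scale: int = 4, out_ch: int = 3):
        super().__init__()
        # Expand channels so pixel shuffle can trade them for spatial size.
        self.expand = nn.Conv2d(channels, out_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, feats):
        return self.shuffle(self.expand(feats))

def augment(lr, hr):
    """Jointly flip an LR/HR pair horizontally and/or vertically at random."""
    if torch.rand(1).item() < 0.5:
        lr, hr = torch.flip(lr, dims=[-1]), torch.flip(hr, dims=[-1])
    if torch.rand(1).item() < 0.5:
        lr, hr = torch.flip(lr, dims=[-2]), torch.flip(hr, dims=[-2])
    return lr, hr

def train_step(model, optimizer, lr_patch, hr_patch):
    """One Adam step under the L1 loss, as described in the abstract."""
    optimizer.zero_grad()
    loss = F.l1_loss(model(lr_patch), hr_patch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: a 64x64 LR patch maps to a 256x256 SR output at x4.
# optimizer = torch.optim.Adam(model.parameters())  # learning rate not given
```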
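Finally, the abstract notes that PSNR and SSIM are computed on the Y channel. A common realization of the PSNR part, assumed here (the BT.601 luma conversion and the border-cropping convention are standard practice but not specified in the abstract), is:

```python
# Hypothetical evaluation helper: PSNR on the Y (luma) channel, the metric
# convention the abstract describes. Conversion and cropping are assumptions.
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Convert an RGB image with values in [0, 255] to ITU-R BT.601 luma."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray, border: int = 4) -> float:
    """PSNR on the Y channel, cropping `border` pixels from each edge
    (a common choice is the scale factor; assumed, not from the paper)."""
    y_sr, y_hr = rgb_to_y(sr), rgb_to_y(hr)
    if border:
        y_sr = y_sr[border:-border, border:-border]
        y_hr = y_hr[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```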
Authors  Xiong Wei, Xiong Chengyi, Gao Zhirong, Chen Wenqi, Zheng Ruihua, Tian Jinwen (School of Electronic and Information Engineering, South-Central Minzu University, Wuhan 430074, China; Hubei Key Laboratory of Intelligent Wireless Communication, South-Central Minzu University, Wuhan 430074, China; School of Computer Science, South-Central Minzu University, Wuhan 430074, China; State Key Laboratory of Multispectral Information Processing Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
Source  Journal of Image and Graphics (《中国图象图形学报》; CSCD, PKU core journal), 2023, No. 12, pp. 3744-3757 (14 pages)
Funding  Fundamental Research Funds for the Central Universities (CZY21013); Fund of the State Key Laboratory of Multispectral Information Processing Technology (6142113210303)
Keywords  super-resolution (SR); Transformer; convolutional neural network (CNN); channel attention (CA); deep learning