Abstract
Objective Salient object detection is fundamental to machine vision applications, yet many existing methods perform poorly in complex scenes, such as those in which the salient object resembles the background or the illumination is low. To improve saliency detection performance, we propose a multi-path collaborative RGB-T (thermal) salient object detection method. Method The main body of the model consists of two backbone networks and three decoding branches. The backbones extract feature representations of the RGB and thermal images, while the three decoding branches predict the salient objects in a collaborative and complementary manner from the RGB features, the thermal features, and their fused features, respectively. Within the feature-extraction backbones, a feature enhancement module fuses the complementary cues of the two modalities, and a suitably modified pyramid pooling module captures global semantic information from deep-level features. During decoding, a channel attention mechanism further distinguishes the semantic differences among the channels of the features produced by the convolutional neural network (CNN). Result Tested on the VT821 and VT1000 datasets, the proposed method achieves maximum F-measure values of 0.8437 and 0.8805 and mean absolute error (MAE) values of 0.0394 and 0.0322, respectively, improving overall detection performance compared with the competing methods. Conclusion Comparative experiments show that the proposed method improves the stability of saliency detection and achieves better results in some low-illumination scenes.
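As a reading aid for the architecture described above, the following is a minimal PyTorch-style sketch of the two-encoder, three-decoder layout; the concrete encoder, fusion, and decoder modules are placeholders assumed for illustration and are not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MultiBranchRGBTNet(nn.Module):
    """Skeleton of the two-encoder / three-decoder layout described above.
    The concrete encoder, fusion, and decoder modules are placeholders."""

    def __init__(self, rgb_encoder, thermal_encoder,
                 rgb_decoder, thermal_decoder, fusion_decoder):
        super().__init__()
        self.rgb_encoder = rgb_encoder          # VGG-19-style backbone
        self.thermal_encoder = thermal_encoder  # VGG-19-style backbone
        self.rgb_decoder = rgb_decoder          # predicts from RGB features
        self.thermal_decoder = thermal_decoder  # predicts from thermal features
        self.fusion_decoder = fusion_decoder    # predicts from fused features

    def forward(self, rgb, thermal):
        # Each encoder is assumed to return a list of multi-level feature maps.
        f_rgb = self.rgb_encoder(rgb)
        f_t = self.thermal_encoder(thermal)
        # Element-wise addition stands in for the paper's feature enhancement module.
        f_fused = [r + t for r, t in zip(f_rgb, f_t)]
        # Three collaborative predictions: RGB, thermal, and fused saliency maps.
        return (self.rgb_decoder(f_rgb),
                self.thermal_decoder(f_t),
                self.fusion_decoder(f_fused))
```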
Objective Saliency detection is a fundamental technology in computer vision and image processing, which aims to identify the most visually distinctive objects or regions in an image. As a preprocessing step, salient object detection plays a critical role in many computer vision applications, including visual tracking, scene classification, image retrieval, and content-based image compression. While numerous salient object detection methods have been presented, most of them are designed for RGB images only or for RGB-depth (RGB-D) images. However, these methods still struggle in some complex scenarios. RGB methods may fail to distinguish salient objects from backgrounds under similar foreground and background or low-contrast conditions. RGB-D methods also suffer in challenging scenarios characterized by low-light conditions and variations in illumination. Considering that thermal infrared images are invariant to illumination conditions, we propose a multi-path collaborative salient object detection method in this study, which is designed to improve the performance of saliency detection by using the multi-modal feature information of RGB and thermal images.

Method In this study, we design a novel end-to-end deep neural network for RGB-thermal (RGB-T) salient object detection, which consists of an encoder network and a decoder network and includes a feature enhancement module, a pyramid pooling module, a channel attention module, and an l1-norm fusion strategy. First, the main body of the model contains two backbone networks for extracting the feature representations of RGB and thermal images, respectively. Then, three decoding branches are used to predict saliency maps in a coordinated and complementary manner from the extracted RGB features, thermal features, and the fused features of both, respectively. The two backbone streams have the same structure, which is based on the Visual Geometry Group 19-layer (VGG-19) network. To better fit the saliency detection task, we keep only the five convolutional blocks of VGG-19 and discard the last pooling and fully connected layers to preserve more spatial information from the input image. Second, the feature enhancement module is used to fully extract and fuse multi-modal complementary cues from the RGB and thermal streams. The modified pyramid pooling module is employed to capture global semantic information from deep-level features, which is used to locate salient objects. Finally, in the decoding process, the channel attention mechanism is designed to distinguish the semantic differences between different channels, thereby improving the decoder's ability to separate salient objects from backgrounds. The entire model is trained in an end-to-end manner. Our training set consists of 900 aligned RGB-T image pairs that are randomly selected from each subset of the VT1000 dataset. To prevent overfitting, we augment the training set with flipping and rotating operations. Our method is implemented with the PyTorch toolbox and trained on a PC with a GTX 1080 Ti GPU and 11 GB of memory. The input images are uniformly resized to 256 × 256 pixels. The momentum, weight decay, and learning rate are set to 0.9, 0.0005, and 1E-9, respectively. During training, the softmax cross-entropy loss is used to train the entire network.
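To make the decoding-stage attention and the training setup concrete, the sketch below shows a generic squeeze-and-excitation style channel attention block together with the hyper-parameters reported above; the paper's exact module design and its optimizer are not specified in this abstract, so both are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average pooling
    followed by a two-layer gating MLP that reweights feature channels."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # emphasize channels carrying salient semantics

# Training hyper-parameters as reported above; SGD is assumed, since the
# abstract does not name the optimizer.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-9,
#                             momentum=0.9, weight_decay=5e-4)
```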
Result We compare our model with four state-of-the-art saliency models, including two RGB-based methods and two RGB-D-based methods, on two public datasets, namely, VT821 and VT1000. The quantitative evaluation metrics include the F-measure, mean absolute error (MAE), and precision-recall (PR) curves, and we also provide several saliency maps of each method for visual comparison. The experimental results demonstrate that our model outperforms the other methods, and its saliency maps have more refined shapes under challenging conditions such as poor illumination and low contrast. Compared with the other four methods on VT821, our method obtains the best results on maximum F-measure and MAE. The maximum F-measure (higher is better) increases by 0.26%, and the MAE (lower is better) decreases by 0.17% compared with the second-ranked method. Compared with the other four methods on VT1000, our model also achieves the best result on maximum F-measure, which reaches 88.05% and increases by 0.46% compared with the second-ranked method. However, the MAE is 3.22%, which is 0.09% higher and thus slightly poorer than the first-ranked method.

Conclusion We propose a CNN-based method for RGB-T salient object detection. To the best of our knowledge, existing saliency detection methods are mostly based on RGB or RGB-D images, so it is meaningful to explore the application of CNNs to RGB-T salient object detection. The experimental results on two public RGB-T datasets demonstrate that the method proposed in this study performs better than the state-of-the-art methods, especially for challenging scenes with poor illumination, complex backgrounds, or low contrast, which shows that fusing multi-modal information from RGB and thermal images is effective for improving performance. However, public datasets for RGB-T salient object detection are lacking, and dataset scale is very important for the performance of deep learning networks. At the same time, detection speed is a key measure when saliency detection serves as a preprocessing step for other computer vision tasks. Thus, in future work, we will collect more high-quality RGB-T salient object detection datasets and design more lightweight models to increase detection speed.
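For reference, the evaluation metrics quoted above can be computed as in the following sketch; the beta^2 = 0.3 weighting and the thresholding scheme are the conventions commonly used in salient object detection benchmarks, not details taken from this abstract.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth,
    both given as arrays with values in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))

def max_f_measure(pred, gt, beta2=0.3, steps=255):
    """Maximum F-measure over evenly spaced binarization thresholds,
    with beta^2 = 0.3 as is conventional in salient object detection."""
    gt_mask = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, steps):
        binary = pred >= t
        tp = np.logical_and(binary, gt_mask).sum()
        precision = tp / (binary.sum() + 1e-8)
        recall = tp / (gt_mask.sum() + 1e-8)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
        best = max(best, f)
    return best
```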
Authors
蒋亭亭
刘昱
马欣
孙景林
Jiang Tingting; Liu Yu; Ma Xin; Sun Jinglin (School of Microelectronics, Tianjin University, Tianjin 300072, China; School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals
2021, No. 10, pp. 2388-2399 (12 pages)
Journal of Image and Graphics
Funding
Yunnan Province Major Science and Technology Special Project: Research and Application Demonstration of Digitalization for Yunnan's Characteristic Industries (202002AD080001)
Tianjin Major Science and Technology Project (18ZXRHSY00190).