
Complementary feature interaction and fusion for real-time RGB_D salient object detection

RGB_D salient object detection algorithm based on complementary information interaction
Abstract

Objective By fusing color, depth, and spatial information, salient object detection with RGB_D data typically achieves more accurate predictions than detection from a single modality, and the rise of deep learning has further propelled the development of RGB_D salient object detection. However, existing RGB_D deep network models for salient object detection often overlook the specificity of the individual modalities: they typically rely on simple fusion operations, such as element-wise addition, multiplication, or feature concatenation, to combine multimodal features. Such fusion lacks a principled explanation of the interaction between RGB and depth images, and it exploits neither the complementary information between the two modalities nor the potential correlations between them. More effective mechanisms are therefore needed to let RGB images and depth images exchange information and thus obtain more accurate salient object detection results. To explore the importance of the complementary information carried by the two modalities and more effective ways for them to interact, we analyze the gating behavior of the rectified linear unit (ReLU) in conventional convolutional networks and, on that basis, design a new complementary information interaction mechanism between RGB and depth features, applied here to RGB_D salient object detection for the first time. The mechanism analyzes the correlations between RGB and depth features and uses them to guide the fusion and interaction process.

Method First, based on this mechanism, a complementary information interaction module is proposed that uses the "redundant" features of each modality to assist the other. The module is then inserted stage-wise into two lightweight backbone networks that extract RGB and depth features, respectively, and carries out the interaction between them.
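The abstract does not give the exact formulation of the interaction, but one plausible reading of a modified ReLU, in which each stream keeps the positive responses a standard ReLU would keep and receives the negative responses the other stream's ReLU would discard, can be sketched in PyTorch. The module name, the 1x1 gating convolutions, and the additive exchange below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ComplementaryInteraction(nn.Module):
    # Hypothetical module: splits each modality's pre-activation into the
    # positive part a standard ReLU keeps and the negative part it discards,
    # then routes the discarded ("redundant") part to the other modality.
    def __init__(self, channels):
        super().__init__()
        # Assumed 1x1 convolutions that re-weight the exchanged residue.
        self.rgb_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb, depth):
        # What an ordinary ReLU would keep for each stream.
        rgb_pos, depth_pos = torch.relu(rgb), torch.relu(depth)
        # What an ordinary ReLU would throw away (made positive here).
        rgb_neg, depth_neg = torch.relu(-rgb), torch.relu(-depth)
        # Each stream is assisted by the other's discarded responses.
        rgb_out = rgb_pos + self.rgb_gate(depth_neg)
        depth_out = depth_pos + self.depth_gate(rgb_neg)
        return rgb_out, depth_out

Read this way, inserting the module stage-wise amounts to replacing the plain ReLU after each backbone stage, so the exchange would cost little more than two 1x1 convolutions per stage, consistent with the simple structure and high inference speed the abstract reports.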
The core function of the module is based on the modified ReLU and has a simple structure. At the top layer of the network, a cross-modal feature fusion module is designed to extract the global semantic information of the fused features. These features are fed to every scale of the backbone network and aggregated with multiscale features through a neighborhood-scale feature enhancement module. In this manner, the network captures local, scale-aware features as well as global semantic information, which improves the accuracy and robustness of salient object detection. In addition, three supervision strategies are adopted to supervise the optimization of the model effectively (a hedged sketch of the corresponding loss terms follows the abstract). First, depth recovery supervision constrains the accuracy of the depth information to ensure the reliability of the depth features. Second, edge supervision guides the model to capture the boundary information of salient objects and improves localization accuracy. Finally, deep supervision further improves performance by enforcing consistency between the fused features and the ground-truth saliency map.

Result Quantitative and qualitative experiments on four widely used public datasets, NJU2K (Nanjing University 2K), NLPR (national laboratory of pattern recognition), STERE (stereo dataset), and SIP (salient person), show that the proposed model holds clear advantages on three mainstream evaluation measures: Max F-measure, mean absolute error (MAE), and Max E-measure. The model performs especially well on the SIP dataset, where it achieves the best results. In addition, it reaches an inference speed of 373.8 frames/s with only 10.8 M parameters. Compared with six other methods, the proposed complementary information interaction module brings a marked improvement in salient object detection. By exploiting the complementary information of the RGB and depth features and through the design of the cross-modal feature fusion module, the model better captures the global semantic information of salient objects and improves detection accuracy and robustness.

Conclusion The proposed salient object detection model is built on the complementary information interaction module, two lightweight backbone networks, and the cross-modal feature fusion module. It makes full use of the complementary information between RGB and depth features and achieves notable performance gains through its optimized network structure and supervision strategies. Compared with other methods, it delivers better accuracy, robustness, and computational efficiency. This work deepens the understanding of the importance of multimodal fusion for RGB_D data and promotes research and application in salient object detection.
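The abstract names three supervision strategies but not their loss forms. A minimal sketch of how they could combine into one training objective, assuming binary cross-entropy for the saliency and edge terms, an L1 term for depth recovery, and hypothetical weights w_edge and w_depth (none of these choices are specified in the paper):

import torch.nn.functional as F

def total_loss(side_outputs, pred_edge, pred_depth,
               gt_mask, gt_edge, gt_depth, w_edge=1.0, w_depth=1.0):
    # Deep supervision: every side output is matched to the saliency GT.
    loss_sal = sum(F.binary_cross_entropy_with_logits(s, gt_mask)
                   for s in side_outputs)
    # Edge supervision: a predicted boundary map against GT object edges.
    loss_edge = F.binary_cross_entropy_with_logits(pred_edge, gt_edge)
    # Depth recovery supervision: reconstruct the input depth map.
    loss_depth = F.l1_loss(pred_depth, gt_depth)
    return loss_sal + w_edge * loss_edge + w_depth * loss_depth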
Authors: Ye Xinyue, Zhu Lei, Wang Wenwu, Fu Yun (School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China)
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, Peking University core journal), 2024, No. 5, pp. 1252-1264 (13 pages)
Funding: National Natural Science Foundation of China (61873196, 61502358).
Keywords: salient object detection (SOD); RGB_D; deep convolutional network; complementary information interaction; cross-modal feature fusion