
Complementary feature interaction and fusion for real-time RGB_D salient object detection

RGB_D salient object detection algorithm based on complementary information interaction
Abstract

Objective By fusing color, depth, and spatial information, salient object detection with RGB_D data typically achieves more accurate predictions than detection from a single modality, and the rise of deep learning has further propelled the development of RGB_D salient object detection. However, existing RGB_D deep network models for salient object detection often overlook the specificity of the individual modalities: they typically rely on simple fusion operations, such as element-wise addition, multiplication, or feature concatenation, to combine multimodal features. Such fusion lacks a principled explanation of the interaction between RGB and depth images, and it exploits neither the complementary information between the two modalities nor the potential correlations between them. More effective mechanisms are therefore needed to let RGB images and depth images exchange information and thus obtain more accurate salient object detection results. To explore the importance of the complementary information carried by the two modalities and more effective ways for them to interact, we analyze the gating behavior of the rectified linear unit (ReLU) in conventional convolutional networks and, on that basis, design a new complementary information interaction mechanism between RGB and depth features, applied here to RGB_D salient object detection for the first time. The mechanism analyzes the correlations between RGB and depth features and uses them to guide the fusion and interaction process.

Method First, based on this mechanism, a complementary information interaction module is proposed that uses the "redundant" features of each modality to assist the other. The module is then inserted stage-wise into two lightweight backbone networks that extract RGB and depth features, respectively, and carries out the interaction between them.
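The abstract does not give the exact formulation of the interaction, but one plausible reading of a modified ReLU, in which each stream keeps the positive responses a standard ReLU would keep and receives the negative responses the other stream's ReLU would discard, can be sketched in PyTorch. The module name, the 1x1 gating convolutions, and the additive exchange below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ComplementaryInteraction(nn.Module):
    # Hypothetical module: splits each modality's pre-activation into the
    # positive part a standard ReLU keeps and the negative part it discards,
    # then routes the discarded ("redundant") part to the other modality.
    def __init__(self, channels):
        super().__init__()
        # Assumed 1x1 convolutions that re-weight the exchanged residue.
        self.rgb_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb, depth):
        # What an ordinary ReLU would keep for each stream.
        rgb_pos, depth_pos = torch.relu(rgb), torch.relu(depth)
        # What an ordinary ReLU would throw away (made positive here).
        rgb_neg, depth_neg = torch.relu(-rgb), torch.relu(-depth)
        # Each stream is assisted by the other's discarded responses.
        rgb_out = rgb_pos + self.rgb_gate(depth_neg)
        depth_out = depth_pos + self.depth_gate(rgb_neg)
        return rgb_out, depth_out

Read this way, inserting the module stage-wise amounts to replacing the plain ReLU after each backbone stage, so the exchange would cost little more than two 1x1 convolutions per stage, consistent with the simple structure and high inference speed the abstract reports.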
The core function of the module is based on the modified ReLU and has a simple structure. At the top layer of the network, a cross-modal feature fusion module is designed to extract the global semantic information of the fused features. These features are fed to every scale of the backbone network and aggregated with multiscale features through a neighborhood-scale feature enhancement module. In this manner, the network captures local, scale-aware features as well as global semantic information, which improves the accuracy and robustness of salient object detection. In addition, three supervision strategies are adopted to supervise the optimization of the model effectively (a hedged sketch of the corresponding loss terms follows the abstract). First, depth recovery supervision constrains the accuracy of the depth information to ensure the reliability of the depth features. Second, edge supervision guides the model to capture the boundary information of salient objects and improves localization accuracy. Finally, deep supervision further improves performance by enforcing consistency between the fused features and the ground-truth saliency map.

Result Quantitative and qualitative experiments on four widely used public datasets, NJU2K (Nanjing University 2K), NLPR (national laboratory of pattern recognition), STERE (stereo dataset), and SIP (salient person), show that the proposed model holds clear advantages on three mainstream evaluation measures: Max F-measure, mean absolute error (MAE), and Max E-measure. The model performs especially well on the SIP dataset, where it achieves the best results. In addition, it reaches an inference speed of 373.8 frames/s with only 10.8 M parameters. Compared with six other methods, the proposed complementary information interaction module brings a marked improvement in salient object detection. By exploiting the complementary information of the RGB and depth features and through the design of the cross-modal feature fusion module, the model better captures the global semantic information of salient objects and improves detection accuracy and robustness.

Conclusion The proposed salient object detection model is built on the complementary information interaction module, two lightweight backbone networks, and the cross-modal feature fusion module. It makes full use of the complementary information between RGB and depth features and achieves notable performance gains through its optimized network structure and supervision strategies. Compared with other methods, it delivers better accuracy, robustness, and computational efficiency. This work deepens the understanding of the importance of multimodal fusion for RGB_D data and promotes research and application in salient object detection.
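The abstract names three supervision strategies but not their loss forms. A minimal sketch of how they could combine into one training objective, assuming binary cross-entropy for the saliency and edge terms, an L1 term for depth recovery, and hypothetical weights w_edge and w_depth (none of these choices are specified in the paper):

import torch.nn.functional as F

def total_loss(side_outputs, pred_edge, pred_depth,
               gt_mask, gt_edge, gt_depth, w_edge=1.0, w_depth=1.0):
    # Deep supervision: every side output is matched to the saliency GT.
    loss_sal = sum(F.binary_cross_entropy_with_logits(s, gt_mask)
                   for s in side_outputs)
    # Edge supervision: a predicted boundary map against GT object edges.
    loss_edge = F.binary_cross_entropy_with_logits(pred_edge, gt_edge)
    # Depth recovery supervision: reconstruct the input depth map.
    loss_depth = F.l1_loss(pred_depth, gt_depth)
    return loss_sal + w_edge * loss_edge + w_depth * loss_depth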
Authors: Ye Xinyue, Zhu Lei, Wang Wenwu, Fu Yun (School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China)
Source: Journal of Image and Graphics (《中国图象图形学报》, CSCD, Peking University core journal), 2024, No. 5, pp. 1252-1264 (13 pages)
Funding: National Natural Science Foundation of China (61873196, 61502358).
Keywords: salient object detection (SOD); RGB_D; deep convolutional network; complementary information interaction; cross-modal feature fusion