RGB-D Salient Object Detection with Three-branch Multi-level Transformer Feature Interaction
Abstract: RGB-D salient object detection (RGB-D SOD) is an active research task in computer vision. Many existing models perform well in simple scenes but cannot effectively handle complex scenes that contain multiple objects, low-quality depth maps, or salient objects whose colors resemble the background. To address these problems, this paper proposes an RGB-D salient object detection model based on three-branch multi-level Transformer feature interaction. First, a cross-modal coordinate attention module uses coordinate attention to suppress noise in the RGB image and the depth map, so that more salient features are extracted for the subsequent decoding stage. Second, a feature fusion module resizes the three highest-level feature maps to the same resolution and feeds them into a Transformer layer, which captures the correlations between distant salient objects and the global information of the whole image. Third, a multi-level feature interaction module aggregates multi-level information for feature interaction, which locates salient objects more accurately and refines their boundaries. Finally, a dense dilated feature refinement module uses dense dilated convolution to obtain rich multi-scale features, which helps the model cope with variations in the number and size of salient objects. The model is compared with 19 mainstream models on five public benchmark datasets. The experimental results show that the proposed method improves on multiple evaluation metrics and raises detection accuracy in specific complex scenes. The P-R (precision-recall) curves, F-measure curves, and saliency maps also show that the proposed method produces more complete and clearer saliency maps that are closer to the ground truth than those of the other models.
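The abstract evaluates saliency maps with P-R curves and F-measure curves. As a rough illustration of how such curves are commonly obtained, the sketch below binarizes a saliency map at several thresholds and scores each binary map against the ground truth. It is not taken from the paper: the function names, the toy 3x3 data, and the weighting `beta_sq = 0.3` (a value often used in the salient object detection literature) are all assumptions made for this example.

```python
def precision_recall_f(saliency, truth, threshold, beta_sq=0.3):
    """Binarize `saliency` at `threshold` and score it against binary `truth`.

    `saliency` and `truth` are equally sized lists of rows. `beta_sq` is an
    assumed weighting, not a value stated in the abstract.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(saliency, truth):
        for score, label in zip(pred_row, true_row):
            predicted = score >= threshold
            if predicted and label:
                tp += 1          # salient pixel correctly detected
            elif predicted:
                fp += 1          # background pixel marked as salient
            elif label:
                fn += 1          # salient pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta_sq * precision + recall
    f_measure = (1 + beta_sq) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def pr_curve(saliency, truth, thresholds):
    """Return one (precision, recall, F-measure) triple per threshold."""
    return [precision_recall_f(saliency, truth, t) for t in thresholds]


# Hypothetical toy data: a 3x3 saliency map and its binary ground truth.
saliency = [[0.9, 0.8, 0.1],
            [0.7, 0.4, 0.2],
            [0.1, 0.3, 0.0]]
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]

for t, (p, r, f) in zip([0.25, 0.5], pr_curve(saliency, truth, [0.25, 0.5])):
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Sweeping the threshold over the full intensity range and plotting precision against recall gives a P-R curve; plotting the F-measure against the threshold gives an F-measure curve.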
Authors: MENG Lingbing (孟令兵); YUAN Mengya (袁梦雅); SHI Xuehan (时雪涵); LIU Qingqing (刘晴晴); CHENG Fei (程菲); LI Lingli (黎玲利); HE Shufeng (何术锋) (School of Computer and Software Eng., Anhui Inst. of Info. Technol., Wuhu 241199, China; School of Electrical and Electronic Eng., Anhui Inst. of Info. Technol., Wuhu 241199, China; School of Management, Hangzhou Dianzi Univ., Hangzhou 310018, China; School of Computer Sci. and Technol., Heilongjiang Univ., Harbin 150006, China; Center for Eco-environmental Research, Nanjing Hydraulic Research Inst., Nanjing 210017, China)
Source: Advanced Engineering Sciences (《工程科学与技术》), 2023, Issue 6, pp. 245-256 (12 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
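Funding: Outstanding Youth Project of the Natural Science Foundation of Heilongjiang Province (YQ2019F016); General Project of the Natural Science Foundation of Anhui Province (2008085MF201); High-level Talent Research Start-up Project of Anhui Institute of Information Technology (rckj2021A002); Key Natural Science Project of the Anhui Provincial Department of Education (KJ2020a0824).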
Keywords: salient object detection; coordinate attention; Transformer; feature interaction; dilated convolution; saliency map