RGB-D Salient Object Detection with Three-branch Multi-level Transformer Feature Interaction

Abstract: RGB-D salient object detection (RGB-D SOD) is one of the research tasks in computer vision. Existing models perform well in simple scenes but cannot effectively handle complex ones, such as scenes with multiple objects, low-quality depth maps, or salient objects whose colors resemble the background. To address these problems, this paper proposes an RGB-D salient object detection model based on three-branch multi-level Transformer feature interaction. First, a cross-modal coordinate attention module uses coordinate attention to suppress noise in the RGB image and the depth map, so that more salient features are extracted for the subsequent decoding stage. Second, a feature fusion module resamples the three highest-level feature maps to the same resolution and feeds them into a Transformer layer, which captures the correlations between distant salient objects and the global information of the whole image. Third, a multi-level feature interaction module aggregates multi-level information for feature interaction, which locates salient objects more accurately and refines their boundaries. Finally, a dense dilated feature refinement module uses dense dilated convolutions to obtain rich multi-scale features and thereby cope with variations in the number and size of salient objects. The model is compared with 19 mainstream models on five public benchmark datasets. The experimental results show that the proposed method improves on multiple evaluation metrics and raises detection accuracy in specific complex scenes. The P-R (precision-recall) curves, F-measure curves, and saliency maps also show that the proposed method produces more complete and clearer saliency maps that are closer to the ground truth than those of the other models.
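The abstract reports results as P-R curves and F-measure curves without restating how they are computed. The following is a minimal sketch of the usual procedure in salient object detection: binarize the saliency map at a threshold, count true and false positives against the ground-truth mask, and sweep the threshold to trace the curve. The function names, the toy data, and the weighting `beta_sq = 0.3` (a common convention in this literature) are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def precision_recall_f(
    saliency: Sequence[Sequence[float]],
    truth: Sequence[Sequence[int]],
    threshold: float,
    beta_sq: float = 0.3,  # assumed weighting, not stated in the abstract
) -> Tuple[float, float, float]:
    """Binarize `saliency` at `threshold` and score it against a binary mask."""
    tp = fp = fn = 0
    for sal_row, gt_row in zip(saliency, truth):
        for s, g in zip(sal_row, gt_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1      # salient pixel correctly detected
            elif predicted:
                fp += 1      # background pixel marked as salient
            elif g:
                fn += 1      # salient pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta_sq * precision + recall
    f_measure = (1 + beta_sq) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def pr_curve(
    saliency: Sequence[Sequence[float]],
    truth: Sequence[Sequence[int]],
    steps: int = 5,
) -> List[Tuple[float, float, float]]:
    """Evaluate (precision, recall, F) at thresholds 0, 1/steps, ..., 1."""
    return [precision_recall_f(saliency, truth, t / steps) for t in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical 2x3 saliency map with values in [0, 1] and its ground truth.
    toy_saliency = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.0]]
    toy_truth = [[1, 1, 0], [0, 0, 0]]
    for p, r, f in pr_curve(toy_saliency, toy_truth):
        print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In practice the sweep typically uses 256 thresholds over 8-bit saliency maps and averages the scores over every image in a dataset; the small `steps` value here only keeps the toy output short.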

Authors: MENG Lingbing (孟令兵); YUAN Mengya (袁梦雅); SHI Xuehan (时雪涵); LIU Qingqing (刘晴晴); CHENG Fei (程菲); LI Lingli (黎玲利); HE Shufeng (何术锋)

Affiliations: School of Computer and Software Engineering, Anhui Institute of Information Technology, Wuhu 241199, China; School of Electrical and Electronic Engineering, Anhui Institute of Information Technology, Wuhu 241199, China; School of Management, Hangzhou Dianzi University, Hangzhou 310018, China; School of Computer Science and Technology, Heilongjiang University, Harbin 150006, China; Center for Eco-environmental Research, Nanjing Hydraulic Research Institute, Nanjing 210017, China

Source: Advanced Engineering Sciences (工程科学与技术), 2023, No. 6, pp. 245-256 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

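Funding: Natural Science Foundation of Heilongjiang Province, Outstanding Youth Project (YQ2019F016); Natural Science Foundation of Anhui Province, General Project (2008085MF201); Anhui Institute of Information Technology High-level Talent Research Start-up Project (rckj2021A002); Key Natural Science Project of the Anhui Provincial Department of Education (KJ2020a0824).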

Keywords: salient object detection; coordinate attention; Transformer; feature interaction; dilated convolution; saliency map

Classification number: TP389.1 [Automation and Computer Technology: Computer System Architecture]
