
Salient Object Detection with Transformer
Abstract: Learning effective global features is essential for salient object detection. Deeper convolutional neural networks obtain larger global receptive fields, but stacking layers tends to lose local information and produces coarse object edges. To address this problem, an attention-based encoder, the Vision Transformer, is introduced. Compared with a CNN (convolutional neural network), it can represent global features from shallow to deep layers and establish self-attention relationships among the regions of an image. Specifically, a Transformer encoder is first used to extract object features; its shallow layers retain more local edge information, which is used to recover the spatial details of the final saliency map. Because each Transformer layer inherits the global information of the preceding layers, the output features of every layer contribute to the final prediction. On this basis, edge supervision is applied to the shallow layers to obtain rich edge information, which is then combined with the global position information. Finally, the decoder generates the final saliency map by progressive fusion, so that high-level and shallow information are fully integrated and the salient objects and their edges are located more accurately. Experimental results on five widely used benchmark datasets show that, without any post-processing, the proposed method outperforms state-of-the-art methods.
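The pipeline described in the abstract can be pictured with a short PyTorch sketch. This is a minimal illustration under assumptions, not the authors' implementation: the names `ViTEncoder` and `ProgressiveFusionDecoder`, the tapped layer indices, channel widths, and head counts are hypothetical stand-ins; only the overall flow (multi-level Transformer features, shallow-layer edge supervision, progressive top-down fusion into a saliency map) follows the paper's description.

```python
# Minimal sketch (not the authors' code) of the described pipeline:
# a ViT-style encoder tapped at several depths, an edge head supervised on
# the shallowest feature, and a progressive top-down fusion decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViTEncoder(nn.Module):
    """Patch-embed an image, run transformer blocks, and return the token
    features after selected (shallow to deep) layers as 2-D feature maps."""

    def __init__(self, img_size=224, patch=16, dim=256, depth=8, heads=8, taps=(1, 3, 5, 7)):
        super().__init__()
        self.grid = img_size // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        ])
        self.taps = set(taps)

    def forward(self, x):
        b = x.size(0)
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, C)
        feats = []
        for i, blk in enumerate(self.blocks):
            tokens = blk(tokens)  # self-attention relates every region to all others
            if i in self.taps:
                # reshape tokens back into a spatial map for the decoder
                feats.append(tokens.transpose(1, 2).reshape(b, -1, self.grid, self.grid))
        return feats  # shallow -> deep


class ProgressiveFusionDecoder(nn.Module):
    """Predict an edge map from the shallowest level (edge supervision) and
    fuse deep global features into shallower ones step by step."""

    def __init__(self, dim=256):
        super().__init__()
        self.fuse = nn.ModuleList([
            nn.Sequential(nn.Conv2d(dim * 2, dim, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(3)
        ])
        self.edge_head = nn.Conv2d(dim, 1, 1)  # supervised with edge labels
        self.sal_head = nn.Conv2d(dim, 1, 1)   # supervised with saliency masks

    def forward(self, feats, out_size):
        edge_logits = self.edge_head(feats[0])  # shallow layer keeps local edge detail
        x = feats[-1]                           # deepest feature carries global position cues
        for fuse, skip in zip(self.fuse, reversed(feats[:-1])):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = fuse(torch.cat([x, skip], dim=1))  # progressive top-down fusion
        sal = F.interpolate(self.sal_head(x), size=out_size, mode="bilinear", align_corners=False)
        edge = F.interpolate(edge_logits, size=out_size, mode="bilinear", align_corners=False)
        return sal, edge


if __name__ == "__main__":
    img = torch.randn(2, 3, 224, 224)
    feats = ViTEncoder()(img)
    sal, edge = ProgressiveFusionDecoder()(feats, out_size=img.shape[-2:])
    print(sal.shape, edge.shape)  # both (2, 1, 224, 224)
```

In training, the saliency output would be compared against ground-truth masks and the edge output against edge labels derived from them, so the shallow branch is explicitly pushed to preserve boundary detail while the fused branch localizes the object.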
Authors: 闫於虎 (YAN Yuhu); 王永雄 (WANG Yongxiong); 潘志群 (PAN Zhiqun) — College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source: Information and Control (《信息与控制》), CSCD, Peking University Core Journal, 2023, No. 3, pp. 382-390 (9 pages)
Funding: Natural Science Foundation of Shanghai (22ZR1443700).
Keywords: Transformer; salient object detection; edge supervision; progressive fusion