A salient object detection model based on a ConvMixer backbone

Abstract: Salient object detection (SOD) algorithms mostly rely on convolutional neural network (CNN) backbones for feature extraction, but CNNs cannot capture long-range feature dependencies in an image. The Vision Transformer (ViT) divides the image into patches and propagates global context among them through a Transformer to obtain long-range dependencies, but the Transformer's self-attention layer has time complexity quadratic in the number of patches. We therefore propose CM-PoolNet, a low-complexity patch-based SOD algorithm that improves the backbone of the classical PoolNet salient object detection model by replacing VGG and ResNet with the convolutional model ConvMixer, and that introduces a new feature fusion method. Built on a U-shaped structure, the encoder applies patch embedding to the input image and feeds the result into a ConvMixer feature extractor made of repeatedly stacked depthwise separable and dilated convolutions. A patch-based feature fusion module is designed for the decoder. Three losses, BCE, SSIM and IoU, guide the model to learn the mapping between the input image and the ground-truth image at three levels: pixel, patch and feature map. Experiments on the DUTS and ECSSD datasets show that the proposed model effectively segments salient object regions and accurately predicts fine structures with clear boundaries.
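
The abstract names the three loss terms but gives no formulas or weights. As a rough illustration only, the sketch below combines them as an unweighted sum on a flattened saliency map, using standard definitions of each term. The function names, the constants `c1` and `c2`, the equal weighting, and the use of a single SSIM window covering the whole map (a practical SSIM slides a window over local patches) are all assumptions, not details taken from the paper.

```python
import math

# Illustrative only: the abstract names BCE, SSIM and IoU losses but gives no
# formulas or weights, so these definitions and the unweighted sum are assumptions.

EPS = 1e-7


def bce_loss(pred, target):
    """Mean binary cross-entropy over all pixels (pixel-level term)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1.0 - EPS)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over one window (assumption: the window is the whole map)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )
    return 1.0 - ssim


def iou_loss(pred, target):
    """1 - soft IoU computed over the whole map (map-level term)."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 - inter / (union + EPS)


def hybrid_loss(pred, target):
    """Unweighted sum of the three terms (the weighting is an assumption)."""
    return bce_loss(pred, target) + ssim_loss(pred, target) + iou_loss(pred, target)


# A 2x2 saliency map flattened row by row, with values in [0, 1].
target = [1.0, 1.0, 0.0, 0.0]
good = [0.9, 0.8, 0.1, 0.2]  # close to the ground truth
bad = [0.2, 0.1, 0.9, 0.8]   # roughly inverted

print(hybrid_loss(good, target) < hybrid_loss(bad, target))
```

A prediction close to the ground truth scores lower on every term than an inverted one, so each term pushes the prediction toward the ground truth while measuring a different kind of error: per-pixel error for BCE, local structure for SSIM and region overlap for IoU.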

Authors: ZHANG Si-Bo; ZHU Jing-Hua; XI He-Ran; DU Xin-Yue (School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China)

Affiliation: School of Computer Science and Technology, Heilongjiang University

Source: Journal of Engineering of Heilongjiang University (Chinese, English and Russian), 2024, No. 1, pp. 48-57 (10 pages)

Funding: National Natural Science Foundation of China (82374626).

Keywords: salient object detection; patch embedding; mixed loss function; PoolNet; ConvMixer

Classification code: TP751 [Automation and Computer Technology: Detection Technology and Automation Devices]
