Abstract
In image semantic segmentation, feature extraction with convolutional networks repeatedly combines max pooling and downsampling, which lowers feature resolution and discards context information, so the segmentation result loses sensitivity to object location. Networks based on the encoder-decoder architecture gradually refine the output through skip connections while restoring resolution, but simply summing adjacent features ignores the differences between them and tends to cause problems such as local misidentification of objects. To address this, an image semantic segmentation method based on deep feature fusion is proposed. The method combines multiple fully convolutional VGG16 models in parallel and uses atrous convolution to process the multi-scale images of an image pyramid efficiently in parallel, extracting context features at multiple levels and fusing them layer by layer in a top-down manner to capture as much context information as possible. In addition, a layer-by-layer label supervision strategy, obtained by modifying the loss function, serves as auxiliary support and is combined with a fully connected conditional random field for pixel-level modeling at the back end; together these make the model easier to train and its predictions more accurate. Experiments show that fusing, layer by layer, the deep features that represent context at different scales improves both the classification of target objects and the localization of spatial details. On the PASCAL VOC 2012 and PASCAL CONTEXT datasets, the proposed method achieves mIoU scores of 80.5% and 45.93%, respectively. The results demonstrate that deep feature extraction in the parallel framework, layer-by-layer feature fusion, and layer-by-layer label supervision jointly optimize the architecture. A feature comparison shows that the model captures rich context information and produces finer image semantic features, giving it a clear advantage over comparable methods.
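The mIoU figures quoted in the abstract are mean intersection-over-union scores. As a rough illustration of how such a score is typically computed (not the authors' evaluation code), the sketch below accumulates a confusion matrix over flat label sequences and averages the per-class IoU. The function name `mean_iou`, the toy labels, and the use of 255 as an ignored "void" label are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union over flat sequences of class labels."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:  # assumed void label, excluded from scoring
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # class c pixels that were missed
        fp = sum(row[c] for row in confusion) - tp   # other pixels labelled as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one void pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

For a benchmark-level score, the confusion matrix would normally be accumulated over every image in the dataset before the per-class IoU is averaged, rather than averaged per image.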
Authors
周鹏程
龚声蓉
钟珊
包宗铭
戴兴华
ZHOU Peng-cheng; GONG Sheng-rong; ZHONG Shan; BAO Zong-ming; DAI Xing-hua (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, Jiangsu 215500, China)
Source
《计算机科学》
CSCD
PKU Core Journal
2020, No. 2, pp. 126-134 (9 pages)
Computer Science
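Funding
National Natural Science Foundation of China (61272005, 61702055)
Natural Science Foundation of Jiangsu Province (BK20151254, BK20151260)
Jiangsu Province Six Talent Peaks Project (DZXX-027)
Science and Technology Development Center of the Ministry of Education, "Cloud-Data Fusion Science and Education Innovation" Fund (2017B03112)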
Keywords
Image semantic segmentation
Deep feature
Atrous convolution
Feature fusion
Context information
Conditional random field