Weakly supervised video instance segmentation with scale adaptive generation regulation

Abstract: Video instance segmentation is a key technology for multi-target perception and scene understanding in assisted driving. Weakly supervised video instance segmentation trains the network with bounding-box annotations only, which severely limits segmentation accuracy for traffic-scene targets whose scales span a wide dynamic range. To address this issue, we propose a scale adaptive generation regulation weakly supervised video instance segmentation network (SAGRNet). First, a multi-scale feature mapping contribution dynamic adaptive regulation module replaces the original linear weighting. By dynamically adjusting how much the feature maps at each scale contribute, it strengthens the focus on both the local position and the global contour of a target, which addresses the large scale range of vehicles and pedestrians caused by differences in imaging distance. Second, a target instance multi-fine-grained spatial information aggregation generation regulation module aggregates spatial information of multiple granularities, extracted with different dilation rates, into weight parameters that regulate the feature maps at each scale. This refines instance boundaries and strengthens the representation of the mask feature maps through cross-channel information interaction, effectively compensating for the loss of mask continuity along instance contours caused by scarce edge information. Finally, to offset the weak supervision provided by box-level labels, an orthogonal loss and a color similarity loss are introduced to reduce the deviation between the predicted mask and the ground-truth bounding box and to resolve the ambiguity in assigning labels to pixel pairs. Experiments on a traffic-scene dataset extracted from YouTube-VIS 2019 show that SAGRNet improves the mean segmentation accuracy of the weakly supervised baseline by 5.1%, reaching 38.1%. These results show that the method provides an effective algorithmic basis for multi-target perception and instance-level scene understanding.
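The abstract does not say how the dynamic contributions are computed. As a rough illustration of how per-pixel adaptive weighting differs from fixed linear weighting, the sketch below assumes the multi-scale feature maps have already been resized to a common resolution and uses a hypothetical per-scale 1×1 projection (`score_vecs`) to score every pixel; a softmax across scales then turns the scores into contributions. `linear_fusion` and `adaptive_fusion` are illustrative names, not the authors' code.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def linear_fusion(feats, weights):
    # Baseline: one fixed scalar weight per scale, identical at every pixel.
    return sum(w * f for w, f in zip(weights, feats))

def adaptive_fusion(feats, score_vecs):
    """Fuse same-resolution feature maps with per-pixel, per-scale weights.

    feats:      list of S arrays of shape (C, H, W), already resized to one size.
    score_vecs: list of S arrays of shape (C,); a hypothetical 1x1 projection
                that maps each scale's features to one score per pixel.
    """
    # One score map per scale: shape (S, H, W).
    scores = np.stack([np.tensordot(v, f, axes=([0], [0]))
                       for v, f in zip(score_vecs, feats)])
    # Normalize across scales so the contributions sum to 1 at every pixel.
    alpha = softmax(scores, axis=0)
    stacked = np.stack(feats)                             # (S, C, H, W)
    fused = (alpha[:, None, :, :] * stacked).sum(axis=0)  # (C, H, W)
    return fused, alpha

rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
score_vecs = [rng.standard_normal(4) for _ in range(3)]
fused, alpha = adaptive_fusion(feats, score_vecs)
print(fused.shape, np.allclose(alpha.sum(axis=0), 1.0))  # (4, 8, 8) True
```

With all-zero score vectors every pixel receives the uniform weight 1/S, and the result equals linear weighting with equal coefficients; in that sense the adaptive form generalizes the fixed one.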
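The abstract states only that spatial information extracted with different dilation rates is aggregated into weights that regulate each scale's features. One minimal way to sketch that idea is shown below. It rests on several assumptions not given in the source: a channel-averaged spatial summary, one 3×3 kernel per dilation rate, summation as the aggregation step, and a sigmoid gate. `dilated_conv3x3` and `multi_dilation_gate` are hypothetical helpers.

```python
import numpy as np

def dilated_conv3x3(x, kernel, dilation):
    """Single-channel 3x3 cross-correlation with zero padding; output keeps
    the input size. x: (H, W); kernel: (3, 3)."""
    d = dilation
    padded = np.pad(x, d)  # zero-pad d pixels on every side
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input at offset ((i - 1) * d, (j - 1) * d).
            out += kernel[i, j] * padded[i * d:i * d + H, j * d:j * d + W]
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_dilation_gate(feat, kernels, dilations=(1, 2, 3)):
    """Regulate a (C, H, W) feature map with a per-pixel gate built from
    context gathered at several dilation rates (hypothetical design).
    kernels: one (3, 3) array per dilation rate."""
    summary = feat.mean(axis=0)  # channel average -> (H, W)
    # Larger dilations see coarser context; summing mixes the granularities.
    context = sum(dilated_conv3x3(summary, k, d)
                  for k, d in zip(kernels, dilations))
    gate = sigmoid(context)  # one weight in (0, 1) per pixel
    return feat * gate[None, :, :], gate
```

Because the gate is shared across channels here, it can only rescale locations; a closer match to the "cross-channel interaction" in the abstract would need per-channel weights, which this sketch deliberately omits.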
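The abstract names an orthogonal loss and a color similarity loss but does not define them. Assuming they resemble the box-supervised losses popularized by BoxInst (Tian et al., CVPR 2021), namely a projection loss that compares the mask's max-projections onto the two orthogonal image axes with those of the box, and a pairwise loss that asks neighbouring pixels of similar colour to share a label, a simplified sketch could look like this. The thresholds `theta=2.0` and `tau=0.3`, the use of only right and down neighbours (BoxInst uses a larger dilated neighbourhood in LAB colour space), and the function names are all assumptions.

```python
import numpy as np

def dice_loss(p, t, eps=1e-6):
    # 1 - Dice coefficient between a soft prediction p and a target t (1-D).
    inter = (p * t).sum()
    return 1.0 - (2.0 * inter + eps) / ((p ** 2).sum() + (t ** 2).sum() + eps)

def box_projection_loss(mask_prob, box):
    """Match max-projections of a soft mask onto the x and y axes with those
    of the box. mask_prob: (H, W) in [0, 1]; box: (x0, y0, x1, y1), inclusive."""
    H, W = mask_prob.shape
    x0, y0, x1, y1 = box
    box_mask = np.zeros((H, W))
    box_mask[y0:y1 + 1, x0:x1 + 1] = 1.0
    loss_x = dice_loss(mask_prob.max(axis=0), box_mask.max(axis=0))
    loss_y = dice_loss(mask_prob.max(axis=1), box_mask.max(axis=1))
    return loss_x + loss_y

def color_pairwise_loss(mask_prob, image, theta=2.0, tau=0.3, eps=1e-6):
    """Neighbouring pixels with similar colour should take the same label.
    image: (H, W, 3) float colours. Only right/down neighbours are used."""
    H, W = mask_prob.shape
    losses = []
    for dy, dx in ((0, 1), (1, 0)):
        a, b = mask_prob[:H - dy, :W - dx], mask_prob[dy:, dx:]
        ca, cb = image[:H - dy, :W - dx], image[dy:, dx:]
        sim = np.exp(-np.linalg.norm(ca - cb, axis=-1) / theta)
        # Probability that both pixels are foreground or both are background.
        p_same = a * b + (1 - a) * (1 - b)
        keep = sim >= tau  # supervise only colour-similar pairs
        losses.append(-np.log(p_same[keep] + eps))
    all_pairs = np.concatenate(losses)
    return float(all_pairs.mean()) if all_pairs.size else 0.0
```

The projection term alone cannot tell where inside the box the object lies; the pairwise term supplies that missing signal by propagating labels along regions of consistent colour, which is why the two are typically used together.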
Authors: ZHANG Yinhui, HAI Weiqi, HE Zifen, HUANG Ying, CHEN Dongdong (Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China)
Source: Optics and Precision Engineering (光学精密工程), 2023, No. 18, pp. 2736-2751 (16 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (Nos. 62061022, 62171206, 61761024).
Keywords: assisted driving; weakly supervised; video instance segmentation; adaptive generation regulation; fine grain