Abstract
This paper studies semi-supervised video object segmentation: given a complete video and the pixel-level annotation (mask) of its first frame, an end-to-end deep neural network predicts the masks of the subsequent frames. The model uses a residual convolutional network for deep feature extraction. A layer cascade module lets features of different resolutions at each level interact and fuse, so that targets of different sizes are captured, and a scale fusion module processes the detail and semantic information of the video frames to generate pixel-level classification labels. Experimental results on mainstream video datasets show that the model has excellent segmentation capability and a satisfactory running speed, and that its test metrics reach the advanced level in this field.
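The abstract does not specify how the cascade and fusion modules are built, so the following is only a rough sketch of the general idea of fusing feature maps of different resolutions and turning the result into a pixel-level mask. The function names (`upsample2x`, `add_maps`, `cascade_fuse`, `to_mask`), the nearest-neighbour upsampling, the additive fusion rule, and the zero threshold are all assumptions made for illustration; they are not the paper's modules.

```python
# Illustrative sketch only: every name and the fusion rule are assumptions,
# not the modules described in the paper.

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add_maps(a, b):
    """Element-wise sum of two equally sized 2D feature maps."""
    assert len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cascade_fuse(pyramid):
    """Fuse a feature pyramid top-down, from coarsest to finest level.

    Assumes each level is exactly twice the height and width of the
    previous one, so upsampled coarse features align with finer ones.
    """
    fused = pyramid[0]
    for finer in pyramid[1:]:
        fused = add_maps(upsample2x(fused), finer)
    return fused

def to_mask(score_map, threshold=0.0):
    """Binarise a score map into a pixel-level foreground/background mask."""
    return [[1 if v > threshold else 0 for v in row] for row in score_map]

# Toy three-level pyramid: 1x1 (semantic), 2x2, and 4x4 (detail) feature maps.
coarse = [[1.0]]
mid = [[0.5, -2.0], [-2.0, 0.5]]
fine = [[0.0] * 4 for _ in range(4)]
mask = to_mask(cascade_fuse([coarse, mid, fine]))
```

In this toy example the coarse level contributes a global positive score, while the middle level suppresses two quadrants, so the final mask keeps only the top-left and bottom-right 2x2 blocks. A real model would replace the additive rule with learned convolutional layers.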
Author
LI Jia-sheng (李家盛), Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Source
Computer and Information Technology (《电脑与信息技术》)
2022, No. 1, pp. 21-23, 27 (4 pages)
Keywords
video processing
video object segmentation
neural network