Abstract
To address the low accuracy, slow training, structural complexity, and error accumulation of recurrent video frame prediction architectures, a video frame prediction model based on gated spatio-temporal attention is proposed. First, a spatial encoder extracts high-level semantic information from the video frame sequence while preserving background features. Second, a gated spatio-temporal attention mechanism is established: multi-scale depthwise strip convolutions and channel attention learn intra-frame and inter-frame spatio-temporal features, and a gated fusion mechanism balances the feature-learning capacity of the spatio-temporal attention. Finally, a spatial decoder decodes the high-level features into the predicted frames and supplements background semantics to refine details. Experiments on the Moving MNIST, TaxiBJ, WeatherBench, and KITTI datasets show that, compared with the multi-input multi-output model SimVP, the mean squared error (MSE) is reduced by 14.7%, 6.7%, 10.5%, and 18.5%, respectively. In ablation and extended experiments, the proposed model achieves good overall performance, with high prediction accuracy, low computational cost, and efficient inference.
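The percentages in the abstract are relative MSE reductions against the SimVP baseline. As a minimal sketch of how such a figure is typically computed, the snippet below defines a per-frame MSE and a relative-reduction helper. The function names `mse` and `relative_reduction` and the toy 2x2 frames are illustrative assumptions, not code or data from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equally sized frames (nested lists)."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("frames must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)


def relative_reduction(baseline, proposed):
    """Percentage by which the proposed error is lower than the baseline error."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - proposed) / baseline * 100.0


# Toy 2x2 frames with made-up values (assumption: NOT results from the paper).
target = [[0.0, 1.0], [1.0, 0.0]]
baseline_pred = [[0.5, 0.5], [0.5, 0.5]]
proposed_pred = [[0.25, 0.75], [0.75, 0.25]]

baseline_mse = mse(baseline_pred, target)  # 0.25
proposed_mse = mse(proposed_pred, target)  # 0.0625
print(relative_reduction(baseline_mse, proposed_mse))  # 75.0
```

Read this way, a reported reduction of 14.7% on Moving MNIST would mean the proposed model's MSE is 85.3% of the SimVP MSE on that dataset, assuming the paper uses this standard definition.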
Authors
LI Weijun (李卫军)
ZHANG Xinyong (张新勇)
GAO Yuxiao (高庾潇)
GU Jianlai (顾建来)
LIU Jintong (刘锦彤)
Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
Source
Journal of Zhengzhou University (Engineering Science)
Peking University Core Journal (北大核心)
2024, Issue 1, pp. 70-77, 121 (9 pages in total)
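Funding
Fundamental Research Funds for the Central Universities (2021JCYJ12)
National Natural Science Foundation of China (61962001)
Natural Science Foundation of Ningxia (2021AAC03215)
North Minzu University Graduate Innovation Project (YCX23147)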
Keywords
video frame prediction
convolutional neural network
attention mechanism
gated convolution
encoder-decoder network