Video Frame Prediction Model Based on Gated Spatio-Temporal Attention


Abstract: To address the low accuracy, slow training, structural complexity, and error accumulation of recurrent video frame prediction architectures, a video frame prediction model based on gated spatio-temporal attention is proposed. First, a spatial encoder extracts high-level semantic information from the video frame sequence while preserving background features. Second, a gated spatio-temporal attention mechanism is established: multi-scale depthwise strip convolutions and channel attention learn intra-frame and inter-frame spatio-temporal features, and a gated fusion mechanism balances the feature-learning capability of the spatio-temporal attention. Finally, a spatial decoder decodes the high-level features into the predicted frames and supplements them with background semantics to refine the details. Experiments on the Moving MNIST, TaxiBJ, WeatherBench, and KITTI datasets show that, compared with the multi-in multi-out model SimVP, the mean squared error (MSE) is reduced by 14.7%, 6.7%, 10.5%, and 18.5%, respectively. In the ablation and extension experiments, the proposed model achieves good overall performance, with high prediction accuracy, low computational cost, and efficient inference.
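The percentages in the abstract are reductions in MSE relative to SimVP. The sketch below shows that arithmetic, assuming the usual definitions of MSE and relative reduction; the abstract does not spell out its exact evaluation protocol. The function names and the toy pixel values are hypothetical and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error over two equally sized flat sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical toy data (NOT results from the paper).
target = [0.0, 1.0, 0.0, 1.0]
baseline_pred = [0.2, 0.8, 0.2, 0.8]  # stand-in for a baseline model's output
proposed_pred = [0.1, 0.9, 0.1, 0.9]  # stand-in for an improved model's output

baseline_mse = mse(baseline_pred, target)
proposed_mse = mse(proposed_pred, target)
reduction = relative_reduction(baseline_mse, proposed_mse)

print(f"baseline MSE = {baseline_mse:.4f}")
print(f"proposed MSE = {proposed_mse:.4f}")
print(f"relative reduction = {reduction:.1f}%")
```

Under this definition, a reported reduction of 14.7% means the proposed model's MSE is 85.3% of the baseline's on that dataset.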

Authors: LI Weijun; ZHANG Xinyong; GAO Yuxiao; GU Jianlai; LIU Jintong

Affiliations: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

Source: Journal of Zhengzhou University (Engineering Science), PKU Core Journal, 2024, Issue 1, pp. 70-77, 121 (9 pages in total)

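Funding: Fundamental Research Funds for the Central Universities (2021JCYJ12); National Natural Science Foundation of China (61962001); Natural Science Foundation of Ningxia (2021AAC03215); North Minzu University Graduate Innovation Project (YCX23147).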

Keywords: video frame prediction; convolutional neural network; attention mechanism; gated convolution; encoder-decoder (codec) network

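Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]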

