Spatiotemporal Convolutional Network Design Based on Vision Transformer
Abstract: Most mainstream human action recognition methods are built on convolutional neural networks (CNNs). CNNs, however, tend to ignore spatial position information in video, which weakens action recognition in the spatial frequency domain of the video. Traditional CNNs also cannot quickly locate key feature positions, and they cannot be computed in parallel during training, which makes training inefficient. To address the limitations of traditional CNNs in handling the time-frequency domain and in parallel computation, a human action recognition algorithm is proposed that combines the Vision Transformer (ViT) with C3D (Learning Spatiotemporal Features with 3D Convolutional Networks). First, C3D extracts multi-dimensional feature maps from the video. Next, the feature slicing window of ViT partitions these multi-dimensional features for global feature segmentation. Finally, the encoder-decoder module of the Transformer predicts the human action in the video. Experimental results show that the proposed algorithm improves action recognition accuracy on the UCF-101 and HMDB51 datasets.
Authors: XIE Yinghong; HAO Yan; HAN Xiaowei; GAO Qiang; YIN Biao; WANG Zhaohui (School of Information Engineering, Shenyang University, Shenyang 110044, China)
Source: Computer & Network (《计算机与网络》), 2024, No. 4, pp. 283-288 (6 pages)
Keywords: action recognition; Vision Transformer (ViT); convolutional neural network (CNN); C3D (learning spatiotemporal features with 3D convolutional networks); attention mechanism
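
The pipeline described in the abstract (3D-convolutional features, window slicing into tokens, attention over the tokens) can be illustrated roughly as follows. This is only a minimal NumPy sketch under stated assumptions, not the authors' implementation: the random array stands in for a C3D feature map, the function names `slice_into_tokens` and `self_attention`, the window size, and all dimensions are hypothetical, and a single self-attention layer replaces the full Transformer encoder-decoder.

```python
import numpy as np


def slice_into_tokens(feat, window):
    """Split a (C, T, H, W) feature map into non-overlapping windows.

    Each window is flattened into one token, in the spirit of ViT patch
    slicing. Returns an array of shape (num_windows, C * wt * wh * ww).
    """
    c, t, h, w = feat.shape
    wt, wh, ww = window
    if t % wt or h % wh or w % ww:
        raise ValueError("window must evenly divide the feature map")
    # Separate each axis into (grid index, offset inside the window).
    x = feat.reshape(c, t // wt, wt, h // wh, wh, w // ww, ww)
    # Move the three grid axes to the front, keep channel + offsets behind.
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)
    return x.reshape(-1, c * wt * wh * ww)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)

# Assumption: a stand-in for a C3D output with C=8 channels, T=2, H=W=4.
feat = rng.standard_normal((8, 2, 4, 4))

# Hypothetical window of 1 frame x 2 x 2 pixels -> 2*2*2 = 8 tokens of size 32.
tokens = slice_into_tokens(feat, window=(1, 2, 2))

d_model = 16  # hypothetical attention width
w_q, w_k, w_v = (
    rng.standard_normal((tokens.shape[1], d_model)) * 0.1 for _ in range(3)
)
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Because every token attends to every other token in one matrix product, the attention step is computed for all positions at once. A complete model would add learned projections, positional embeddings, stacked layers, and a classification head over the action classes, none of which the abstract specifies.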