Spatio-temporal decoupled convolutional neural network for video action recognition


Abstract: Video action recognition models based on conventional 3D convolution extract spatial and temporal features in coupled form, which makes action categories harder to distinguish and limits recognition accuracy. To address this, a spatio-temporal decoupling module is designed on top of the latest 3D convolutional video action recognition network. The module comprises two parallel branches, one extracting features along the temporal direction and the other along the spatial direction, and fuses the decoupled spatio-temporal features before output. In addition, to reduce the influence of redundant spatial information, a temporal attention module is proposed that suppresses redundant spatial information in the input features before they enter the spatio-temporal decoupling module. Validation experiments for both modules were conducted on the HMDB51 and UCF101 datasets, and the results show that each module improves the recognition accuracy of the model. Comparative experiments on the HMDB51 dataset show that the spatio-temporal decoupled convolutional neural network improves accuracy by 2.66 percentage points over the base network MoViNetA0.
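The two-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify kernel sizes, the fusion operator, or the form of the temporal attention. The sketch assumes a single-channel clip stored as a nested list `video[t][y][x]`, zero padding, fusion by element-wise addition, and a hypothetical attention gate that scales each frame by the sigmoid of its spatial mean. All function names are illustrative.

```python
import math

# Illustrative sketch only. A single-channel clip is a nested list video[t][y][x].
# Kernel shapes, zero padding, additive fusion, and the sigmoid gate are
# assumptions made for this example, not details taken from the paper.


def _zeros_like(video):
    """Allocate an output clip with the same (T, H, W) shape, filled with 0.0."""
    return [[[0.0] * len(video[0][0]) for _ in video[0]] for _ in video]


def temporal_branch(video, kernel):
    """Filter along the time axis only (a k x 1 x 1 kernel), zero padded."""
    T, r = len(video), len(kernel) // 2
    out = _zeros_like(video)
    for t in range(T):
        for y in range(len(video[0])):
            for x in range(len(video[0][0])):
                out[t][y][x] = sum(
                    w * video[t + i - r][y][x]
                    for i, w in enumerate(kernel)
                    if 0 <= t + i - r < T
                )
    return out


def spatial_branch(video, kernel):
    """Filter each frame independently (a 1 x k x k kernel), zero padded."""
    H, W, r = len(video[0]), len(video[0][0]), len(kernel) // 2
    out = _zeros_like(video)
    for t, frame in enumerate(video):
        for y in range(H):
            for x in range(W):
                out[t][y][x] = sum(
                    w * frame[y + i - r][x + j - r]
                    for i, row in enumerate(kernel)
                    for j, w in enumerate(row)
                    if 0 <= y + i - r < H and 0 <= x + j - r < W
                )
    return out


def decoupled_block(video, t_kernel, s_kernel):
    """Run both branches in parallel on the same input and fuse by addition."""
    t_out = temporal_branch(video, t_kernel)
    s_out = spatial_branch(video, s_kernel)
    return [
        [[a + b for a, b in zip(t_row, s_row)] for t_row, s_row in zip(t_f, s_f)]
        for t_f, s_f in zip(t_out, s_out)
    ]


def temporal_attention(video):
    """Hypothetical gate: scale each frame by sigmoid(mean of that frame)."""
    out = []
    for frame in video:
        mean = sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[gate * v for v in row] for row in frame])
    return out
```

Under these assumptions, the attention step would be applied first, for example `decoupled_block(temporal_attention(clip), t_kernel, s_kernel)`, matching the order described in the abstract. A practical model would instead use learned, multi-channel 3D convolutions in a deep learning framework.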

Authors: HAO Wei (郝伟); LÜ Xueqiang (吕学强); HAN Jing (韩晶) (Beijing Key Laboratory of Measurement and Control of Mechanical and Electrical System, Beijing Information Science & Technology University, Beijing 100192, China; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China)

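Affiliations: Beijing Key Laboratory of Measurement and Control of Mechanical and Electrical System, Beijing Information Science & Technology University; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University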

Source: Journal of Beijing Information Science and Technology University (Natural Science Edition) 《北京信息科技大学学报（自然科学版）》, 2023, No. 5, pp. 19-24 (6 pages)

Funding: National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4212020).

Keywords: video action recognition; 3D convolution; convolutional neural network (CNN); temporal attention; spatio-temporal decoupling

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

