Abstract
Video action recognition models based on conventional 3D convolution extract spatial and temporal features in coupled form, which makes action categories harder to distinguish and limits recognition accuracy. To address this, a spatio-temporal decoupling module is designed on top of the latest 3D convolutional video action recognition network. The module comprises two parallel branches, one extracting features along the temporal direction and the other along the spatial direction, and fuses the decoupled spatio-temporal features before output. In addition, to reduce the influence of redundant spatial information, a temporal attention module is proposed that suppresses redundant spatial information in the input features before the spatio-temporal decoupling module. Validation experiments for the spatio-temporal decoupling module and the temporal attention module were conducted on the HMDB51 and UCF101 datasets, and the results show that both modules effectively improve the recognition accuracy of the model. Comparative experiments on the HMDB51 dataset show that the spatio-temporal decoupled convolutional neural network improves accuracy by 2.66 percentage points over the base network MoViNetA0.
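The two-branch idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the kernel shapes (a 1D kernel along time, a 2D kernel within each frame), the zero padding, and the element-wise sum used for fusion are all assumptions chosen for illustration, and the abstract does not specify any of them. The input is a single-channel clip stored as nested lists indexed `[t][y][x]`.

```python
# Hypothetical sketch of parallel temporal/spatial branches with sum fusion.
# All names, kernel shapes, and the fusion rule are illustrative assumptions.

def _zeros(T, H, W):
    return [[[0.0] * W for _ in range(H)] for _ in range(T)]

def temporal_branch(video, kernel):
    """Cross-correlate along the time axis only (zero padding, same length)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    r = len(kernel) // 2
    out = _zeros(T, H, W)
    for t in range(T):
        for k, w in enumerate(kernel):
            s = t + k - r  # source frame index for this kernel tap
            if 0 <= s < T:
                for y in range(H):
                    for x in range(W):
                        out[t][y][x] += w * video[s][y][x]
    return out

def spatial_branch(video, kernel):
    """Cross-correlate within each frame only (zero padding, same size)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    r = len(kernel) // 2
    out = _zeros(T, H, W)
    for t in range(T):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for dy, row in enumerate(kernel):
                    for dx, w in enumerate(row):
                        yy, xx = y + dy - r, x + dx - r
                        if 0 <= yy < H and 0 <= xx < W:
                            acc += w * video[t][yy][xx]
                out[t][y][x] = acc
    return out

def fuse(a, b):
    """Element-wise sum of the two branch outputs (one possible fusion)."""
    return [[[pa + pb for pa, pb in zip(ra, rb)]
             for ra, rb in zip(fa, fb)]
            for fa, fb in zip(a, b)]

def decoupled_block(video, t_kernel, s_kernel):
    """Run both branches on the same input in parallel and fuse the results."""
    return fuse(temporal_branch(video, t_kernel),
                spatial_branch(video, s_kernel))

# Usage: three 2x2 frames whose pixel value equals the frame index.
video = [[[t, t], [t, t]] for t in range(3)]
t_kernel = [-1, 0, 1]                       # temporal difference
s_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # spatial identity
out = decoupled_block(video, t_kernel, s_kernel)
```

Because the temporal kernel never looks at neighbouring pixels and the spatial kernel never looks at neighbouring frames, each branch responds to only one kind of variation, which is the sense of "decoupled" illustrated here.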
Authors
HAO Wei (郝伟); LÜ Xueqiang (吕学强); HAN Jing (韩晶)
(Beijing Key Laboratory of Measurement and Control of Mechanical and Electrical System, Beijing Information Science & Technology University, Beijing 100192, China; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China)
Source
《北京信息科技大学学报(自然科学版)》
2023, No. 5, pp. 19-24 (6 pages)
Journal of Beijing Information Science and Technology University
Funding
National Natural Science Foundation of China (62171043)
Beijing Natural Science Foundation (4212020)
Keywords
video action recognition
3D convolution
convolutional neural network (CNN)
temporal attention
spatio-temporal decoupling