Abstract
Effectively exploiting the distinct characteristics of multimodal data can improve action recognition performance. The core problem is multimodal fusion, which can take place at the data level, the feature level, or the prediction-score (decision) level. This study investigates multimodal fusion at the feature and prediction-score levels through multi-teacher knowledge distillation. Complementary features from other modalities are transferred to an RGB network. The study also examines how different distillation loss functions and modality combinations affect recognition performance.

A knowledge-distillation-based multimodal action recognition method is proposed. Distillation uses the MSE loss on features and the KL divergence on prediction scores. Multimodal fusion combines teacher networks trained on the original skeleton modality and the optical-flow modality. The RGB student network therefore learns two kinds of information from both the optical-flow and skeleton teachers at the same time: feature-level semantic information and prediction distributions. This improves recognition accuracy.

Extensive experiments show that the method achieves state-of-the-art accuracies on three commonly used multimodal datasets and one single-modality dataset:

- NTU RGB+D 60: 90.09%
- UTD-MHAD: 95.12%
- N-UCLA: 97.82%
- HMDB51: 81.26%

Compared with using single-modality RGB data alone, recognition accuracy on these datasets improves by 3.49, 2.54, 3.21, and 7.34 percentage points, respectively.
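The abstract's distillation objective uses MSE on features and KL divergence on prediction scores, with two teachers supervising one RGB student. It can be sketched as a per-sample loss. This is a minimal, framework-free sketch under stated assumptions, not the authors' implementation. The following are all assumptions:

- the temperature `temperature`;
- the T² scaling borrowed from Hinton-style distillation;
- the weights `alpha` and `beta`;
- equal averaging over teachers;
- the KL direction KL(teacher || student);
- that student and teacher features already share one dimension (a projection layer may be needed in practice).

The function names are hypothetical. Any cross-entropy term on ground-truth labels is left out.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names; plain Python lists stand in for tensors.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature dimensions must match (assumed pre-projected)")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


Teacher = Tuple[Sequence[float], Sequence[float]]  # (features, logits)


def multi_teacher_kd_loss(
    student_feat: Sequence[float],
    student_logits: Sequence[float],
    teachers: Sequence[Teacher],
    temperature: float = 4.0,  # assumed value
    alpha: float = 1.0,        # assumed weight of the feature (MSE) term
    beta: float = 1.0,         # assumed weight of the prediction (KL) term
) -> float:
    """Average over teachers of alpha * MSE(features) + beta * T^2 * KL(teacher || student)."""
    p_student = softmax(student_logits, temperature)
    total = 0.0
    for t_feat, t_logits in teachers:
        p_teacher = softmax(t_logits, temperature)
        feat_term = mse(student_feat, t_feat)
        # T^2 scaling follows the Hinton-style convention (soft-target gradients shrink roughly as 1/T^2).
        pred_term = kl_divergence(p_teacher, p_student) * temperature ** 2
        total += alpha * feat_term + beta * pred_term
    return total / len(teachers)


# Toy usage with made-up numbers: one RGB student, skeleton and optical-flow teachers.
skeleton_teacher = ([0.3, 0.4, 0.0], [2.0, 0.1, -1.0])
flow_teacher = ([0.1, 0.6, -0.2], [1.5, 0.8, -0.7])
loss = multi_teacher_kd_loss([0.2, 0.5, -0.1], [1.0, 0.5, -0.5],
                             [skeleton_teacher, flow_teacher])
print(f"distillation loss: {loss:.4f}")
```

Averaging keeps the loss scale independent of how many teachers are combined. The abstract compares different modality combinations, so weighting the teachers unequally would be another plausible design choice.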
Authors
詹健浩 (ZHAN Jianhao)
甘利鹏 (GAN Lipeng)
毕永辉 (BI Yonghui)
曾鹏 (ZENG Peng)
李晓潮 (LI Xiaochao)
Affiliations: School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, Fujian, China; Xiamen Meiya Pico Information Co., Ltd., Xiamen 361016, Fujian, China; Xiamen Public Security Bureau, Xiamen 361104, Fujian, China
Source
Computer Engineering (《计算机工程》)
2023, Issue 10, pp. 280-288, 297 (10 pages)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
Funding
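Fujian Provincial University-Industry-Research Joint Innovation Project (2022H6004)
Fund of the Fujian Provincial University Key Laboratory of Integrated Circuit Design and Test Analysis
Xiamen University Malaysia Research Fund (XMUMRF/2019-C4/IECE/0008)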
Keywords
action recognition
knowledge distillation
multi-modality fusion
deep learning
multi-teacher network