
Action Recognition Model Based on an Improved Two-Stream Vision Transformer
Abstract: To address the poor resistance to background interference and low accuracy of existing action recognition methods, an improved two-stream Vision Transformer action recognition model is proposed. The model adopts segment-based sampling to strengthen its ability to process long temporal sequences. A parameter-free attention module embedded at the head of the network enhances the model's feature representation while suppressing background interference around the action, and a temporal attention module embedded at the tail of the network fully extracts temporal features by fusing high-level semantic information in the time domain. A new joint loss function is proposed to enlarge inter-class differences and reduce intra-class differences, and a decision fusion layer is adopted to make full use of both optical-flow and RGB features. Ablation and comparison experiments on the benchmark datasets UCF101 and HMDB51 validate these improvements: the ablation results verify the effectiveness of each proposed component, and the comparison results show that the proposed method improves accuracy over the temporal segment network by 3.48% and 7.76% on the two datasets respectively, outperforming current mainstream algorithms and achieving good recognition performance.
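The parameter-free attention named in the keywords is SimAM. The abstract does not reproduce its formula, but a minimal NumPy sketch of the standard SimAM operation, assuming the usual energy-based weighting with regularizer lambda = 1e-4 (the function name and shapes here are illustrative, not the paper's code), is:

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over one feature map.

    x: array of shape (C, H, W); returns the reweighted map.
    Each position is gated by a sigmoid of its inverse energy,
    which measures how much it stands out from its channel mean,
    so salient (foreground) activations are emphasized without
    adding any learnable parameters.
    """
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance
    e_inv = d / (4.0 * (v + lam)) + 0.5            # inverse energy
    return x * (1.0 / (1.0 + np.exp(-e_inv)))      # sigmoid gating

feat = np.random.randn(8, 14, 14).astype(np.float32)
out = simam(feat)
```

Because the module has no weights, it can be dropped into the head of either stream at negligible cost, which matches the abstract's claim of reducing background interference without extra parameters.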
Authors: LEI Yongsheng; DING Meng; SHEN Yao; LI Juhao; ZHAO Dongyue; CHEN Fushi (Department of Criminal Investigation, People's Public Security University of China, Beijing 100038, China; Public Security Behavioral Science Lab, People's Public Security University of China, Beijing 100038, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core, 2024, Issue 7, pp. 229-235 (7 pages)
Funding: First-Class Discipline Cultivation Action for Public Security Studies and Public Security Behavioral Science Lab Construction Project (2023ZB02).
Keywords: Action recognition; Vision Transformer; SimAM parameter-free attention; Temporal attention; Joint loss
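The abstract states only the goal of the joint loss (enlarge inter-class differences, shrink intra-class differences), not its form. A common instantiation that matches that goal is softmax cross-entropy combined with a center-loss-style term; the sketch below assumes that combination and a weighting factor of 0.01, all of which are hypothetical rather than taken from the paper:

```python
import numpy as np

def joint_loss(features, logits, labels, centers, weight=0.01):
    """Cross-entropy plus a center-loss term (illustrative form).

    features: (N, D) embeddings; logits: (N, K) class scores;
    labels: (N,) integer labels; centers: (K, D) per-class centers.
    Cross-entropy pushes class scores apart (inter-class), while the
    center term pulls each embedding toward its class center
    (intra-class compactness).
    """
    # Numerically stable log-softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # Center loss: mean squared distance to the assigned class center.
    center = ((features - centers[labels]) ** 2).sum(axis=1).mean() / 2.0
    return ce + weight * center

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
logits = rng.normal(size=(6, 3))
labels = np.array([0, 1, 2, 0, 1, 2])
centers = rng.normal(size=(3, 4))
loss = joint_loss(feats, logits, labels, centers)
```

In a two-stream setup such as the one described, the same loss would be applied per stream during training, with the decision fusion layer combining the two streams' class scores only at inference.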