Abstract
Video data contain complex and redundant information along both the temporal and spatial dimensions. To address this, we propose a motion module that computes the spatiotemporal differences between pixel features based on spatiotemporal features. The dynamic spatiotemporal difference is decomposed into two branches: one branch corrects the spatiotemporal displacement in the feature difference between adjacent frames, and the other captures contextual information within this temporal difference. Within the current temporal difference, the probability distribution of the pixels that exhibit spatiotemporal differences is modeled. The results show that the motion module improves video recognition performance while adding little to the computational cost (FLOPs) and the number of parameters, and its effectiveness and efficiency are verified on public datasets.
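The abstract describes the module only at a high level. As a rough illustration, the NumPy sketch below shows one way the described steps could fit together: adjacent-frame feature differences, a displacement-correction branch, a context branch, and a probability distribution over pixel positions. Every function name here (`adjacent_differences`, `align_by_best_shift`, `context_gate`, `spatial_distribution`, `motion_module`) is hypothetical, and the concrete choices (best integer shift for displacement correction, global average pooling with a sigmoid for context, a softmax over positions for the distribution) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def adjacent_differences(feats):
    """Feature difference between adjacent frames: (T, C, H, W) -> (T-1, C, H, W)."""
    return feats[1:] - feats[:-1]

def align_by_best_shift(feats, max_shift=1):
    """Hypothetical displacement-correction branch (assumption, not the paper's method):
    for each adjacent pair, shift frame t+1 by the integer (dy, dx) in
    [-max_shift, max_shift] that minimises the mean |difference| to frame t,
    and return the difference after that alignment."""
    T = feats.shape[0]
    out = np.empty((T - 1,) + feats.shape[1:], dtype=feats.dtype)
    for t in range(T - 1):
        best_err, best = None, None
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                # Roll over the spatial axes (H, W) of a single (C, H, W) frame.
                shifted = np.roll(feats[t + 1], shift=(dy, dx), axis=(1, 2))
                diff = shifted - feats[t]
                err = np.abs(diff).mean()
                if best_err is None or err < best_err:
                    best_err, best = err, diff
        out[t] = best
    return out

def context_gate(diff):
    """Hypothetical context branch (assumption): global average pooling over H, W
    followed by a sigmoid, giving one weight per channel and frame pair, (T-1, C)."""
    pooled = diff.mean(axis=(2, 3))
    return 1.0 / (1.0 + np.exp(-pooled))

def spatial_distribution(diff):
    """Softmax over the H*W positions of the channel-averaged |difference|:
    a probability distribution of where motion occurs, (T-1, H, W)."""
    n, _, h, w = diff.shape
    score = np.abs(diff).mean(axis=1).reshape(n, h * w)
    score = score - score.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(score)
    p /= p.sum(axis=1, keepdims=True)
    return p.reshape(n, h, w)

def motion_module(feats):
    """Combine both branches into a residual motion enhancement of frames 0..T-2."""
    aligned = align_by_best_shift(feats)
    gate = context_gate(adjacent_differences(feats))
    prob = spatial_distribution(aligned)
    h, w = feats.shape[2:]
    # Scale prob by H*W so that a uniform distribution weights every position by 1.
    motion = aligned * gate[:, :, None, None] * (prob[:, None] * h * w)
    return feats[:-1] + motion
```

For a clip of shape `(T, C, H, W)`, `motion_module` returns an array of shape `(T-1, C, H, W)`. The exhaustive shift search is chosen only because it is short and easy to verify; a learned module would more plausibly use convolutions or deformable sampling for the displacement branch.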
Authors
Shi Yaqi (史亚琪)
Zhao Feng (赵峰)
Shi Yaqi; Zhao Feng (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2024, No. 4, pp. 179-184 (6 pages)
Funding
Guangxi Key Research and Development Program (Guike AB19110044).
Keywords
Deep learning
Spatiotemporal features
Feature fusion
Behavior recognition