Abstract
Most video action recognition methods treat all of the spatiotemporal information extracted by the network equally. To suppress irrelevant information and focus on the key information, this paper designs a convolutional neural network with a squeeze-and-excitation mechanism for video action recognition. The network is built on the temporal segment network. First, the video is divided into multiple equal-length segments, and a stack of optical flow images and an RGB video frame are randomly sampled from each segment. These are fed into the temporal and spatial streams, respectively, of a two-stream convolutional neural network with the squeeze-and-excitation mechanism. The squeeze and excitation operations weight the features extracted by each stream, and the weighted temporal and spatial features are used to make preliminary action predictions in the temporal and spatial streams separately. Next, the preliminary predictions of all segments are fused within each stream to obtain video-level temporal and spatial predictions. Finally, the video-level temporal and spatial predictions are fused to obtain the final action recognition result. Experiments on the UCF101 and HMDB51 datasets show that the model achieves higher accuracy than several network models without the squeeze-and-excitation mechanism.
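The pipeline summarized in the abstract can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the weight matrices `w1` and `w2`, the use of averaging for segment fusion, and the stream weights `w_spatial` and `w_temporal` are all assumptions, since the abstract does not specify them. The squeeze-and-excitation step follows the usual formulation (global average pooling, two fully connected layers, sigmoid gating).

```python
import math
import random


def sample_segment_indices(num_frames, k, rng=random):
    """Split a video into k equal-length segments and pick one random
    frame index from each (assumes num_frames >= k)."""
    seg_len = num_frames // k
    return [i * seg_len + rng.randrange(seg_len) for i in range(k)]


def squeeze_excite(feature_maps, w1, w2):
    """Reweight channels with a squeeze-and-excitation step.

    feature_maps: list of C channels, each a flat list of activations.
    w1: (C/r) x C matrix, w2: C x (C/r) matrix; both are hypothetical
    learned weights, passed in here for illustration.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


def segment_consensus(segment_scores):
    """Fuse per-segment class scores into a video-level prediction.
    Averaging is an assumption; the abstract only says 'fused'."""
    n = len(segment_scores)
    return [sum(col) / n for col in zip(*segment_scores)]


def fuse_streams(spatial, temporal, w_spatial=1.0, w_temporal=1.0):
    """Weighted average of the video-level spatial and temporal scores.
    The weights are placeholders, not values taken from the paper."""
    total = w_spatial + w_temporal
    return [
        (w_spatial * s + w_temporal * t) / total
        for s, t in zip(spatial, temporal)
    ]
```

In a full model the gates would be computed inside each convolutional block of both streams, and the per-segment scores would come from the network's classifier rather than being supplied by hand.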
Authors
ZHANG Lihong (张丽红); GUO Lei (郭磊)
College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China
Source
Journal of Test and Measurement Technology (《测试技术学报》)
2020, No. 5, pp. 418-424 (7 pages)
Funding
Supported by the Shanxi Province Science and Technology Research Program (Industry) (2015031003-1).
Keywords
video action recognition
squeeze and excitation mechanism
temporal segment network
two-stream convolution network
feature fusion