Abstract
Video action recognition is an important component of intelligent video analysis. Deep learning methods have made remarkable progress in this field, and the current best-performing approaches use two-stream convolutional neural networks. For long videos, however, most existing methods take as input frames sampled at fixed positions within uniform segments, which may discard important information in the sampling intervals. By defining a measure of a video's information content, we propose a segment-division and key-frame extraction method for video action recognition, extract video features with a multi-temporal-scale two-stream network, and design a video action recognition system that achieves 94.2% accuracy on UCF101 split1, matching the state of the art.
Authors
Li Mingxiao; Geng Qichuan; Mo Hong; Wu Wei; Zhou Zhong (State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China)
Source
Journal of System Simulation (indexed in CAS, CSCD, and the Peking University Core Journals list), 2018, No. 7, pp. 2787-2793 (7 pages)
Funding
National Natural Science Foundation of China (61572061, 61472020); National High-tech R&D Program of China (863 Program) (2015AA016403)