Abstract
Human action recognition is one of the core research directions in computer vision and is applied in many scenarios. Deep convolutional neural networks have achieved great success in static image recognition and have gradually been extended to video content recognition, but their application still faces considerable challenges. This paper proposes a ResNeXt-based deep neural network model for human action recognition in video. The main contributions are: ① the ResNeXt architecture replaces the convolutional network structures used previously, and two data modalities, RGB and optical flow, are used so that the model can fully exploit both the appearance and the temporal information of actions in video; ② an end-to-end temporal segmentation strategy is applied to the ResNeXt model: each video is divided into K segments to model the long-range temporal structure of the video sequence, and the optimal segment number K is determined through tests, which enables the model to better distinguish similar actions that share sub-actions and reduces the misjudgments such shared sub-actions tend to cause. Tests on the widely used action recognition datasets UCF101 and HMDB51 show that the recognition accuracy of the proposed model and method surpasses that of a number of models and methods in the existing literature.
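The K-segment strategy described in the abstract follows the temporal-segment paradigm: one snippet is sampled from each of the K segments of a video, a shared backbone scores each snippet, and a consensus function (here, averaging) aggregates the snippet scores into a video-level prediction. Below is a minimal PyTorch sketch of that idea; it is an illustration under stated assumptions, not the paper's implementation. The class name SegmentalResNeXt is hypothetical, torchvision's 2D resnext50_32x4d stands in for whatever ResNeXt variant the authors used, and the optical-flow stream is only described in comments.

```python
import torch
import torch.nn as nn
from torchvision import models

class SegmentalResNeXt(nn.Module):
    """Hypothetical sketch of K-segment consensus over a ResNeXt backbone.

    One snippet per segment is scored by a shared ResNeXt; snippet scores
    are averaged (segmental consensus) into a video-level prediction.
    The paper's optical-flow stream would be a second instance of this
    module whose first conv layer takes stacked flow fields instead of RGB,
    with the two streams' scores fused at test time.
    """

    def __init__(self, num_classes: int, num_segments: int):
        super().__init__()
        self.num_segments = num_segments
        # Shared backbone; torchvision's resnext50_32x4d is used here
        # purely for illustration (the paper's exact variant may differ).
        self.backbone = models.resnext50_32x4d(weights=None)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, K, 3, H, W) -- one RGB snippet per segment.
        b, k, c, h, w = x.shape
        x = x.view(b * k, c, h, w)       # fold segments into the batch dim
        scores = self.backbone(x)        # per-snippet class scores
        scores = scores.view(b, k, -1)   # (batch, K, num_classes)
        return scores.mean(dim=1)        # average consensus over K segments

# Usage sketch: K = 3 segments, 101 classes (UCF101).
model = SegmentalResNeXt(num_classes=101, num_segments=3)
clips = torch.randn(2, 3, 3, 224, 224)   # (batch, K, C, H, W)
logits = model(clips)                    # shape: (2, 101)
```

Averaging is only one possible consensus; the abstract does not state which aggregation the authors chose, and the optimal K reported in the paper would be found by sweeping num_segments on a validation split.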
Authors
JIANG Sheng-nan, CHEN En-qing, ZHENG Ming-yao, DUAN Jian-kang (School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450000, China)
Source
Journal of Graphics (《图学学报》)
CSCD
Peking University Core Journals (北大核心)
2020, No. 2, pp. 277-282 (6 pages)
Funding
National Natural Science Foundation of China (U1804152, 61806180).