Abstract
With the rapid development of network multimedia technology and the continuous improvement of video capture equipment, an increasing number of videos are shared on online platforms, and video has gradually become an integral part of daily life. Consequently, video understanding has become one of the hot spots of computer vision research. As the primary task of video understanding, action recognition is of great research significance. At present, 2D image recognition and classification methods based on deep learning have made significant progress, but video action recognition still faces formidable challenges. The reason is that videos differ from 2D images by an additional temporal dimension: understanding actions such as walking, running, high jumping, and long jumping in videos requires not only the spatial semantic information that 2D images possess but also temporal information. Therefore, how to exploit the temporal information of videos is critical for action recognition. This paper first introduces the research background and development of action recognition and analyzes the challenges currently facing video action recognition. It then presents methods for temporal modeling and parameter optimization in detail and reviews the commonly used action recognition datasets and evaluation metrics. Finally, it outlines future research directions.
Authors
BI Chun-yan (毕春艳)
LIU Yue (刘越)
(Beijing Mixed Reality and New Display Engineering Technology Research Center, Beijing 100081, China; School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China)
Source
Journal of Graphics (《图学学报》)
Indexed in: CSCD; PKU Core Journals (北大核心)
2023, No. 4, pp. 625-639 (15 pages)
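Funding
National Natural Science Foundation of China (61960206007)
Programme of Introducing Talents of Discipline to Universities (111 Project) (B18005)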
Keywords
action recognition
video understanding
deep learning
convolutional neural network
computer vision