Abstract
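Unlike conventional action recognition, which deals with simple actions, group activity recognition requires understanding the complex semantics of a scene formed by the individual actions of several people and the interactions among them. In recent years, research on and applications of group activity recognition in public safety surveillance, sports video analysis, social role understanding, and related fields have attracted wide attention from researchers. However, few Chinese-language surveys exist to help researchers quickly grasp the state of the field, and the criteria they use to categorize and analyze existing work are rather coarse. To address this, this paper reviews the progress of deep learning-based group activity recognition over the past decade. First, we introduce the problem and definition of group activity recognition and summarize the core pipeline of existing solutions and the key challenges of this research. Next, we categorize and analyze the existing literature with respect to its two core components: the extraction of individual action features and the modeling of their associations. Specifically, we introduce and summarize the human behavior features commonly used in group activity research, and group existing association modeling approaches into three types: linear association, sequence association, and graph association. In addition, we list twelve existing video datasets usable for group activity research, and compare and analyze currently popular methods on three commonly used datasets. Finally, we identify several more challenging future research trends. In summary, this paper dissects the core research ideas and future trends of group activity recognition, helping researchers quickly understand the state of research in this field.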
Unlike traditional action recognition, which focuses on single individuals, group activity recognition aims to understand the complex semantics composed of individual actions and their interactions within a scene. In recent years, the application of group activity recognition in domains such as public safety monitoring, sports video analysis, and social role understanding has attracted significant attention from researchers. However, few Chinese-language surveys give a comprehensive overview of research progress in this field, and the criteria they use to categorize and analyze existing work are rather coarse. This paper aims to fill this gap with a thorough review of group activity recognition research over the past decade, focusing on developments enabled by deep learning. We first give a clear problem definition for group activity recognition, distinguishing it from individual action recognition by its emphasis on understanding group dynamics and interactions. We then outline the basic pipeline shared by most group activity recognition approaches, which typically involves detecting and tracking individuals, extracting features relevant to their actions, recognizing individual actions, and aggregating these actions to infer the group activity. We also discuss the challenges inherent to this research field, such as variability in group size, the complexity of interactions, and the diversity of possible group activities across contexts. Turning to the core of group activity recognition research, the paper then analyzes two critical components in depth: the extraction of individual action features and the modeling of their associations. We introduce several deep learning-based video feature extraction methods commonly employed in group activity research; these methods capture the nuances of individual actions as well as the contextual information needed to understand group dynamics. Next, we categorize existing approaches to modeling the associations between individual actions into three types: linear association, sequence association, and graph association. Each type offers a different view of how individual actions interact and combine into a coherent group activity, ranging from simple linear relationships to complex, non-linear interactions represented by graphs. Furthermore, recognizing the importance of empirical research in advancing the field, the paper compiles a list of 12 existing video datasets curated for group activity research. These datasets cover scenarios ranging from sports and public spaces to more controlled settings, offering diverse opportunities to test and improve group activity recognition algorithms. We also compare existing methods on the two most popular datasets, highlighting their strengths and weaknesses and offering insight into their performance. In conclusion, this paper offers a comprehensive review of advances in deep learning-based group activity recognition over the past decade, covering the problem definition, research challenges, feature extraction techniques, association modeling methods, evaluation datasets, and future research directions. By consolidating and analyzing existing knowledge, the review provides researchers with insights and guidance for further work in group activity recognition.
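To make the three association types named above more concrete, here is a minimal sketch in plain Python. It is not taken from the survey or from any surveyed model. It assumes that per-person feature vectors (at least one person) have already been produced by the detection, tracking, and feature-extraction stages. The function names (`linear_association`, `sequence_association`, `graph_association`), the fixed `decay` weight, and the untrained dot-product edge weights are all hypothetical stand-ins for learned components. In this toy version the recurrence runs across people in list order; recurrent models can equally be run over frames.

```python
import math
from typing import List

Vector = List[float]  # one feature vector per person (assumed precomputed upstream)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_association(feats: List[Vector]) -> Vector:
    """Linear association: order-independent element-wise max pooling over people."""
    return [max(col) for col in zip(*feats)]


def sequence_association(feats: List[Vector], decay: float = 0.5) -> Vector:
    """Sequence association: toy recurrence h = tanh(decay * h + x) over people
    taken in list order (e.g. sorted left to right); the result depends on order."""
    h = [0.0] * len(feats[0])
    for x in feats:
        h = [math.tanh(decay * hi + xi) for hi, xi in zip(h, x)]
    return h


def graph_association(feats: List[Vector]) -> Vector:
    """Graph association: fully connected relation graph with softmax-normalised
    dot-product edge weights, one message-passing step with a residual
    connection, then max pooling of the updated node features."""
    updated = []
    for fi in feats:
        scores = [dot(fi, fj) for fj in feats]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all nodes' features (the node itself included).
        msg = [sum(w * fj[k] for w, fj in zip(weights, feats)) for k in range(len(fi))]
        updated.append([a + b for a, b in zip(fi, msg)])
    return linear_association(updated)


if __name__ == "__main__":
    people = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three people, 2-D features
    print(linear_association(people))  # [1.0, 1.0]
    print(sequence_association(people))
    print(graph_association(people))
```

The sketch is meant to expose one design difference: max pooling and the relation graph give the same group representation however the people are listed, whereas the recurrent variant changes when the order changes, so a sequence-style model of this kind needs a chosen ordering of individuals.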
Authors
Yan Rui (严锐)
Ge Xiao-Jing (葛晓静)
Huang Peng (黄捧)
Shu Xiang-Bo (舒祥波)
Tang Jin-Hui (唐金辉)
YAN Rui; GE Xiao-Jing; HUANG Peng; SHU Xiang-Bo; TANG Jin-Hui (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Source
Chinese Journal of Computers (《计算机学报》)
EI
CAS
CSCD
PKU Core Journal (北大核心)
2024, No. 11, pp. 2552-2578 (27 pages)
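Funding
Postdoctoral Fellowship Program of CPSF (GZB20230302)
Jiangsu Funding Program for Excellent Postdoctoral Talent (2023ZB256)
National Natural Science Foundation of China (62302208, 61925204, 62222207, 62072245)
Natural Science Foundation of Jiangsu Province (BK20211520)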
Keywords
video understanding
action recognition
group activity recognition
deep learning
attention mechanism
recurrent neural network
graph model