A Survey of Group Activity Recognition Based on Deep Learning

Abstract: Unlike traditional action recognition, which focuses on single individuals, group activity recognition must understand the complex semantics formed by the individual actions of several people in a scene and the interactions among them. In recent years, research on and applications of group activity recognition in public safety monitoring, sports video analysis, and social role understanding have attracted wide attention. However, few Chinese-language publications help researchers get a quick overview of the field, and the criteria that the existing ones use for categorization and analysis are coarse. This paper therefore reviews the progress of deep learning-based group activity recognition over the past decade. We first define the problem, distinguishing it from individual action recognition by the need to understand group dynamics and interactions. We then outline the pipeline common to most approaches: detecting and tracking individuals, extracting features relevant to their actions, recognizing individual actions, and aggregating these actions to infer the group activity. We also discuss the key challenges of the field, such as variable group sizes, complex interactions, and the diversity of group activities across contexts. Next, we analyze the two core components of existing research: the extraction of individual action features and the modeling of associations among them. We introduce the deep learning-based video feature extraction methods commonly used in group activity studies, and we divide existing association modeling into three types: linear association, sequence association, and graph association. These range from simple linear relationships to complex non-linear interactions represented by graphs. We also list 12 existing video datasets that can be used for group activity research, covering scenarios from sports and public spaces to more controlled settings, and we compare currently popular methods on three commonly used datasets. Finally, we identify several more challenging directions for future research. In sum, this paper dissects the core research ideas and future trends of group activity recognition and helps researchers quickly grasp the state of the field.
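The three association types named in the abstract (linear, sequence, and graph) can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function names, the max-pooling readout, the plain tanh recurrent cell, and the dot-product similarity graph are all assumptions chosen for illustration, and the per-person features are random stand-ins for what a video backbone would produce.

```python
import numpy as np

def linear_association(feats):
    """Pool per-person features with an order-independent max (assumed readout)."""
    return feats.max(axis=0)

def sequence_association(feats, w_in, w_rec):
    """Run a plain tanh recurrent cell over the persons in the given order."""
    hidden = np.zeros(w_rec.shape[0])
    for person in feats:
        hidden = np.tanh(w_in @ person + w_rec @ hidden)
    return hidden

def graph_association(feats):
    """One round of message passing on a dense similarity graph, then max-pool."""
    # Edge weights: row-wise softmax over scaled dot-product similarity.
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each person adds the weighted sum of all persons' features to its own.
    refined = feats + weights @ feats
    return refined.max(axis=0)

# Hypothetical per-person features: 6 people, 8 dimensions each.
rng = np.random.default_rng(0)
num_people, feat_dim, hidden_dim = 6, 8, 4
person_feats = rng.normal(size=(num_people, feat_dim))
w_in = rng.normal(size=(hidden_dim, feat_dim)) * 0.1
w_rec = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1

group_linear = linear_association(person_feats)
group_sequence = sequence_association(person_feats, w_in, w_rec)
group_graph = graph_association(person_feats)
print(group_linear.shape, group_sequence.shape, group_graph.shape)
```

In this sketch the linear and graph variants give the same group feature regardless of the order in which people are listed, whereas the recurrent variant depends on that order, which is one practical difference between the three families.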

Authors: YAN Rui; GE Xiao-Jing; HUANG Peng; SHU Xiang-Bo; TANG Jin-Hui (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)

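Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University; School of Computer Science and Engineering, Nanjing University of Science and Technology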

Source: Chinese Journal of Computers (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2024, Issue 11, pp. 2552-2578 (27 pages)

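Funding: Supported by the Postdoctoral Fellowship Program of CPSF (GZB20230302), the Jiangsu Funding Program for Excellent Postdoctoral Talent (2023ZB256), the National Natural Science Foundation of China (62302208, 61925204, 62222207, 62072245), and the Natural Science Foundation of Jiangsu Province (BK20211520).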

Keywords: video understanding; action recognition; group activity recognition; deep learning; attention mechanism; recurrent neural network; graph model

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
