Funding: Supported by the National Educational Science 13th Five-Year Plan Project (JYKYB2019012), the Basic Research Fund for the Engineering University of PAP (WJY201907), and the Basic Research Fund of the Engineering University of PAP (WJY202120).
Abstract: Action recognition and detection is an important research topic in computer vision that can be divided into action recognition and action detection. At present, the distinction between the two is not clear, and the relevant reviews are not comprehensive. This paper therefore summarizes deep-learning-based action recognition and detection methods and datasets to present the research status of the field accurately. First, according to how the model extracts temporal and spatial features, the commonly used action recognition models are divided into two-stream models, temporal models, spatiotemporal models, and transformer models. The paper briefly analyzes the characteristics of these four model types and reports the accuracy of various algorithms on common datasets. Then, from the perspective of the task to be completed, action detection is further divided into temporal action detection and spatiotemporal action detection, and the commonly used datasets are introduced. Temporal action detection algorithms are reviewed from the perspectives of two-stage and one-stage methods, and spatiotemporal action detection algorithms are summarized in detail. Finally, the relationship between the different parts of action recognition and detection is discussed, the difficulties facing current research are summarized, and future developments are outlined.
Funding: This research was supported in part by the National Natural Science Foundation of China under Grants 61673261 and 61703273. We gratefully acknowledge the support from some companies.
Abstract: Most intelligent surveillance systems in industry consider only the safety of the workers. It would be valuable if the camera could tell, in real time, what action a worker performed, where, and how. In this paper, we propose a lightweight and robust algorithm to meet these requirements. Using only the trajectories of the two hands, our algorithm requires no Graphics Processing Unit (GPU) acceleration and can therefore run on low-cost devices. In the training stage, spectral clustering with the eigengap heuristic is applied to cluster trajectory points in order to find potential topological structures in the training trajectories. A gradient-descent-based algorithm is proposed to find the topological structures, which reflect the main representations of each cluster. In the fine-tuning stage, a topological optimization algorithm is proposed to fine-tune the parameters of the topological structures over all training data. Finally, our method not only performs more robustly than some popular offline action detection methods, but also obtains better detection accuracy on an extended action sequence.
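The eigengap heuristic mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian affinity, the `sigma` bandwidth, the `max_k` cap, and the function name `eigengap_num_clusters` are all assumptions made for illustration. The idea is to pick the number of clusters k at the largest gap between consecutive eigenvalues of the normalized graph Laplacian.

```python
import numpy as np

def eigengap_num_clusters(points, sigma=1.0, max_k=10):
    """Estimate a cluster count with the eigengap heuristic (illustrative only)."""
    pts = np.asarray(points, dtype=float)
    # Gaussian (RBF) affinity between every pair of points (assumed kernel).
    sq_dists = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(pts)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(L)
    k_max = min(max_k, len(pts) - 1)
    # gaps[i] is the gap between the (i+1)-th and (i+2)-th smallest eigenvalues.
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1, eigvals

# Hypothetical 2D "trajectory points": three tight, well-separated groups.
centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
offsets = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3), (0.15, 0.1)]
demo_points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
demo_k, demo_eigvals = eigengap_num_clusters(demo_points)
```

The estimated k would then be passed to the clustering step itself (for example, k-means on the leading eigenvectors); that step is omitted here.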
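Abstract: In recent years, with the growing number of surveillance cameras and the rapid development of the Internet, surveillance and online videos have proliferated. Automatic behavior conflict detection in video is important for reducing the risk of private information leaking through manual review, as well as for maintaining public security and cleaning up the online environment. To fully extract behavior conflict features from video and to obtain a model with good generalization and detection performance, I3D (inflated 3D convolutional network) and VGGish are used to extract multimodal features based on XD-Violence, and TG-BCDM (behavior conflict detection model based on Transformer and graph convolution networks) is proposed. The model contains a Transformer encoder module and a graph convolution module, so it can capture long-range dependencies in video effectively while attending to both the global and the local information of the video features. Experiments show that the model outperforms eight existing methods.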
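As a rough illustration of what a graph convolution module over video snippet features might compute, the sketch below applies one standard graph-convolution step, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), to a sequence of snippet features. This is not the TG-BCDM architecture: the temporal-neighbor adjacency, the `window` parameter, the feature sizes, and the names `temporal_adjacency` and `gcn_layer` are assumptions chosen only to show how local temporal context can be mixed into each snippet's features.

```python
import numpy as np

def temporal_adjacency(num_snippets, window=1):
    """Connect each snippet to neighbors within `window` steps (assumed graph)."""
    idx = np.arange(num_snippets)
    near = np.abs(idx[:, None] - idx[None, :]) <= window
    return near.astype(float) - np.eye(num_snippets)  # no self-loops here

def gcn_layer(features, adjacency, weight):
    """One graph-convolution step with symmetric normalization and ReLU."""
    A = np.asarray(adjacency, dtype=float)
    A = A + np.eye(A.shape[0])  # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    X = np.asarray(features, dtype=float)
    W = np.asarray(weight, dtype=float)
    return np.maximum(A_hat @ X @ W, 0.0)

# Hypothetical input: 6 snippets with 8-dimensional features, projected to 4.
rng = np.random.default_rng(0)
snippet_features = rng.normal(size=(6, 8))
projection = rng.normal(size=(8, 4))
mixed = gcn_layer(snippet_features, temporal_adjacency(6, window=1), projection)
```

In a full model of this kind, the adjacency could instead be built from feature similarity to capture global relations, and the output would feed a per-snippet classifier; both are left out of this sketch.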
Funding: The National Science Foundation of China (Grant No. 61702350).
Abstract: Human action recognition has gained popularity because of its wide range of applications, such as video surveillance, video retrieval, and human-computer interaction. This paper provides a comprehensive overview of notable advances made by deep neural networks in this field. First, the basic concept of action recognition and its common applications are introduced. Second, action recognition is categorized into action classification and action detection according to their respective research goals. Various deep learning frameworks for recognition tasks are then discussed in detail, and the most challenging datasets and taxonomies are briefly reviewed. Finally, the limitations of the state of the art and promising research directions are briefly outlined.