In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains...In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains of computer vision and is used in solving several problems for human endeavours.After the detailed introduction,three different human body modes followed by the main stages of HPE and two pipelines of twodimensional(2D)HPE are presented.The details of the four components of HPE are also presented.The keypoints output format of two popular 2D HPE datasets and the most cited DL-based HPE articles from the year of breakthrough are both shown in tabular form.This study intends to highlight the limitations of published reviews and surveys respecting presenting a systematic review of the current DL-based solution to the 2D HPE model.Furthermore,a detailed and meaningful survey that will guide new and existing researchers on DL-based 2D HPE models is achieved.Finally,some future research directions in the field of HPE,such as limited data on disabled persons and multi-training DL-based models,are revealed to encourage researchers and promote the growth of HPE research.展开更多
Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namel...Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namely activity recognition and human-computer interface.Despite the benefits of HPE,it is still a challenging process due to the variations in visual appearances,lighting,occlusions,dimensionality,etc.To resolve these issues,this paper presents a squirrel search optimization with a deep convolutional neural network for HPE(SSDCNN-HPE)technique.The major intention of the SSDCNN-HPE technique is to identify the human pose accurately and efficiently.Primarily,the video frame conversion process is performed and pre-processing takes place via bilateral filtering-based noise removal process.Then,the EfficientNet model is applied to identify the body points of a person with no problem constraints.Besides,the hyperparameter tuning of the EfficientNet model takes place by the use of the squirrel search algorithm(SSA).In the final stage,the multiclass support vector machine(M-SVM)technique was utilized for the identification and classification of human poses.The design of bilateral filtering followed by SSA based EfficientNetmodel for HPE depicts the novelty of the work.To demonstrate the enhanced outcomes of the SSDCNN-HPE approach,a series of simulations are executed.The experimental results reported the betterment of the SSDCNN-HPE system over the recent existing techniques in terms of different measures.展开更多
3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimat...3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimation of monocular RGB images and videos.An overall perspective ofmethods integrated with deep learning is introduced.Novel image-based and video-based inputs are proposed as the analysis framework.From this viewpoint,common problems are discussed.The diversity of human postures usually leads to problems such as occlusion and ambiguity,and the lack of training datasets often results in poor generalization ability of the model.Regression methods are crucial for solving such problems.Considering image-based input,the multi-view method is commonly used to solve occlusion problems.Here,the multi-view method is analyzed comprehensively.By referring to video-based input,the human prior knowledge of restricted motion is used to predict human postures.In addition,structural constraints are widely used as prior knowledge.Furthermore,weakly supervised learningmethods are studied and discussed for these two types of inputs to improve the model generalization ability.The problem of insufficient training datasets must also be considered,especially because 3D datasets are usually biased and limited.Finally,emerging and popular datasets and evaluation indicators are discussed.The characteristics of the datasets and the relationships of the indicators are explained and highlighted.Thus,this article can be useful and instructive for researchers who are lacking in experience and find this field confusing.In addition,by providing an overview of 3D human pose estimation,this article sorts and refines recent studies on 3D human pose estimation.It describes kernel problems and common useful methods,and discusses the scope for further research.展开更多
Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction...Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction,robot vision,etc.Though considerable improvements have been made in recent days,design of an effective and accurate action recognition model is yet a difficult process owing to the existence of different obstacles such as variations in camera angle,occlusion,background,movement speed,and so on.From the literature,it is observed that hard to deal with the temporal dimension in the action recognition process.Convolutional neural network(CNN)models could be used widely to solve this.With this motivation,this study designs a novel key point extraction with deep convolutional neural networks based pose estimation(KPE-DCNN)model for activity recognition.The KPE-DCNN technique initially converts the input video into a sequence of frames followed by a three stage process namely key point extraction,hyperparameter tuning,and pose estimation.In the keypoint extraction process an OpenPose model is designed to compute the accurate key-points in the human pose.Then,an optimal DCNN model is developed to classify the human activities label based on the extracted key points.For improving the training process of the DCNN technique,RMSProp optimizer is used to optimally adjust the hyperparameters such as learning rate,batch size,and epoch count.The experimental results tested using benchmark dataset like UCF sports dataset showed that KPE-DCNN technique is able to achieve good results compared with benchmark algorithms like CNN,DBN,SVM,STAL,T-CNN and so on.展开更多
Human pose estimation has received significant attention recently due to its various applications in the real world. As the performance of the state-of-the-art human pose estimation methods can be improved by deep lea...Human pose estimation has received significant attention recently due to its various applications in the real world. As the performance of the state-of-the-art human pose estimation methods can be improved by deep learning, this paper presents a comprehensive survey of deep learning based human pose estimation methods and analyzes the methodologies employed. We summarize and discuss recent works with a methodologybased taxonomy. Single-person and multi-person pipelines are first reviewed separately. Then, the deep learning techniques applied in these pipelines are compared and analyzed. The datasets and metrics used in this task are also discussed and compared. The aim of this survey is to make every step in the estimation pipelines interpretable and to provide readers a readily comprehensible explanation. Moreover, the unsolved problems and challenges for future research are discussed.展开更多
This paper introduces a novel framework,i.e.,RFPose-OT,to enable three-dimensional(3D)human pose estimation from radio frequency(RF)signals.Different from existing methods that predict human poses from RF signals at t...This paper introduces a novel framework,i.e.,RFPose-OT,to enable three-dimensional(3D)human pose estimation from radio frequency(RF)signals.Different from existing methods that predict human poses from RF signals at the signal level directly,we consider the structure difference between the RF signals and the human poses,propose a transformation of the RF signals to the pose domain at the feature level based on the optimal transport(OT)theory,and generate human poses from the transformed features.To evaluate RFPose-OT,we build a radio system and a multi-view camera system to acquire the RF signal data and the ground-truth human poses.The experimental results in a basic indoor environment,an occlusion indoor environment,and an outdoor environment demonstrate that RFPose-OT can predict 3D human poses with higher precision than state-of-the-art methods.展开更多
为提高群体活动场景下细粒度人体姿态估计的准确率,优化网路中人体识别及姿态估计算法,在现有研究的基础上,提出一种结合多尺度预测以及改进并行注意力模块的多目标人体姿态估计算法。在充分利用不同尺度特征信息的基础上,实现高质量的...为提高群体活动场景下细粒度人体姿态估计的准确率,优化网路中人体识别及姿态估计算法,在现有研究的基础上,提出一种结合多尺度预测以及改进并行注意力模块的多目标人体姿态估计算法。在充分利用不同尺度特征信息的基础上,实现高质量的人体姿态估计;针对运动场景下多目标人体姿态数据集较少,提出一种数据集CUPB Sport Dataset。实验结果表明,该算法在公开基准数据集和自制数据集上分别达到了81.4 mAP和79.7 mAP,验证了该算法在运动场景下针对多目标的高效性。展开更多
基金supported by the[Universiti Sains Malaysia]under FRGS Grant Number[FRGS/1/2020/STG07/USM/02/12(203.PKOMP.6711930)]FRGS Grant Number[304PTEKIND.6316497.USM.].
文摘In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains of computer vision and is used in solving several problems for human endeavours.After the detailed introduction,three different human body modes followed by the main stages of HPE and two pipelines of twodimensional(2D)HPE are presented.The details of the four components of HPE are also presented.The keypoints output format of two popular 2D HPE datasets and the most cited DL-based HPE articles from the year of breakthrough are both shown in tabular form.This study intends to highlight the limitations of published reviews and surveys respecting presenting a systematic review of the current DL-based solution to the 2D HPE model.Furthermore,a detailed and meaningful survey that will guide new and existing researchers on DL-based 2D HPE models is achieved.Finally,some future research directions in the field of HPE,such as limited data on disabled persons and multi-training DL-based models,are revealed to encourage researchers and promote the growth of HPE research.
文摘Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namely activity recognition and human-computer interface.Despite the benefits of HPE,it is still a challenging process due to the variations in visual appearances,lighting,occlusions,dimensionality,etc.To resolve these issues,this paper presents a squirrel search optimization with a deep convolutional neural network for HPE(SSDCNN-HPE)technique.The major intention of the SSDCNN-HPE technique is to identify the human pose accurately and efficiently.Primarily,the video frame conversion process is performed and pre-processing takes place via bilateral filtering-based noise removal process.Then,the EfficientNet model is applied to identify the body points of a person with no problem constraints.Besides,the hyperparameter tuning of the EfficientNet model takes place by the use of the squirrel search algorithm(SSA).In the final stage,the multiclass support vector machine(M-SVM)technique was utilized for the identification and classification of human poses.The design of bilateral filtering followed by SSA based EfficientNetmodel for HPE depicts the novelty of the work.To demonstrate the enhanced outcomes of the SSDCNN-HPE approach,a series of simulations are executed.The experimental results reported the betterment of the SSDCNN-HPE system over the recent existing techniques in terms of different measures.
基金supported by the Program of Entrepreneurship and Innovation Ph.D.in Jiangsu Province(JSSCBS20211175)the School Ph.D.Talent Funding(Z301B2055)the Natural Science Foundation of the Jiangsu Higher Education Institutions of China(21KJB520002).
文摘3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimation of monocular RGB images and videos.An overall perspective ofmethods integrated with deep learning is introduced.Novel image-based and video-based inputs are proposed as the analysis framework.From this viewpoint,common problems are discussed.The diversity of human postures usually leads to problems such as occlusion and ambiguity,and the lack of training datasets often results in poor generalization ability of the model.Regression methods are crucial for solving such problems.Considering image-based input,the multi-view method is commonly used to solve occlusion problems.Here,the multi-view method is analyzed comprehensively.By referring to video-based input,the human prior knowledge of restricted motion is used to predict human postures.In addition,structural constraints are widely used as prior knowledge.Furthermore,weakly supervised learningmethods are studied and discussed for these two types of inputs to improve the model generalization ability.The problem of insufficient training datasets must also be considered,especially because 3D datasets are usually biased and limited.Finally,emerging and popular datasets and evaluation indicators are discussed.The characteristics of the datasets and the relationships of the indicators are explained and highlighted.Thus,this article can be useful and instructive for researchers who are lacking in experience and find this field confusing.In addition,by providing an overview of 3D human pose estimation,this article sorts and refines recent studies on 3D human pose estimation.It describes kernel problems and common useful methods,and discusses the scope for further research.
文摘Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction,robot vision,etc.Though considerable improvements have been made in recent days,design of an effective and accurate action recognition model is yet a difficult process owing to the existence of different obstacles such as variations in camera angle,occlusion,background,movement speed,and so on.From the literature,it is observed that hard to deal with the temporal dimension in the action recognition process.Convolutional neural network(CNN)models could be used widely to solve this.With this motivation,this study designs a novel key point extraction with deep convolutional neural networks based pose estimation(KPE-DCNN)model for activity recognition.The KPE-DCNN technique initially converts the input video into a sequence of frames followed by a three stage process namely key point extraction,hyperparameter tuning,and pose estimation.In the keypoint extraction process an OpenPose model is designed to compute the accurate key-points in the human pose.Then,an optimal DCNN model is developed to classify the human activities label based on the extracted key points.For improving the training process of the DCNN technique,RMSProp optimizer is used to optimally adjust the hyperparameters such as learning rate,batch size,and epoch count.The experimental results tested using benchmark dataset like UCF sports dataset showed that KPE-DCNN technique is able to achieve good results compared with benchmark algorithms like CNN,DBN,SVM,STAL,T-CNN and so on.
基金supported by the National Natural Science Foundation of China (Nos. 61673192, 61573219, and 61472163)the Fund for Outstanding Youth of Shandong Provincial High School (No. ZR2016JL023)+1 种基金the National High-Tech Research and Development Plan (No. 2015AA042306)the National Social Science Fund Project (No. 13CTQ010)
文摘Human pose estimation has received significant attention recently due to its various applications in the real world. As the performance of the state-of-the-art human pose estimation methods can be improved by deep learning, this paper presents a comprehensive survey of deep learning based human pose estimation methods and analyzes the methodologies employed. We summarize and discuss recent works with a methodologybased taxonomy. Single-person and multi-person pipelines are first reviewed separately. Then, the deep learning techniques applied in these pipelines are compared and analyzed. The datasets and metrics used in this task are also discussed and compared. The aim of this survey is to make every step in the estimation pipelines interpretable and to provide readers a readily comprehensible explanation. Moreover, the unsolved problems and challenges for future research are discussed.
基金supported by the National Natural Science Foundation of China(Nos.62201542 and 62172381)the National Key R&D Programmes of China(Nos.2022YFC2503405 and 2022YFC0869800)+1 种基金the Fellowship of China Postdoctoral Science Foundation(No.2022M723069)the Fundamental Research Funds for the Central Universities,China。
文摘This paper introduces a novel framework,i.e.,RFPose-OT,to enable three-dimensional(3D)human pose estimation from radio frequency(RF)signals.Different from existing methods that predict human poses from RF signals at the signal level directly,we consider the structure difference between the RF signals and the human poses,propose a transformation of the RF signals to the pose domain at the feature level based on the optimal transport(OT)theory,and generate human poses from the transformed features.To evaluate RFPose-OT,we build a radio system and a multi-view camera system to acquire the RF signal data and the ground-truth human poses.The experimental results in a basic indoor environment,an occlusion indoor environment,and an outdoor environment demonstrate that RFPose-OT can predict 3D human poses with higher precision than state-of-the-art methods.
文摘为提高群体活动场景下细粒度人体姿态估计的准确率,优化网路中人体识别及姿态估计算法,在现有研究的基础上,提出一种结合多尺度预测以及改进并行注意力模块的多目标人体姿态估计算法。在充分利用不同尺度特征信息的基础上,实现高质量的人体姿态估计;针对运动场景下多目标人体姿态数据集较少,提出一种数据集CUPB Sport Dataset。实验结果表明,该算法在公开基准数据集和自制数据集上分别达到了81.4 mAP和79.7 mAP,验证了该算法在运动场景下针对多目标的高效性。