Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,huma...Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,humanpose estimation has achieved great success in multiple fields such as animation and sports.However,to obtainaccurate positioning results,existing methods may suffer from large model sizes,a high number of parameters,and increased complexity,leading to high computing costs.In this paper,we propose a new lightweight featureencoder to construct a high-resolution network that reduces the number of parameters and lowers the computingcost.We also introduced a semantic enhancement module that improves global feature extraction and networkperformance by combining channel and spatial dimensions.Furthermore,we propose a dense connected spatialpyramid pooling module to compensate for the decrease in image resolution and information loss in the network.Finally,ourmethod effectively reduces the number of parameters and complexitywhile ensuring high performance.Extensive experiments show that our method achieves a competitive performance while dramatically reducing thenumber of parameters,and operational complexity.Specifically,our method can obtain 89.9%AP score on MPIIVAL,while the number of parameters and the complexity of operations were reduced by 41%and 36%,respectively.展开更多
Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(...Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(LPEN)to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios.The LPEN network,comprising three stages—MobileNet,Initial Stage,and Refinement Stage—is employed to swiftly extract image features,detect human key points,and refine them for accurate analysis.Subsequently,a Pose-aware Action Analysis Module(PAAM)captures the positional coordinates of human skeletal points in each frame.Finally,an Abnormal Action Inference Module(AAIM)evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring.For fall-down recognition,three criteria—falling speed,main angles of skeletal points,and the person’s bounding box—are considered.To identify unauthorized trespass,emphasis is placed on the position of the ankles.Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training.展开更多
Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we...Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.展开更多
Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely u...Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely used in motion analysis,medical evaluation,and behavior monitoring.In this paper,the authors propose a method for multi-view human pose estimation.Two image sensors were placed orthogonally with respect to each other to capture the pose of the subject as they moved,and this yielded accurate and comprehensive results of three-dimensional(3D)motion reconstruction that helped capture their multi-directional poses.Following this,we propose a method based on 3D pose estimation to assess the similarity of the features of motion of patients with motor dysfunction by comparing differences between their range of motion and that of normal subjects.We converted these differences into Fugl–Meyer assessment(FMA)scores in order to quantify them.Finally,we implemented the proposed method in the Unity framework,and built a Virtual Reality platform that provides users with human–computer interaction to make the task more enjoyable for them and ensure their active participation in the assessment process.The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.展开更多
3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimat...3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimation of monocular RGB images and videos.An overall perspective ofmethods integrated with deep learning is introduced.Novel image-based and video-based inputs are proposed as the analysis framework.From this viewpoint,common problems are discussed.The diversity of human postures usually leads to problems such as occlusion and ambiguity,and the lack of training datasets often results in poor generalization ability of the model.Regression methods are crucial for solving such problems.Considering image-based input,the multi-view method is commonly used to solve occlusion problems.Here,the multi-view method is analyzed comprehensively.By referring to video-based input,the human prior knowledge of restricted motion is used to predict human postures.In addition,structural constraints are widely used as prior knowledge.Furthermore,weakly supervised learningmethods are studied and discussed for these two types of inputs to improve the model generalization ability.The problem of insufficient training datasets must also be considered,especially because 3D datasets are usually biased and limited.Finally,emerging and popular datasets and evaluation indicators are discussed.The characteristics of the datasets and the relationships of the indicators are explained and highlighted.Thus,this article can be useful and instructive for researchers who are lacking in experience and find this field confusing.In addition,by providing an overview of 3D human pose estimation,this article sorts and refines recent studies on 3D human pose estimation.It describes kernel problems and common useful methods,and discusses the scope for further research.展开更多
Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namel...Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namely activity recognition and human-computer interface.Despite the benefits of HPE,it is still a challenging process due to the variations in visual appearances,lighting,occlusions,dimensionality,etc.To resolve these issues,this paper presents a squirrel search optimization with a deep convolutional neural network for HPE(SSDCNN-HPE)technique.The major intention of the SSDCNN-HPE technique is to identify the human pose accurately and efficiently.Primarily,the video frame conversion process is performed and pre-processing takes place via bilateral filtering-based noise removal process.Then,the EfficientNet model is applied to identify the body points of a person with no problem constraints.Besides,the hyperparameter tuning of the EfficientNet model takes place by the use of the squirrel search algorithm(SSA).In the final stage,the multiclass support vector machine(M-SVM)technique was utilized for the identification and classification of human poses.The design of bilateral filtering followed by SSA based EfficientNetmodel for HPE depicts the novelty of the work.To demonstrate the enhanced outcomes of the SSDCNN-HPE approach,a series of simulations are executed.The experimental results reported the betterment of the SSDCNN-HPE system over the recent existing techniques in terms of different measures.展开更多
Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction...Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction,robot vision,etc.Though considerable improvements have been made in recent days,design of an effective and accurate action recognition model is yet a difficult process owing to the existence of different obstacles such as variations in camera angle,occlusion,background,movement speed,and so on.From the literature,it is observed that hard to deal with the temporal dimension in the action recognition process.Convolutional neural network(CNN)models could be used widely to solve this.With this motivation,this study designs a novel key point extraction with deep convolutional neural networks based pose estimation(KPE-DCNN)model for activity recognition.The KPE-DCNN technique initially converts the input video into a sequence of frames followed by a three stage process namely key point extraction,hyperparameter tuning,and pose estimation.In the keypoint extraction process an OpenPose model is designed to compute the accurate key-points in the human pose.Then,an optimal DCNN model is developed to classify the human activities label based on the extracted key points.For improving the training process of the DCNN technique,RMSProp optimizer is used to optimally adjust the hyperparameters such as learning rate,batch size,and epoch count.The experimental results tested using benchmark dataset like UCF sports dataset showed that KPE-DCNN technique is able to achieve good results compared with benchmark algorithms like CNN,DBN,SVM,STAL,T-CNN and so on.展开更多
In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains...In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains of computer vision and is used in solving several problems for human endeavours.After the detailed introduction,three different human body modes followed by the main stages of HPE and two pipelines of twodimensional(2D)HPE are presented.The details of the four components of HPE are also presented.The keypoints output format of two popular 2D HPE datasets and the most cited DL-based HPE articles from the year of breakthrough are both shown in tabular form.This study intends to highlight the limitations of published reviews and surveys respecting presenting a systematic review of the current DL-based solution to the 2D HPE model.Furthermore,a detailed and meaningful survey that will guide new and existing researchers on DL-based 2D HPE models is achieved.Finally,some future research directions in the field of HPE,such as limited data on disabled persons and multi-training DL-based models,are revealed to encourage researchers and promote the growth of HPE research.展开更多
Deep neural networks are vulnerable to attacks from adversarial inputs.Corresponding attack research on human pose estimation(HPE),particularly for body joint detection,has been largely unexplored.Transferring classif...Deep neural networks are vulnerable to attacks from adversarial inputs.Corresponding attack research on human pose estimation(HPE),particularly for body joint detection,has been largely unexplored.Transferring classification-based attack methods to body joint regression tasks is not straightforward.Another issue is that the attack effectiveness and imperceptibility contradict each other.To solve these issues,we propose local imperceptible attacks on HPE networks.In particular,we reformulate imperceptible attacks on body joint regression into a constrained maximum allowable attack.Furthermore,we approximate the solution using iterative gradient-based strength refinement and greedy-based pixel selection.Our method crafts effective perceptual adversarial attacks that consider both human perception and attack effectiveness.We conducted a series of imperceptible attacks against state-of-the-art HPE methods,including HigherHRNet,DEKR,and ViTPose.The experimental results demonstrate that the proposed method achieves excellent imperceptibility while maintaining attack effectiveness by significantly reducing the number of perturbed pixels.Approximately 4%of the pixels can achieve sufficient attacks on HPE.展开更多
Spacecraft pose estimation is an important technology to maintain or change the spacecraft orientation in space.For spacecraft pose estimation,when two spacecraft are relatively distant,the depth information of the sp...Spacecraft pose estimation is an important technology to maintain or change the spacecraft orientation in space.For spacecraft pose estimation,when two spacecraft are relatively distant,the depth information of the space point is less than that of the measuring distance,so the camera model can be seen as a weak perspective projection model.In this paper,a spacecraft pose estimation algorithm based on four symmetrical points of the spacecraft outline is proposed.The analytical solution of the spacecraft pose is obtained by solving the weak perspective projection model,which can satisfy the requirements of the measurement model when the measurement distance is long.The optimal solution is obtained from the weak perspective projection model to the perspective projection model,which can meet the measurement requirements when the measuring distance is small.The simulation results show that the proposed algorithm can obtain better results,even though the noise is large.展开更多
Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios.With the introduction of end-to-end direct regression methods,the fi...Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios.With the introduction of end-to-end direct regression methods,the field has entered a new stage of development.However,the regression results of joints that are more heavily influenced by external factors are not accurate enough even for the optimal method.In this paper,we propose an effective feature recalibration module based on the channel attention mechanism and a relative optimal calibration strategy,which is applied to themulti-viewmulti-person 3D human pose estimation task to achieve improved detection accuracy for joints that are more severely affected by external factors.Specifically,it achieves relative optimal weight adjustment of joint feature information through the recalibration module and strategy,which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints.We call this method as the Efficient Recalibration Network(ER-Net).Finally,experiments were conducted on two benchmark datasets for this task,Campus and Shelf,in which the PCP reached 97.3% and 98.3%,respectively.展开更多
Scale variation is amajor challenge inmulti-person pose estimation.In scenes where persons are present at various distances,models tend to perform better on larger-scale persons,while the performance for smaller-scale...Scale variation is amajor challenge inmulti-person pose estimation.In scenes where persons are present at various distances,models tend to perform better on larger-scale persons,while the performance for smaller-scale persons often falls short of expectations.Therefore,effectively balancing the persons of different scales poses a significant challenge.So this paper proposes a newmulti-person pose estimation model called FSANet to improve themodel’s performance in complex scenes.Our model utilizes High-Resolution Network(HRNet)as the backbone and feeds the outputs of the last stage’s four branches into the DCB module.The dilated convolution-based(DCB)module employs a parallel structure that incorporates dilated convolutions with different rates to expand the receptive field of each branch.Subsequently,the attention operation-based(AOB)module performs attention operations at both branch and channel levels to enhance high-frequency features and reduce the influence of noise.Finally,predictions are made using the heatmap representation.The model can recognize images with diverse scales and more complex semantic information.Experimental results demonstrate that FSA Net achieves competitive results on the MSCOCO and MPII datasets,validating the effectiveness of our proposed approach.展开更多
Background In computer vision,simultaneously estimating human pose,shape,and clothing is a practical issue in real life,but remains a challenging task owing to the variety of clothing,complexity of de-formation,shorta...Background In computer vision,simultaneously estimating human pose,shape,and clothing is a practical issue in real life,but remains a challenging task owing to the variety of clothing,complexity of de-formation,shortage of large-scale datasets,and difficulty in estimating clothing style.Methods We propose a multistage weakly supervised method that makes full use of data with less labeled information for learning to estimate human body shape,pose,and clothing deformation.In the first stage,the SMPL human-body model parameters were regressed using the multi-view 2D key points of the human body.Using multi-view information as weakly supervised information can avoid the deep ambiguity problem of a single view,obtain a more accurate human posture,and access supervisory information easily.In the second stage,clothing is represented by a PCA-based model that uses two-dimensional key points of clothing as supervised information to regress the parameters.In the third stage,we predefine an embedding graph for each type of clothing to describe the deformation.Then,the mask information of the clothing is used to further adjust the deformation of the clothing.To facilitate training,we constructed a multi-view synthetic dataset that included BCNet and SURREAL.Results The Experiments show that the accuracy of our method reaches the same level as that of SOTA methods using strong supervision information while only using weakly supervised information.Because this study uses only weakly supervised information,which is much easier to obtain,it has the advantage of utilizing existing data as training data.Experiments on the DeepFashion2 dataset show that our method can make full use of the existing weak supervision information for fine-tuning on a dataset with little supervision information,compared with the strong supervision information that cannot be trained or adjusted owing to the lack of exact annotation information.Conclusions Our weak supervision method can accurately estimate human body size,pose,and several common types of clothing and overcome the issues of the current shortage of clothing data.展开更多
This article studies distributed pose(orientation and position)estimation of leader–follower multi-agent systems over𝜅-layer graphs in 2-D plane.Only the leaders have access to their orientations and position...This article studies distributed pose(orientation and position)estimation of leader–follower multi-agent systems over𝜅-layer graphs in 2-D plane.Only the leaders have access to their orientations and positions,while the followers can measure the relative bearings or(angular and linear)velocities in their unknown local coordinate frames.For the orientation estimation,the local relative bearings are used to obtain the relative orientations among the agents,based on which a distributed orientation estimation algorithm is proposed for each follower to estimate its orientation.For the position estimation,the local relative bearings are used to obtain the position constraints among the agents,and a distributed position estimation algorithm is proposed for each follower to estimate its position by solving its position constraints.Both the orientation and position estimation errors converge to zero asymptotically.A simulation example is given to verify the theoretical results.展开更多
Identifying workers’construction activities or behaviors can enable managers to better monitor labor efficiency and construction progress.However,current activity analysis methods for construction workers rely solely...Identifying workers’construction activities or behaviors can enable managers to better monitor labor efficiency and construction progress.However,current activity analysis methods for construction workers rely solely on manual observations and recordings,which consumes considerable time and has high labor costs.Researchers have focused on monitoring on-site construction activities of workers.However,when multiple workers are working together,current research cannot accu rately and automatically identify the construction activity.This research proposes a deep learning framework for the automated analysis of the construction activities of multiple workers.In this framework,multiple deep neural network models are designed and used to complete worker key point extraction,worker tracking,and worker construction activity analysis.The designed framework was tested at an actual construction site,and activity recognition for multiple workers was performed,indicating the feasibility of the framework for the automated monitoring of work efficiency.展开更多
Human Interaction Recognition(HIR)was one of the challenging issues in computer vision research due to the involvement of multiple individuals and their mutual interactions within video frames generated from their mov...Human Interaction Recognition(HIR)was one of the challenging issues in computer vision research due to the involvement of multiple individuals and their mutual interactions within video frames generated from their movements.HIR requires more sophisticated analysis than Human Action Recognition(HAR)since HAR focuses solely on individual activities like walking or running,while HIR involves the interactions between people.This research aims to develop a robust system for recognizing five common human interactions,such as hugging,kicking,pushing,pointing,and no interaction,from video sequences using multiple cameras.In this study,a hybrid Deep Learning(DL)and Machine Learning(ML)model was employed to improve classification accuracy and generalizability.The dataset was collected in an indoor environment with four-channel cameras capturing the five types of interactions among 13 participants.The data was processed using a DL model with a fine-tuned ResNet(Residual Networks)architecture based on 2D Convolutional Neural Network(CNN)layers for feature extraction.Subsequently,machine learning models were trained and utilized for interaction classification using six commonly used ML algorithms,including SVM,KNN,RF,DT,NB,and XGBoost.The results demonstrate a high accuracy of 95.45%in classifying human interactions.The hybrid approach enabled effective learning,resulting in highly accurate performance across different interaction types.Future work will explore more complex scenarios involving multiple individuals based on the application of this architecture.展开更多
Inpatient falls from beds in hospitals are a common problem.Such falls may result in severe injuries.This problem can be addressed by continuous monitoring of patients using cameras.Recent advancements in deep learnin...Inpatient falls from beds in hospitals are a common problem.Such falls may result in severe injuries.This problem can be addressed by continuous monitoring of patients using cameras.Recent advancements in deep learning-based video analytics have made this task of fall detection more effective and efficient.Along with fall detection,monitoring of different activities of the patients is also of significant concern to assess the improvement in their health.High computation-intensive models are required to monitor every action of the patient precisely.This requirement limits the applicability of such networks.Hence,to keep the model lightweight,the already designed fall detection networks can be extended to monitor the general activities of the patients along with the fall detection.Motivated by the same notion,we propose a novel,lightweight,and efficient patient activity monitoring system that broadly classifies the patients’activities into fall,activity,and rest classes based on their poses.The whole network comprises three sub-networks,namely a Convolutional Neural Networks(CNN)based video compression network,a Lightweight Pose Network(LPN)and a Residual Network(ResNet)Mixer block-based activity recognition network.The compression network compresses the video streams using deep learning networks for efficient storage and retrieval;after that,LPN estimates human poses.Finally,the activity recognition network classifies the patients’activities based on their poses.The proposed system shows an overall accuracy of approx.99.7% over a standard dataset with 99.63% fall detection accuracy and efficiently monitors different events,which may help monitor the falls and improve the inpatients’health.展开更多
With the rapid development of space science projects,large deployable mechanisms have been widely used.However,due to the effects of mechanical friction and gravitational acceleration,on-orbit mechanisms cannot be alw...With the rapid development of space science projects,large deployable mechanisms have been widely used.However,due to the effects of mechanical friction and gravitational acceleration,on-orbit mechanisms cannot be always deployed to the expected pose.For some precision optical mechanisms,even a minor deviation can result in significant error,so it needs to be measured and corrected.In this paper,the deployment process was modeled and simplified as rotation under single-rotation-axis constraint and translation under single-direction constraint.To solve the problem,a method based on cross-ratio invariability was proposed.The proposed method does not rely on camera calibration techniques,as well as artificial marking points,both of which are necessary in PnP.Instead,only three calibration images before launch and a measurement image on orbit were required.Simulations and experiments demonstrated that the proposed method is more accurate than PnP.In addition,experiments also proved that the feasibility of the proposed method under dark conditions with the aid of a light source and some reflective marking points.展开更多
Unmanned aerial vehicles(UAVs)have been widely used in military,medical,wireless communications,aerial surveillance,etc.One key topic involving UAVs is pose estimation in autonomous navigation.A standard procedure for...Unmanned aerial vehicles(UAVs)have been widely used in military,medical,wireless communications,aerial surveillance,etc.One key topic involving UAVs is pose estimation in autonomous navigation.A standard procedure for this process is to combine inertial navigation system sensor information with the global navigation satellite system(GNSS)signal.However,some factors can interfere with the GNSS signal,such as ionospheric scintillation,jamming,or spoofing.One alternative method to avoid using the GNSS signal is to apply an image processing approach by matching UAV images with georeferenced images.But a high effort is required for image edge extraction.Here a support vector regression(SVR)model is proposed to reduce this computational load and processing time.The dynamic partial reconfiguration(DPR)of part of the SVR datapath is implemented to accelerate the process,reduce the area,and analyze its granularity by increasing the grain size of the reconfigurable region.Results show that the implementation in hardware is 68 times faster than that in software.This architecture with DPR also facilitates the low power consumption of 4 mW,leading to a reduction of 57%than that without DPR.This is also the lowest power consumption in current machine learning hardware implementations.Besides,the circuitry area is 41 times smaller.SVR with Gaussian kernel shows a success rate of 99.18%and minimum square error of 0.0146 for testing with the planning trajectory.This system is useful for adaptive applications where the user/designer can modify/reconfigure the hardware layout during its application,thus contributing to lower power consumption,smaller hardware area,and shorter execution time.展开更多
Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems.The scene can change during the image sequence and plays a vital role in the localization performance o...Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems.The scene can change during the image sequence and plays a vital role in the localization performance of robotic applications in terms of accuracy and speed.This research proposed a real-time indoor camera localization system based on a recurrent neural network that detects scene change during the image sequence.An annotated image dataset trains the proposed system and predicts the camera pose in real-time.The system mainly improved the localization performance of indoor cameras by more accurately predicting the camera pose.It also recognizes the scene changes during the sequence and evaluates the effects of these changes.This system achieved high accuracy and real-time performance.The scene change detection process was performed using visual rhythm and the proposed recurrent deep architecture,which performed camera pose prediction and scene change impact evaluation.Overall,this study proposed a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.展开更多
基金the National Natural Science Foundation of China(Grant Number 62076246).
文摘Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,humanpose estimation has achieved great success in multiple fields such as animation and sports.However,to obtainaccurate positioning results,existing methods may suffer from large model sizes,a high number of parameters,and increased complexity,leading to high computing costs.In this paper,we propose a new lightweight featureencoder to construct a high-resolution network that reduces the number of parameters and lowers the computingcost.We also introduced a semantic enhancement module that improves global feature extraction and networkperformance by combining channel and spatial dimensions.Furthermore,we propose a dense connected spatialpyramid pooling module to compensate for the decrease in image resolution and information loss in the network.Finally,ourmethod effectively reduces the number of parameters and complexitywhile ensuring high performance.Extensive experiments show that our method achieves a competitive performance while dramatically reducing thenumber of parameters,and operational complexity.Specifically,our method can obtain 89.9%AP score on MPIIVAL,while the number of parameters and the complexity of operations were reduced by 41%and 36%,respectively.
基金supportted by Natural Science Foundation of Jiangsu Province(No.BK20230696).
文摘Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(LPEN)to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios.The LPEN network,comprising three stages—MobileNet,Initial Stage,and Refinement Stage—is employed to swiftly extract image features,detect human key points,and refine them for accurate analysis.Subsequently,a Pose-aware Action Analysis Module(PAAM)captures the positional coordinates of human skeletal points in each frame.Finally,an Abnormal Action Inference Module(AAIM)evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring.For fall-down recognition,three criteria—falling speed,main angles of skeletal points,and the person’s bounding box—are considered.To identify unauthorized trespass,emphasis is placed on the position of the ankles.Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training.
基金supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536the National Natural Science Foundation of China under grant number 62367006the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
文摘Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.
基金This work was supported by grants fromthe Natural Science Foundation of Hebei Province,under Grant No.F2021202021the S&T Program of Hebei,under Grant No.22375001Dthe National Key R&D Program of China,under Grant No.2019YFB1312500.
文摘Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely used in motion analysis,medical evaluation,and behavior monitoring.In this paper,the authors propose a method for multi-view human pose estimation.Two image sensors were placed orthogonally with respect to each other to capture the pose of the subject as they moved,and this yielded accurate and comprehensive results of three-dimensional(3D)motion reconstruction that helped capture their multi-directional poses.Following this,we propose a method based on 3D pose estimation to assess the similarity of the features of motion of patients with motor dysfunction by comparing differences between their range of motion and that of normal subjects.We converted these differences into Fugl–Meyer assessment(FMA)scores in order to quantify them.Finally,we implemented the proposed method in the Unity framework,and built a Virtual Reality platform that provides users with human–computer interaction to make the task more enjoyable for them and ensure their active participation in the assessment process.The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.
基金supported by the Program of Entrepreneurship and Innovation Ph.D.in Jiangsu Province(JSSCBS20211175)the School Ph.D.Talent Funding(Z301B2055)the Natural Science Foundation of the Jiangsu Higher Education Institutions of China(21KJB520002).
文摘3D human pose estimation is a major focus area in the field of computer vision,which plays an important role in practical applications.This article summarizes the framework and research progress related to the estimation of monocular RGB images and videos.An overall perspective ofmethods integrated with deep learning is introduced.Novel image-based and video-based inputs are proposed as the analysis framework.From this viewpoint,common problems are discussed.The diversity of human postures usually leads to problems such as occlusion and ambiguity,and the lack of training datasets often results in poor generalization ability of the model.Regression methods are crucial for solving such problems.Considering image-based input,the multi-view method is commonly used to solve occlusion problems.Here,the multi-view method is analyzed comprehensively.By referring to video-based input,the human prior knowledge of restricted motion is used to predict human postures.In addition,structural constraints are widely used as prior knowledge.Furthermore,weakly supervised learningmethods are studied and discussed for these two types of inputs to improve the model generalization ability.The problem of insufficient training datasets must also be considered,especially because 3D datasets are usually biased and limited.Finally,emerging and popular datasets and evaluation indicators are discussed.The characteristics of the datasets and the relationships of the indicators are explained and highlighted.Thus,this article can be useful and instructive for researchers who are lacking in experience and find this field confusing.In addition,by providing an overview of 3D human pose estimation,this article sorts and refines recent studies on 3D human pose estimation.It describes kernel problems and common useful methods,and discusses the scope for further research.
文摘Human pose estimation(HPE)is a procedure for determining the structure of the body pose and it is considered a challenging issue in the computer vision(CV)communities.HPE finds its applications in several fields namely activity recognition and human-computer interface.Despite the benefits of HPE,it is still a challenging process due to the variations in visual appearances,lighting,occlusions,dimensionality,etc.To resolve these issues,this paper presents a squirrel search optimization with a deep convolutional neural network for HPE(SSDCNN-HPE)technique.The major intention of the SSDCNN-HPE technique is to identify the human pose accurately and efficiently.Primarily,the video frame conversion process is performed and pre-processing takes place via bilateral filtering-based noise removal process.Then,the EfficientNet model is applied to identify the body points of a person with no problem constraints.Besides,the hyperparameter tuning of the EfficientNet model takes place by the use of the squirrel search algorithm(SSA).In the final stage,the multiclass support vector machine(M-SVM)technique was utilized for the identification and classification of human poses.The design of bilateral filtering followed by SSA based EfficientNetmodel for HPE depicts the novelty of the work.To demonstrate the enhanced outcomes of the SSDCNN-HPE approach,a series of simulations are executed.The experimental results reported the betterment of the SSDCNN-HPE system over the recent existing techniques in terms of different measures.
文摘Human Action Recognition(HAR)and pose estimation from videos have gained significant attention among research communities due to its applica-tion in several areas namely intelligent surveillance,human robot interaction,robot vision,etc.Though considerable improvements have been made in recent days,design of an effective and accurate action recognition model is yet a difficult process owing to the existence of different obstacles such as variations in camera angle,occlusion,background,movement speed,and so on.From the literature,it is observed that hard to deal with the temporal dimension in the action recognition process.Convolutional neural network(CNN)models could be used widely to solve this.With this motivation,this study designs a novel key point extraction with deep convolutional neural networks based pose estimation(KPE-DCNN)model for activity recognition.The KPE-DCNN technique initially converts the input video into a sequence of frames followed by a three stage process namely key point extraction,hyperparameter tuning,and pose estimation.In the keypoint extraction process an OpenPose model is designed to compute the accurate key-points in the human pose.Then,an optimal DCNN model is developed to classify the human activities label based on the extracted key points.For improving the training process of the DCNN technique,RMSProp optimizer is used to optimally adjust the hyperparameters such as learning rate,batch size,and epoch count.The experimental results tested using benchmark dataset like UCF sports dataset showed that KPE-DCNN technique is able to achieve good results compared with benchmark algorithms like CNN,DBN,SVM,STAL,T-CNN and so on.
基金supported by the[Universiti Sains Malaysia]under FRGS Grant Number[FRGS/1/2020/STG07/USM/02/12(203.PKOMP.6711930)]FRGS Grant Number[304PTEKIND.6316497.USM.].
文摘In this article,a comprehensive survey of deep learning-based(DLbased)human pose estimation(HPE)that can help researchers in the domain of computer vision is presented.HPE is among the fastest-growing research domains of computer vision and is used in solving several problems for human endeavours.After the detailed introduction,three different human body modes followed by the main stages of HPE and two pipelines of twodimensional(2D)HPE are presented.The details of the four components of HPE are also presented.The keypoints output format of two popular 2D HPE datasets and the most cited DL-based HPE articles from the year of breakthrough are both shown in tabular form.This study intends to highlight the limitations of published reviews and surveys respecting presenting a systematic review of the current DL-based solution to the 2D HPE model.Furthermore,a detailed and meaningful survey that will guide new and existing researchers on DL-based 2D HPE models is achieved.Finally,some future research directions in the field of HPE,such as limited data on disabled persons and multi-training DL-based models,are revealed to encourage researchers and promote the growth of HPE research.
基金National Natural Science Foundation of China,No.61972458Natural Science Foundation of Zhejiang Province,No.LZ23F020002.
文摘Deep neural networks are vulnerable to attacks from adversarial inputs.Corresponding attack research on human pose estimation(HPE),particularly for body joint detection,has been largely unexplored.Transferring classification-based attack methods to body joint regression tasks is not straightforward.Another issue is that the attack effectiveness and imperceptibility contradict each other.To solve these issues,we propose local imperceptible attacks on HPE networks.In particular,we reformulate imperceptible attacks on body joint regression into a constrained maximum allowable attack.Furthermore,we approximate the solution using iterative gradient-based strength refinement and greedy-based pixel selection.Our method crafts effective perceptual adversarial attacks that consider both human perception and attack effectiveness.We conducted a series of imperceptible attacks against state-of-the-art HPE methods,including HigherHRNet,DEKR,and ViTPose.The experimental results demonstrate that the proposed method achieves excellent imperceptibility while maintaining attack effectiveness by significantly reducing the number of perturbed pixels.Approximately 4%of the pixels can achieve sufficient attacks on HPE.
基金Supported by National Natural Science Foundation of China(Grant No.12272104).
文摘Spacecraft pose estimation is an important technology to maintain or change the spacecraft orientation in space.For spacecraft pose estimation,when two spacecraft are relatively distant,the depth information of the space point is less than that of the measuring distance,so the camera model can be seen as a weak perspective projection model.In this paper,a spacecraft pose estimation algorithm based on four symmetrical points of the spacecraft outline is proposed.The analytical solution of the spacecraft pose is obtained by solving the weak perspective projection model,which can satisfy the requirements of the measurement model when the measurement distance is long.The optimal solution is obtained from the weak perspective projection model to the perspective projection model,which can meet the measurement requirements when the measuring distance is small.The simulation results show that the proposed algorithm can obtain better results,even though the noise is large.
基金supported in part by the Key Program of NSFC (Grant No.U1908214)Special Project of Central Government Guiding Local Science and Technology Development (Grant No.2021JH6/10500140)+3 种基金Program for the Liaoning Distinguished Professor,Program for Innovative Research Team in University of Liaoning Province (LT2020015)Dalian (2021RT06)and Dalian University (XLJ202010)the Science and Technology Innovation Fund of Dalian (Grant No.2020JJ25CY001)Dalian University Scientific Research Platform Project (No.202101YB03).
文摘Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios.With the introduction of end-to-end direct regression methods,the field has entered a new stage of development.However,the regression results of joints that are more heavily influenced by external factors are not accurate enough even for the optimal method.In this paper,we propose an effective feature recalibration module based on the channel attention mechanism and a relative optimal calibration strategy,which is applied to themulti-viewmulti-person 3D human pose estimation task to achieve improved detection accuracy for joints that are more severely affected by external factors.Specifically,it achieves relative optimal weight adjustment of joint feature information through the recalibration module and strategy,which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints.We call this method as the Efficient Recalibration Network(ER-Net).Finally,experiments were conducted on two benchmark datasets for this task,Campus and Shelf,in which the PCP reached 97.3% and 98.3%,respectively.
基金supported in part by the National Natural Science Foundation of China 6167246662011530130,Joint Fund of Zhejiang Provincial Natural Science Foundation LSZ19F010001.
文摘Scale variation is amajor challenge inmulti-person pose estimation.In scenes where persons are present at various distances,models tend to perform better on larger-scale persons,while the performance for smaller-scale persons often falls short of expectations.Therefore,effectively balancing the persons of different scales poses a significant challenge.So this paper proposes a newmulti-person pose estimation model called FSANet to improve themodel’s performance in complex scenes.Our model utilizes High-Resolution Network(HRNet)as the backbone and feeds the outputs of the last stage’s four branches into the DCB module.The dilated convolution-based(DCB)module employs a parallel structure that incorporates dilated convolutions with different rates to expand the receptive field of each branch.Subsequently,the attention operation-based(AOB)module performs attention operations at both branch and channel levels to enhance high-frequency features and reduce the influence of noise.Finally,predictions are made using the heatmap representation.The model can recognize images with diverse scales and more complex semantic information.Experimental results demonstrate that FSA Net achieves competitive results on the MSCOCO and MPII datasets,validating the effectiveness of our proposed approach.
基金Supported by the National Key Research and Development Programme of China(2018YFC0831201).
文摘Background In computer vision,simultaneously estimating human pose,shape,and clothing is a practical issue in real life,but remains a challenging task owing to the variety of clothing,complexity of de-formation,shortage of large-scale datasets,and difficulty in estimating clothing style.Methods We propose a multistage weakly supervised method that makes full use of data with less labeled information for learning to estimate human body shape,pose,and clothing deformation.In the first stage,the SMPL human-body model parameters were regressed using the multi-view 2D key points of the human body.Using multi-view information as weakly supervised information can avoid the deep ambiguity problem of a single view,obtain a more accurate human posture,and access supervisory information easily.In the second stage,clothing is represented by a PCA-based model that uses two-dimensional key points of clothing as supervised information to regress the parameters.In the third stage,we predefine an embedding graph for each type of clothing to describe the deformation.Then,the mask information of the clothing is used to further adjust the deformation of the clothing.To facilitate training,we constructed a multi-view synthetic dataset that included BCNet and SURREAL.Results The Experiments show that the accuracy of our method reaches the same level as that of SOTA methods using strong supervision information while only using weakly supervised information.Because this study uses only weakly supervised information,which is much easier to obtain,it has the advantage of utilizing existing data as training data.Experiments on the DeepFashion2 dataset show that our method can make full use of the existing weak supervision information for fine-tuning on a dataset with little supervision information,compared with the strong supervision information that cannot be trained or adjusted owing to the lack of exact annotation information.Conclusions Our weak supervision method can accurately estimate human body size,pose,and several common types of clothing and overcome the issues of the current shortage of clothing data.
基金supported by Nanyang Technological University,Singapore under the Wallenberg-NTU Presidential Postdoctoral Fellowship and the Natural Science Foundation in Heilongjiang Province,China(YQ2022F003).
文摘This article studies distributed pose(orientation and position)estimation of leader–follower multi-agent systems over𝜅-layer graphs in 2-D plane.Only the leaders have access to their orientations and positions,while the followers can measure the relative bearings or(angular and linear)velocities in their unknown local coordinate frames.For the orientation estimation,the local relative bearings are used to obtain the relative orientations among the agents,based on which a distributed orientation estimation algorithm is proposed for each follower to estimate its orientation.For the position estimation,the local relative bearings are used to obtain the position constraints among the agents,and a distributed position estimation algorithm is proposed for each follower to estimate its position by solving its position constraints.Both the orientation and position estimation errors converge to zero asymptotically.A simulation example is given to verify the theoretical results.
基金supported by the National Natural Science Foundation of China(52130801,U20A20312,52178271,and 52077213)the National Key Research and Development Program of China(2021YFF0500903)。
文摘Identifying workers’construction activities or behaviors can enable managers to better monitor labor efficiency and construction progress.However,current activity analysis methods for construction workers rely solely on manual observations and recordings,which consumes considerable time and has high labor costs.Researchers have focused on monitoring on-site construction activities of workers.However,when multiple workers are working together,current research cannot accu rately and automatically identify the construction activity.This research proposes a deep learning framework for the automated analysis of the construction activities of multiple workers.In this framework,multiple deep neural network models are designed and used to complete worker key point extraction,worker tracking,and worker construction activity analysis.The designed framework was tested at an actual construction site,and activity recognition for multiple workers was performed,indicating the feasibility of the framework for the automated monitoring of work efficiency.
基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.RS-2023-00218176)and the Soonchunhyang University Research Fund.
文摘Human Interaction Recognition(HIR)was one of the challenging issues in computer vision research due to the involvement of multiple individuals and their mutual interactions within video frames generated from their movements.HIR requires more sophisticated analysis than Human Action Recognition(HAR)since HAR focuses solely on individual activities like walking or running,while HIR involves the interactions between people.This research aims to develop a robust system for recognizing five common human interactions,such as hugging,kicking,pushing,pointing,and no interaction,from video sequences using multiple cameras.In this study,a hybrid Deep Learning(DL)and Machine Learning(ML)model was employed to improve classification accuracy and generalizability.The dataset was collected in an indoor environment with four-channel cameras capturing the five types of interactions among 13 participants.The data was processed using a DL model with a fine-tuned ResNet(Residual Networks)architecture based on 2D Convolutional Neural Network(CNN)layers for feature extraction.Subsequently,machine learning models were trained and utilized for interaction classification using six commonly used ML algorithms,including SVM,KNN,RF,DT,NB,and XGBoost.The results demonstrate a high accuracy of 95.45%in classifying human interactions.The hybrid approach enabled effective learning,resulting in highly accurate performance across different interaction types.Future work will explore more complex scenarios involving multiple individuals based on the application of this architecture.
基金the Deanship of Scientific Research at Majmaah University for funding this work under Project No.R-2023-667.
文摘Inpatient falls from beds in hospitals are a common problem.Such falls may result in severe injuries.This problem can be addressed by continuous monitoring of patients using cameras.Recent advancements in deep learning-based video analytics have made this task of fall detection more effective and efficient.Along with fall detection,monitoring of different activities of the patients is also of significant concern to assess the improvement in their health.High computation-intensive models are required to monitor every action of the patient precisely.This requirement limits the applicability of such networks.Hence,to keep the model lightweight,the already designed fall detection networks can be extended to monitor the general activities of the patients along with the fall detection.Motivated by the same notion,we propose a novel,lightweight,and efficient patient activity monitoring system that broadly classifies the patients’activities into fall,activity,and rest classes based on their poses.The whole network comprises three sub-networks,namely a Convolutional Neural Networks(CNN)based video compression network,a Lightweight Pose Network(LPN)and a Residual Network(ResNet)Mixer block-based activity recognition network.The compression network compresses the video streams using deep learning networks for efficient storage and retrieval;after that,LPN estimates human poses.Finally,the activity recognition network classifies the patients’activities based on their poses.The proposed system shows an overall accuracy of approx.99.7% over a standard dataset with 99.63% fall detection accuracy and efficiently monitors different events,which may help monitor the falls and improve the inpatients’health.
基金Supported in part by the National Natural Science Foundation of China(No.62271148).
文摘With the rapid development of space science projects,large deployable mechanisms have been widely used.However,due to the effects of mechanical friction and gravitational acceleration,on-orbit mechanisms cannot be always deployed to the expected pose.For some precision optical mechanisms,even a minor deviation can result in significant error,so it needs to be measured and corrected.In this paper,the deployment process was modeled and simplified as rotation under single-rotation-axis constraint and translation under single-direction constraint.To solve the problem,a method based on cross-ratio invariability was proposed.The proposed method does not rely on camera calibration techniques,as well as artificial marking points,both of which are necessary in PnP.Instead,only three calibration images before launch and a measurement image on orbit were required.Simulations and experiments demonstrated that the proposed method is more accurate than PnP.In addition,experiments also proved that the feasibility of the proposed method under dark conditions with the aid of a light source and some reflective marking points.
基金financially supported by the National Council for Scientific and Technological Development(CNPq,Brazil),Swedish-Brazilian Research and Innovation Centre(CISB),and Saab AB under Grant No.CNPq:200053/2022-1the National Council for Scientific and Technological Development(CNPq,Brazil)under Grants No.CNPq:312924/2017-8 and No.CNPq:314660/2020-8.
文摘Unmanned aerial vehicles(UAVs)have been widely used in military,medical,wireless communications,aerial surveillance,etc.One key topic involving UAVs is pose estimation in autonomous navigation.A standard procedure for this process is to combine inertial navigation system sensor information with the global navigation satellite system(GNSS)signal.However,some factors can interfere with the GNSS signal,such as ionospheric scintillation,jamming,or spoofing.One alternative method to avoid using the GNSS signal is to apply an image processing approach by matching UAV images with georeferenced images.But a high effort is required for image edge extraction.Here a support vector regression(SVR)model is proposed to reduce this computational load and processing time.The dynamic partial reconfiguration(DPR)of part of the SVR datapath is implemented to accelerate the process,reduce the area,and analyze its granularity by increasing the grain size of the reconfigurable region.Results show that the implementation in hardware is 68 times faster than that in software.This architecture with DPR also facilitates the low power consumption of 4 mW,leading to a reduction of 57%than that without DPR.This is also the lowest power consumption in current machine learning hardware implementations.Besides,the circuitry area is 41 times smaller.SVR with Gaussian kernel shows a success rate of 99.18%and minimum square error of 0.0146 for testing with the planning trajectory.This system is useful for adaptive applications where the user/designer can modify/reconfigure the hardware layout during its application,thus contributing to lower power consumption,smaller hardware area,and shorter execution time.
文摘Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems.The scene can change during the image sequence and plays a vital role in the localization performance of robotic applications in terms of accuracy and speed.This research proposed a real-time indoor camera localization system based on a recurrent neural network that detects scene change during the image sequence.An annotated image dataset trains the proposed system and predicts the camera pose in real-time.The system mainly improved the localization performance of indoor cameras by more accurately predicting the camera pose.It also recognizes the scene changes during the sequence and evaluates the effects of these changes.This system achieved high accuracy and real-time performance.The scene change detection process was performed using visual rhythm and the proposed recurrent deep architecture,which performed camera pose prediction and scene change impact evaluation.Overall,this study proposed a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.