Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,huma...Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,humanpose estimation has achieved great success in multiple fields such as animation and sports.However,to obtainaccurate positioning results,existing methods may suffer from large model sizes,a high number of parameters,and increased complexity,leading to high computing costs.In this paper,we propose a new lightweight featureencoder to construct a high-resolution network that reduces the number of parameters and lowers the computingcost.We also introduced a semantic enhancement module that improves global feature extraction and networkperformance by combining channel and spatial dimensions.Furthermore,we propose a dense connected spatialpyramid pooling module to compensate for the decrease in image resolution and information loss in the network.Finally,ourmethod effectively reduces the number of parameters and complexitywhile ensuring high performance.Extensive experiments show that our method achieves a competitive performance while dramatically reducing thenumber of parameters,and operational complexity.Specifically,our method can obtain 89.9%AP score on MPIIVAL,while the number of parameters and the complexity of operations were reduced by 41%and 36%,respectively.展开更多
Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely u...Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely used in motion analysis,medical evaluation,and behavior monitoring.In this paper,the authors propose a method for multi-view human pose estimation.Two image sensors were placed orthogonally with respect to each other to capture the pose of the subject as they moved,and this yielded accurate and comprehensive results of three-dimensional(3D)motion reconstruction that helped capture their multi-directional poses.Following this,we propose a method based on 3D pose estimation to assess the similarity of the features of motion of patients with motor dysfunction by comparing differences between their range of motion and that of normal subjects.We converted these differences into Fugl–Meyer assessment(FMA)scores in order to quantify them.Finally,we implemented the proposed method in the Unity framework,and built a Virtual Reality platform that provides users with human–computer interaction to make the task more enjoyable for them and ensure their active participation in the assessment process.The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.展开更多
Six degrees of freedom(6DoF)input interfaces are essential formanipulating virtual objects through translation or rotation in three-dimensional(3D)space.A traditional outside-in tracking controller requires the instal...Six degrees of freedom(6DoF)input interfaces are essential formanipulating virtual objects through translation or rotation in three-dimensional(3D)space.A traditional outside-in tracking controller requires the installation of expensive hardware in advance.While inside-out tracking controllers have been proposed,they often suffer from limitations such as interaction limited to the tracking range of the sensor(e.g.,a sensor on the head-mounted display(HMD))or the need for pose value modification to function as an input interface(e.g.,a sensor on the controller).This study investigates 6DoF pose estimation methods without restricting the tracking range,using a smartphone as a controller in augmented reality(AR)environments.Our approach involves proposing methods for estimating the initial pose of the controller and correcting the pose using an inside-out tracking approach.In addition,seven pose estimation algorithms were presented as candidates depending on the tracking range of the device sensor,the tracking method(e.g.,marker recognition,visual-inertial odometry(VIO)),and whether modification of the initial pose is necessary.Through two experiments(discrete and continuous data),the performance of the algorithms was evaluated.The results demonstrate enhanced final pose accuracy achieved by correcting the initial pose.Furthermore,the importance of selecting the tracking algorithm based on the tracking range of the devices and the actual input value of the 3D interaction was emphasized.展开更多
Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we...Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.展开更多
基金the National Natural Science Foundation of China(Grant Number 62076246).
文摘Human pose estimation aims to localize the body joints from image or video data.With the development of deeplearning,pose estimation has become a hot research topic in the field of computer vision.In recent years,humanpose estimation has achieved great success in multiple fields such as animation and sports.However,to obtainaccurate positioning results,existing methods may suffer from large model sizes,a high number of parameters,and increased complexity,leading to high computing costs.In this paper,we propose a new lightweight featureencoder to construct a high-resolution network that reduces the number of parameters and lowers the computingcost.We also introduced a semantic enhancement module that improves global feature extraction and networkperformance by combining channel and spatial dimensions.Furthermore,we propose a dense connected spatialpyramid pooling module to compensate for the decrease in image resolution and information loss in the network.Finally,ourmethod effectively reduces the number of parameters and complexitywhile ensuring high performance.Extensive experiments show that our method achieves a competitive performance while dramatically reducing thenumber of parameters,and operational complexity.Specifically,our method can obtain 89.9%AP score on MPIIVAL,while the number of parameters and the complexity of operations were reduced by 41%and 36%,respectively.
基金This work was supported by grants fromthe Natural Science Foundation of Hebei Province,under Grant No.F2021202021the S&T Program of Hebei,under Grant No.22375001Dthe National Key R&D Program of China,under Grant No.2019YFB1312500.
文摘Human pose estimation is a basic and critical task in the field of computer vision that involves determining the position(or spatial coordinates)of the joints of the human body in a given image or video.It is widely used in motion analysis,medical evaluation,and behavior monitoring.In this paper,the authors propose a method for multi-view human pose estimation.Two image sensors were placed orthogonally with respect to each other to capture the pose of the subject as they moved,and this yielded accurate and comprehensive results of three-dimensional(3D)motion reconstruction that helped capture their multi-directional poses.Following this,we propose a method based on 3D pose estimation to assess the similarity of the features of motion of patients with motor dysfunction by comparing differences between their range of motion and that of normal subjects.We converted these differences into Fugl–Meyer assessment(FMA)scores in order to quantify them.Finally,we implemented the proposed method in the Unity framework,and built a Virtual Reality platform that provides users with human–computer interaction to make the task more enjoyable for them and ensure their active participation in the assessment process.The goal is to provide a suitable means of assessing movement disorders without requiring the immediate supervision of a physician.
文摘Six degrees of freedom(6DoF)input interfaces are essential formanipulating virtual objects through translation or rotation in three-dimensional(3D)space.A traditional outside-in tracking controller requires the installation of expensive hardware in advance.While inside-out tracking controllers have been proposed,they often suffer from limitations such as interaction limited to the tracking range of the sensor(e.g.,a sensor on the head-mounted display(HMD))or the need for pose value modification to function as an input interface(e.g.,a sensor on the controller).This study investigates 6DoF pose estimation methods without restricting the tracking range,using a smartphone as a controller in augmented reality(AR)environments.Our approach involves proposing methods for estimating the initial pose of the controller and correcting the pose using an inside-out tracking approach.In addition,seven pose estimation algorithms were presented as candidates depending on the tracking range of the device sensor,the tracking method(e.g.,marker recognition,visual-inertial odometry(VIO)),and whether modification of the initial pose is necessary.Through two experiments(discrete and continuous data),the performance of the algorithms was evaluated.The results demonstrate enhanced final pose accuracy achieved by correcting the initial pose.Furthermore,the importance of selecting the tracking algorithm based on the tracking range of the devices and the actual input value of the 3D interaction was emphasized.
基金supported by the Natural Science Foundation of Hubei Province of China under grant number 2022CFB536the National Natural Science Foundation of China under grant number 62367006the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology under grant number CX2023579.
文摘Human pose estimation is a critical research area in the field of computer vision,playing a significant role in applications such as human-computer interaction,behavior analysis,and action recognition.In this paper,we propose a U-shaped keypoint detection network(DAUNet)based on an improved ResNet subsampling structure and spatial grouping mechanism.This network addresses key challenges in traditional methods,such as information loss,large network redundancy,and insufficient sensitivity to low-resolution features.DAUNet is composed of three main components.First,we introduce an improved BottleNeck block that employs partial convolution and strip pooling to reduce computational load and mitigate feature loss.Second,after upsampling,the network eliminates redundant features,improving the overall efficiency.Finally,a lightweight spatial grouping attention mechanism is applied to enhance low-resolution semantic features within the feature map,allowing for better restoration of the original image size and higher accuracy.Experimental results demonstrate that DAUNet achieves superior accuracy compared to most existing keypoint detection models,with a mean PCKh@0.5 score of 91.6%on the MPII dataset and an AP of 76.1%on the COCO dataset.Moreover,real-world experiments further validate the robustness and generalizability of DAUNet for detecting human bodies in unknown environments,highlighting its potential for broader applications.