In this paper we present a CNN based approach for a real time 3 D-hand pose estimation from the depth sequence.Prior discriminative approaches have achieved remarkable success but are facing two main challenges:Firstl...In this paper we present a CNN based approach for a real time 3 D-hand pose estimation from the depth sequence.Prior discriminative approaches have achieved remarkable success but are facing two main challenges:Firstly,the methods are fully supervised hence require large numbers of annotated training data to extract the dynamic information from a hand representation.Secondly,unreliable hand detectors based on strong assumptions or a weak detector which often fail in several situations like complex environment and multiple hands.In contrast to these methods,this paper presents an approach that can be considered as semi-supervised by performing predictive coding of image sequences of hand poses in order to capture latent features underlying a given image without supervision.The hand is modelled using a novel latent tree dependency model(LDTM)which transforms internal joint location to an explicit representation.Then the modeled hand topology is integrated with the pose estimator using data dependent method to jointly learn latent variables of the posterior pose appearance and the pose configuration respectively.Finally,an unsupervised error term which is a part of the recurrent architecture ensures smooth estimations of the final pose.Experiments on three challenging public datasets,ICVL,MSRA,and NYU demonstrate the significant performance of the proposed method which is comparable or better than state-of-the-art approaches.展开更多
The field of vision-based human hand three-dimensional(3D)shape and pose estimation has attracted significant attention recently owing to its key role in various applications,such as natural human computer interaction...The field of vision-based human hand three-dimensional(3D)shape and pose estimation has attracted significant attention recently owing to its key role in various applications,such as natural human computer interactions.With the availability of large-scale annotated hand datasets and the rapid developments of deep neural networks(DNNs),numerous DNN-based data-driven methods have been proposed for accurate and rapid hand shape and pose estimation.Nonetheless,the existence of complicated hand articulation,depth and scale ambiguities,occlusions,and finger similarity remain challenging.In this study,we present a comprehensive survey of state-of-the-art 3D hand shape and pose estimation approaches using RGB-D cameras.Related RGB-D cameras,hand datasets,and a performance analysis are also discussed to provide a holistic view of recent achievements.We also discuss the research potential of this rapidly growing field.展开更多
基金supported in part by the Fundamental Research Funds for the Central Universities(WK2350000002)。
文摘In this paper we present a CNN based approach for a real time 3 D-hand pose estimation from the depth sequence.Prior discriminative approaches have achieved remarkable success but are facing two main challenges:Firstly,the methods are fully supervised hence require large numbers of annotated training data to extract the dynamic information from a hand representation.Secondly,unreliable hand detectors based on strong assumptions or a weak detector which often fail in several situations like complex environment and multiple hands.In contrast to these methods,this paper presents an approach that can be considered as semi-supervised by performing predictive coding of image sequences of hand poses in order to capture latent features underlying a given image without supervision.The hand is modelled using a novel latent tree dependency model(LDTM)which transforms internal joint location to an explicit representation.Then the modeled hand topology is integrated with the pose estimator using data dependent method to jointly learn latent variables of the posterior pose appearance and the pose configuration respectively.Finally,an unsupervised error term which is a part of the recurrent architecture ensures smooth estimations of the final pose.Experiments on three challenging public datasets,ICVL,MSRA,and NYU demonstrate the significant performance of the proposed method which is comparable or better than state-of-the-art approaches.
基金the National Key R&D Program of China(2018YFB1004600)the National Natural Science Foundation of China(61502187,61876211)the National Science Foundation Grant CNS(1951952).
文摘The field of vision-based human hand three-dimensional(3D)shape and pose estimation has attracted significant attention recently owing to its key role in various applications,such as natural human computer interactions.With the availability of large-scale annotated hand datasets and the rapid developments of deep neural networks(DNNs),numerous DNN-based data-driven methods have been proposed for accurate and rapid hand shape and pose estimation.Nonetheless,the existence of complicated hand articulation,depth and scale ambiguities,occlusions,and finger similarity remain challenging.In this study,we present a comprehensive survey of state-of-the-art 3D hand shape and pose estimation approaches using RGB-D cameras.Related RGB-D cameras,hand datasets,and a performance analysis are also discussed to provide a holistic view of recent achievements.We also discuss the research potential of this rapidly growing field.