Hand Gesture Recognition (HGR) is a promising research area with an extensive range of applications, such as surgery, video game techniques, and sign language translation, where sign language is a complicated structured form of hand gestures. The fundamental building blocks of structured expressions in sign language are the arrangement of the fingers, the orientation of the hand, and the hand's position relative to the body. The importance of HGR has increased due to the growing number of touchless applications and the rapid growth of the hearing-impaired population. Therefore, real-time HGR is one of the most effective interaction methods between computers and humans. Developing a user-independent interface with good recognition performance should be the goal of real-time HGR systems. Nowadays, Convolutional Neural Networks (CNNs) show excellent recognition rates for many image-level classification tasks. It is challenging to train deep CNNs such as VGG-16, VGG-19, Inception-v3, and EfficientNet-B0 from scratch because few large labeled image datasets are available for static hand gesture images. Therefore, an efficient and robust hand gesture recognition system for sign language, employing fine-tuned Inception-v3 and EfficientNet-B0 networks, is proposed to identify hand gestures using a comparatively small HGR dataset. Experiments show that Inception-v3 achieved 90% accuracy with a precision of 0.93, recall of 0.91, and F1-score of 0.90, while EfficientNet-B0 achieved 99% accuracy with a precision of 0.98, recall of 0.97, and F1-score of 0.98.
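As a rough illustration of the fine-tuning strategy this abstract describes, the sketch below adapts an ImageNet-pretrained EfficientNet-B0 to a small static-gesture dataset in Keras. The class count, input size, learning rates, and two-stage freezing schedule are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: fine-tuning a pretrained EfficientNet-B0 on a small
# static hand-gesture dataset. NUM_CLASSES and hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # assumption: number of gesture classes in the HGR dataset

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # stage 1: train only the new classification head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# ... model.fit(train_ds, validation_data=val_ds, epochs=10) ...

# stage 2: unfreeze the backbone and continue at a much lower learning rate
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

Freezing the backbone first lets the new head settle before the low-learning-rate pass touches the pretrained weights, the usual guard against catastrophic forgetting on small datasets.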
Gesture detection is the primary and most significant step for sign language detection, and sign language is the communication medium for people with speaking and hearing disabilities. This paper presents a novel method for dynamic hand gesture detection using Hidden Markov Models (HMMs), in which different English alphabet letters are detected by tracing hand movements. The process involves skin color-based segmentation for hand isolation in video frames, followed by morphological operations to enhance image trajectories. Our system employs hand tracking and trajectory smoothing techniques, such as the Kalman filter, to monitor hand movements and refine gesture paths. Quantized sequences are then analyzed using the Baum-Welch re-estimation algorithm, an HMM-based approach. A maximum likelihood classifier is used to identify the most probable letter from the test sequences. Our method demonstrates significant improvements over traditional recognition techniques in real-time, automatic hand gesture recognition, particularly in its ability to distinguish complex gestures. The experimental results confirm the effectiveness of our approach in enhancing gesture-based sign language detection, helping to lower the barrier between the deaf and hard-of-hearing community and the general public.
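A minimal sketch of the per-letter HMM classification this abstract outlines, assuming quantized trajectory direction codes as observations and the hmmlearn library (its CategoricalHMM runs Baum-Welch inside fit); the state count and training loop are assumptions rather than the paper's exact setup.

```python
# Hedged sketch: one discrete HMM per letter, scored by maximum likelihood.
import numpy as np
from hmmlearn import hmm  # assumed library; CategoricalHMM handles discrete symbols

def train_letter_hmm(sequences, n_states=6):
    """Fit one HMM on quantized trajectory codes for a single letter."""
    X = np.concatenate(sequences).reshape(-1, 1)   # stacked symbol indices
    lengths = [len(s) for s in sequences]          # boundaries of each sequence
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=100)
    model.fit(X, lengths)                          # Baum-Welch re-estimation
    return model

def classify(models, seq):
    """Return the letter whose HMM assigns the highest log-likelihood."""
    seq = np.asarray(seq).reshape(-1, 1)
    return max(models, key=lambda letter: models[letter].score(seq))

# usage sketch: models = {ltr: train_letter_hmm(seqs) for ltr, seqs in data.items()}
```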
In the digital age, non-touch communication technologies are reshaping human-device interactions and raising security concerns. A major challenge in current technology is the misinterpretation of gestures by sensors and cameras, often caused by environmental factors. This issue has spurred the need for advanced data processing methods to achieve more accurate gesture recognition and prediction. Our study presents a novel virtual keyboard allowing character input via distinct hand gestures, focusing on two key aspects: hand gesture recognition and character input mechanisms. We developed a novel model with LSTM and fully connected layers for enhanced sequential data processing and hand gesture recognition. We also integrated CNN, max-pooling, and dropout layers for improved spatial feature extraction. This model architecture processes both temporal and spatial aspects of hand gestures, using LSTM to extract complex patterns from frame sequences for a comprehensive understanding of the input data. Our unique dataset, essential for training the model, includes 1,662 landmark values per frame (combining dynamic hand gestures, 33 body-posture landmarks, and 468 face landmarks), all captured in real time using advanced pose estimation. The model demonstrated high accuracy, achieving 98.52% in hand gesture recognition and over 97% in character input across different scenarios. Its excellent performance in real-time testing underlines its practicality and effectiveness, marking a significant advancement in human-device interaction in the digital age.
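The sketch below shows one plausible Keras rendering of the architecture described: 1D convolution, max-pooling, and dropout over per-frame landmark vectors, followed by LSTM layers and a softmax classifier. The sequence length, gesture-class count, and layer widths are assumptions; only the 1,662-value landmark width follows the abstract.

```python
# Hedged sketch of a CNN + LSTM model over pose-estimation landmark sequences.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES, N_GESTURES = 30, 1662, 12  # SEQ_LEN and N_GESTURES assumed

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # spatial feature extraction
    layers.MaxPooling1D(pool_size=2),
    layers.Dropout(0.3),
    layers.LSTM(128, return_sequences=True),              # temporal gesture patterns
    layers.LSTM(64),
    layers.Dense(64, activation="relu"),
    layers.Dense(N_GESTURES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```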
Gesture recognition plays an increasingly important role as the requirements of intelligent systems for human-computer interaction methods increase. To improve the accuracy of millimeter-wave radar gesture detection under limited computational resources, this study improves detection performance through optimized features and interference filtering. The accuracy of the algorithm is improved by refining the combination of gesture features using a self-constructed dataset, and biometric filtering is introduced to reduce interference from inanimate object motion. Experiments demonstrate the effectiveness of the proposed algorithm in both mitigating interference from inanimate objects and accurately recognizing gestures. Integrating biometric filtering into the algorithm's interpretation of target movements reduces false detections by a notable 93.29% on average. Additionally, the algorithm identifies the six gestures with an average accuracy of 96.84% on embedded systems.
Recognition of human gesture actions is a challenging issue due to the complex patterns in both visual and skeletal features. Existing gesture action recognition (GAR) methods typically analyze visual and skeletal data, failing to meet the demands of various scenarios. Furthermore, multi-modal approaches lack the versatility to efficiently process both uniform and disparate input patterns. Thus, in this paper, an attention-enhanced pseudo-3D residual model, called HgaNets, is proposed to address the GAR problem. This model comprises two independent components designed for modeling visual RGB (red, green, and blue) images and 3D skeletal heatmaps, respectively. More specifically, each component consists of two main parts: 1) a multi-dimensional attention module for capturing important spatial, temporal, and feature information in human gestures; 2) a spatiotemporal convolution module that uses pseudo-3D residual convolution to characterize the spatiotemporal features of gestures. The output weights of the two components are then fused to generate the recognition results. Finally, we conducted experiments on four datasets to assess the efficiency of the proposed model. The results show that the accuracy on the four datasets reaches 85.40%, 91.91%, 94.70%, and 95.30%, respectively, with an inference time of 0.54 s and 2.74M parameters. These findings highlight that the proposed model outperforms other existing approaches in terms of recognition accuracy.
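As a hedged sketch of the pseudo-3D residual convolution this model builds on, the block below factorizes a full 3D convolution into a 1x3x3 spatial convolution followed by a 3x1x1 temporal convolution, with a residual shortcut; the channel counts and input shape are illustrative assumptions, not HgaNets' exact layout.

```python
# Hedged sketch of a pseudo-3D residual block (spatial conv + temporal conv + shortcut).
import tensorflow as tf
from tensorflow.keras import layers

def pseudo3d_residual_block(x, channels=64):
    shortcut = x
    y = layers.Conv3D(channels, (1, 3, 3), padding="same", activation="relu")(x)  # spatial
    y = layers.Conv3D(channels, (3, 1, 1), padding="same")(y)                      # temporal
    if shortcut.shape[-1] != channels:            # project shortcut to match channels
        shortcut = layers.Conv3D(channels, (1, 1, 1), padding="same")(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(16, 112, 112, 3))  # (frames, H, W, RGB), assumed shape
outputs = pseudo3d_residual_block(inputs)
```

Factorizing the 3x3x3 kernel this way keeps spatiotemporal modeling while cutting parameters and computation, which is consistent with the small model size the abstract reports.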
As technology advances and human requirements increase, human-computer interaction plays an important role in our daily lives. Among these interactions, gesture-based recognition offers a natural and intuitive user experience that does not require physical contact and is becoming increasingly prevalent across various fields. Gesture recognition systems based on Frequency Modulated Continuous Wave (FMCW) millimeter-wave radar are receiving widespread attention due to their ability to operate without wearable sensors, their robustness to environmental factors, and the excellent penetrative ability of radar signals. This paper first reviews the main current applications of gesture recognition. Subsequently, we introduce FMCW radar-based gesture recognition systems and provide a general framework for gesture recognition, including gesture data acquisition, data preprocessing, and classification methods. We then discuss typical applications of gesture recognition systems and summarize their performance in terms of experimental environment, signal acquisition, signal processing, and classification methods. Specifically, we focus on four typical gesture recognition systems: air-writing recognition, gesture command recognition, sign language recognition, and text input recognition. Finally, this paper addresses the challenges and unresolved problems in FMCW radar-based gesture recognition and provides insights into potential future research directions.
The use of hand gestures can be the most intuitive human-machine interaction medium. Early approaches to hand gesture recognition used device-based methods employing mechanical or optical sensors attached to a glove or markers, which hinder natural human-machine communication. Vision-based methods, on the other hand, are less restrictive and allow for more spontaneous communication without an intermediary between human and machine. Vision-based gesture recognition has therefore been a popular area of research for the past thirty years. Hand gesture recognition finds application in many areas, particularly the automotive industry, where advanced automotive human-machine interface (HMI) designers are using gesture recognition to improve driver and vehicle safety. However, technology advances go beyond active/passive safety and into convenience and comfort. In this context, one of America's big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding their product segment through machine learning, providing increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking. The present paper leverages state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system. We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database. We apply transfer learning to fine-tune the pre-trained gesture model on custom-made data, which significantly improves the proposed system's performance in a real-world environment. We adapt the end-to-end architecture to expand the state-of-the-art video classifier from single-camera input (fed by a monocular camera) to a multiview 360-degree feed offered by a six-camera module. Finally, we optimize the proposed solution to run on a resource-limited embedded platform (Nvidia Jetson TX2) used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, or real-time functionality of the system.
Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction; the ConvNets for all groups share parameters. To learn long-term features, the outputs from all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested with two popular hand gesture datasets, namely the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
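A minimal sketch of the described sampling-and-fusion step, assuming OpenCV for the optical-flow snapshots: the clip is split into a fixed number of groups, one frame is drawn at random per group, and its RGB image is stacked with a Farneback flow field as a five-channel ConvNet input. The group count and flow parameters are assumptions.

```python
# Hedged sketch of grouped frame sampling with RGB + optical-flow fusion.
import random
import numpy as np
import cv2  # OpenCV, assumed here for the optical-flow snapshots

def sample_fused_frames(frames, n_groups=8):
    """frames: list of HxWx3 uint8 images, assumed len(frames) >= n_groups."""
    group_size = len(frames) // n_groups
    fused = []
    for g in range(n_groups):
        idx = g * group_size + random.randrange(group_size)  # one frame per group
        rgb = frames[idx]
        prev = cv2.cvtColor(frames[max(idx - 1, 0)], cv2.COLOR_BGR2GRAY)
        curr = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        fused.append(np.dstack([rgb, flow]))  # 5-channel RGB+flow ConvNet input
    return fused
```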
Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Our method uses depth maps of hand gestures captured via Kinect sensors, from which the 3D hand shapes can be segmented out of cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points together constitute the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping (DTW) algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in both accuracy and efficiency.
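For reference, a compact sketch of dynamic time warping as used in the matching step; the Euclidean distance between feature vectors stands in for the 3D Shape Context descriptor distance, which is an assumption, and the nearest-template rule mirrors the usual DTW-based recognizer.

```python
# Hedged sketch of DTW-based matching between gesture feature sequences.
import numpy as np

def dtw_distance(a, b):
    """a, b: sequences of feature vectors; returns the DTW alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # placeholder distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(templates, query):
    """Return the label of the template with the smallest alignment cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], query))
```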
In this article, to reduce the complexity and improve the generalization ability of current gesture recognition systems, we propose a novel SE-CNN attention architecture for sEMG-based hand gesture recognition. The proposed algorithm introduces a temporal squeeze-and-excite block into a simple CNN architecture and uses it to recalibrate the weights of the feature outputs from the convolutional layer. By enhancing important features while suppressing useless ones, the model performs gesture recognition efficiently. Finally, the algorithm applies a simple attention mechanism to enhance the learned representations of sEMG signals for multi-channel sEMG-based gesture recognition tasks. To evaluate the effectiveness and accuracy of the proposed algorithm, we conduct experiments on the multi-gesture datasets Ninapro DB4 and Ninapro DB5 with both inter-session validation and subject-wise cross-validation. A series of comparisons with previous models shows that the proposed algorithm improves robustness, gesture recognition performance, and generalization ability.
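A hedged Keras sketch of the temporal squeeze-and-excite idea on multi-channel sEMG feature maps: global pooling squeezes the temporal axis, two dense layers produce per-channel weights, and the feature maps are rescaled. The reduction ratio, window length, and channel count are assumptions.

```python
# Hedged sketch of a squeeze-and-excite block on 1D sEMG feature maps.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, ratio=8):
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)               # squeeze over time
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # per-channel weights
    s = layers.Reshape((1, channels))(s)
    return layers.Multiply()([x, s])                     # recalibrate the features

inputs = tf.keras.Input(shape=(200, 16))                 # (time samples, sEMG channels), assumed
x = layers.Conv1D(64, 5, padding="same", activation="relu")(inputs)
x = se_block(x)                                          # reweighted conv features
```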
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation, and sign language. With increasing motion sensor development, multiple data sources have become available, leading to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depended on a unimodal system, it was difficult to classify similar motion patterns. To solve this problem, a novel approach integrating motion, audio, and video models is proposed, using a dataset captured by Kinect. The proposed system recognizes observed gestures using the three models, whose recognition results are integrated by the proposed framework to produce the final output. The motion and audio models are learned using Hidden Markov Models, while a Random Forest classifier is used to learn the video model. In experiments testing the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison results show that the multi-modal model composed of the three models achieves the highest recognition rate, indicating that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides application technology for understanding human actions of daily life more precisely.
Recognition of dynamic hand gestures in real time is a difficult task because the system can never know when or where a gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used to extract spectral and spatial features, which are then passed to the LSTM network for classification. The proposed model is a lightweight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. It was trained on 2,000 video clips per class, split into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on the training and testing data, respectively. We further show that the combination of 3D-CNN and LSTM gives superior results compared with MobileNetv2+LSTM.
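A hedged Keras sketch of the 3D-CNN + LSTM combination described here: a few Conv3D layers pool spatially while preserving the temporal axis, each frame's features are flattened, and an LSTM performs the classification. The clip length, resolution, and layer sizes are assumptions; only the 15-class output follows the abstract.

```python
# Hedged sketch of a lightweight 3D-CNN feeding an LSTM classifier.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(16, 64, 64, 3)),                 # (frames, H, W, RGB), assumed
    layers.Conv3D(16, (3, 3, 3), padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),            # pool space, keep time axis
    layers.Conv3D(32, (3, 3, 3), padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    layers.Reshape((16, -1)),                            # one feature vector per frame
    layers.LSTM(64),                                     # temporal classification
    layers.Dense(15, activation="softmax"),              # 15 Jester classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```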
In this study, we developed a system based on deep space-time neural networks for gesture recognition. When users change or the number of gesture categories increases, the accuracy of gesture recognition decreases considerably, because most gesture recognition systems cannot accommodate both user differentiation and gesture diversity. To overcome the limitations of existing methods, we designed a one-dimensional parallel long short-term memory-fully convolutional network (LSTM-FCN) model to extract gesture features of different dimensions. The LSTM can learn complex temporal dynamics, whereas the FCN can predict gestures efficiently by extracting deep, abstract features of gestures in the spatial dimension. In the experiment, 50 types of gestures from five users were collected and evaluated. The experimental results demonstrate the effectiveness of the system and its robustness to various gestures and individual changes. Statistical analysis of the recognition results indicated that an average accuracy of approximately 98.9% was achieved.
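The sketch below gives one plausible shape for a one-dimensional parallel LSTM-FCN: an LSTM branch for temporal dynamics and a fully convolutional branch for deep spatial features, concatenated before the classifier. The input shape and layer sizes are assumptions; only the 50-class output follows the abstract.

```python
# Hedged sketch of a parallel LSTM-FCN for 1D gesture signals.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 8))                 # (time steps, sensor channels), assumed

lstm_branch = layers.LSTM(64)(inputs)                   # temporal dynamics

fcn = layers.Conv1D(128, 8, padding="same", activation="relu")(inputs)
fcn = layers.Conv1D(256, 5, padding="same", activation="relu")(fcn)
fcn = layers.Conv1D(128, 3, padding="same", activation="relu")(fcn)
fcn_branch = layers.GlobalAveragePooling1D()(fcn)       # deep spatial features

merged = layers.Concatenate()([lstm_branch, fcn_branch])
outputs = layers.Dense(50, activation="softmax")(merged)  # 50 gesture types
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```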
The Hand Gesture Recognition (HGR) system can be employed to facilitate communication between humans and computers instead of special input and output devices, which may complicate communication with computers, especially for people with disabilities. Hand gestures are a natural human-to-human communication method that can also be used in human-computer interaction. Many researchers have developed techniques and methods aimed at understanding and recognizing specific hand gestures by employing one or two machine learning algorithms with reasonable accuracy. This work aims to develop a powerful hand gesture recognition model with a 100% recognition rate. We propose an ensemble classification model that combines the most powerful machine learning classifiers to obtain diversity and improve accuracy. The majority voting method is used to aggregate the predictions produced by each classifier and obtain the final classification result. Our model was trained on a self-constructed dataset containing 1,600 images of ten different hand gestures. Combining Canny's edge detector with the histogram of oriented gradients (HOG) method proved highly effective with the ensemble classifier. The experimental results demonstrate the robustness of our proposed model: Logistic Regression and Support Vector Machine achieved 100% accuracy. The developed model was validated on two public datasets, and the findings show that our model outperformed other compared studies.
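A hedged scikit-learn sketch of this pipeline: HOG features extracted from gesture images feed a hard-voting ensemble. The specific classifier mix and HOG parameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: HOG features + majority-voting ensemble classification.
from skimage.feature import hog
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def extract_features(images):
    """images: iterable of 2D grayscale gesture images (e.g., Canny edge maps)."""
    return [hog(img, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2)) for img in images]

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC()),
                ("knn", KNeighborsClassifier())],
    voting="hard")  # majority vote over the individual predictions

# usage sketch: ensemble.fit(extract_features(train_imgs), train_labels)
#               preds = ensemble.predict(extract_features(test_imgs))
```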
Device-free gesture recognition is an emerging wireless sensing technique that recognizes gestures by analyzing their influence on surrounding wireless signals, and it may empower wireless networks with augmented sensing ability. Researchers have made great achievements in single-person device-free gesture recognition. However, when multiple persons conduct gestures simultaneously, the received signals are mixed together, and traditional methods no longer work well. Moreover, the anonymity of persons and changes in the surrounding environment cause feature shift and mismatch, degrading recognition accuracy remarkably. To address these problems, we explore and exploit the diversity of spatial information and propose a multidimensional analysis method that separates each person's gesture features using a focusing sensing strategy. We also present a deep-learning-based robust device-free gesture recognition framework, which leverages an adversarial approach to extract robust gesture features that are insensitive to changes in persons and environment. Furthermore, we developed a 77 GHz mmWave prototype system and evaluated the proposed methods extensively. Experimental results reveal that the proposed system achieves average accuracies of 93% and 84% when 10 gestures are conducted in different environments by two and four persons simultaneously, respectively.
Machine learning is a technique for analyzing data that aids the construction of mathematical models. Because of the growth of the Internet of Things (IoT) and wearable sensor devices, gesture interfaces are becoming a more natural and expedient human-machine interaction method. This type of artificial intelligence, which requires minimal or no direct human intervention in decision-making, is predicated on the ability of intelligent systems to self-train and detect patterns. The rise of touch-free applications and the number of deaf people have increased the significance of hand gesture recognition. Potential applications of hand gesture recognition research span from online gaming to surgical robotics. The location of the hands, the alignment of the fingers, and the hand-to-body posture are the fundamental components of hierarchical emotions in gestures. In gesture recognition, linguistic gestures may be difficult to distinguish from nonsensical motions, and it may be difficult to overcome the segmentation uncertainty caused by accidental hand motions or trembling. When users perform the same dynamic gesture, the hand shapes and speeds vary from user to user, and even across repetitions by the same user. A machine-learning-based Gesture Recognition Framework (ML-GRF) for recognizing the beginning and end of a gesture sequence in a continuous data stream is proposed to solve the problem of distinguishing meaningful dynamic gestures from scattered motion. We recommend a similarity-matching-based gesture classification approach to reduce the overall computational cost of identifying actions, and we show how an efficient feature extraction method can reduce thousands of single-gesture measurements to four-binary-digit gesture codes. The simulation findings support the reported accuracy, precision, gesture recognition, sensitivity, and efficiency rates: ML-GRF achieved an accuracy rate of 98.97%, a precision rate of 97.65%, a gesture recognition rate of 98.04%, a sensitivity rate of 96.99%, and an efficiency rate of 95.12%.
To address the diversity of hand gesture traces across different people, this article presents a novel method called cluster dynamic time warping (CDTW), which is based on main-axis classification and sample clustering of individuals. The method performs well in reducing recognition complexity and is strongly robust across individuals. Data acquisition is implemented on a triaxial accelerometer with a 100 Hz sampling frequency. A database of 2,400 traces was created by ten subjects for system testing and evaluation. The overall accuracy was found to be 98.84% for user-independent gesture recognition and 96.7% for user-dependent gesture recognition, higher than the dynamic time warping (DTW), derivative DTW (DDTW), and piecewise DTW (PDTW) methods. The computational cost of CDTW in this project is reduced by a factor of 11,520 compared with DTW.
This paper addresses hand gesture recognition in monocular image sequences using the Active Appearance Model (AAM). The proposed algorithm consists of constructing AAMs and fitting the models to the region of interest. In the training stage, the AAM is constructed from manually labeled feature points and the corresponding average features are obtained. In the recognition stage, the hand gesture region of interest is first segmented using skin and movement cues. The models are then fitted to the image containing the hand gesture, and the relevant features are extracted. Finally, classification is performed by comparing the extracted features with the average features. Thirty different Chinese sign language gestures are used to test the effectiveness of the method, and the experimental results indicate good performance of the algorithm.
A trained Gaussian mixture model is used to perform skin-colour segmentation on the input image sequences. The hand gesture region is extracted, and normalized images are obtained by interpolation. To solve the hand gesture recognition problem, a fuzzy-rough nearest neighbour algorithm is applied for classification. To avoid costly computation, an improved nearest neighbour classification algorithm based on fuzzy-rough set theory (FRNNC) is proposed. The algorithm employs representative cluster points instead of the whole set of training samples and takes the fuzziness and roughness of the hand gesture data into account, so the computational cost is decreased and the recognition rate is increased. The 30 gestures of the Chinese sign language alphabet are used to validate the effectiveness of the proposed algorithm. The recognition rate is 94.96%, better than that of KNN (K-nearest neighbour) and fuzzy-KNN.
This paper introduces a human gesture recognition algorithm using an impulse radio ultra-wideband (IR-UWB) radar sensor. Human gesture recognition has been one of the hottest research topics for quite a long time, and many gesture recognition algorithms and systems using other sensors, such as cameras and RFID tags, have been proposed. Among these, camera-based gesture recognition systems have been extensively studied in past years and are widely used in practice, but they show deficiencies in some cases: users might not want to be filmed by cameras out of privacy concerns, and cameras may not work well in very dark environments. RFID tags, meanwhile, can be inconvenient for many people and are easily lost. Our gesture recognition algorithm uses an IR-UWB radar sensor, which offers high ranging resolution and an adjustable gesture recognition range while avoiding privacy issues and problems with darkness. In this paper, the gesture recognition algorithm is based on the moving direction and distance change of the human hand and the change of the frontal surface area of the hand toward the radar sensor. By combining these changes while gestures are being performed, the algorithm can recognize six basic kinds of hand gestures. The experimental results show that these gestures are recognized with quite good performance, and a performance analysis from the experiments is also given.