Hand gesture recognition has become a vital subject in the fields of human-computer interaction and rehabilitation assessment. This paper presents a multi-modal fusion for hand gesture recognition (MFHG) model, which uses two heterogeneous networks to extract and fuse the features of vision-based motion signals and surface electromyography (sEMG) signals, respectively. To extract the features of the vision-based motion signals, a graph neural network, named the cumulation graph attention (CGAT) model, is first proposed to characterize the prior knowledge of motion coupling between finger joints. The CGAT model uses a cumulation mechanism to combine early and late extracted features to improve motion-based hand gesture recognition. For the sEMG signals, a time-frequency convolutional neural network, named TF-CNN, is proposed to extract both the signals' time-domain and frequency-domain information. To improve recognition performance, the deep features from the two modalities are merged with an average layer, and regularization terms comprising a center loss and a mutual information loss are employed to enhance the robustness of the multi-modal system. Finally, a data set containing multi-modal signals collected from seven subjects on different days is built to verify the performance of the model. The experimental results indicate that MFHG reaches 99.96% and 92.46% accuracy on hand gesture recognition in the within-session and cross-day cases, respectively.
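The fusion step described above can be illustrated with a minimal NumPy sketch: two modality feature vectors are merged by element-wise averaging, and a center-loss term penalizes the distance between each fused feature and its class center. The feature dimension, batch size, gesture labels, and initial centers below are hypothetical placeholders; the actual MFHG architecture, loss weighting, and the mutual information term are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality deep features for a batch of 4 samples:
# 64-dim vectors from the motion branch (CGAT) and the sEMG branch (TF-CNN).
motion_feat = rng.normal(size=(4, 64))
semg_feat = rng.normal(size=(4, 64))

# Average-layer fusion: element-wise mean of the two modality features.
fused = (motion_feat + semg_feat) / 2.0

def center_loss(features, labels, centers):
    """Mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

labels = np.array([0, 1, 0, 1])   # hypothetical gesture class labels
centers = np.zeros((2, 64))       # one (learnable) center per gesture class
loss = center_loss(fused, labels, centers)
```

In training, `centers` would be updated alongside the network so that fused features of the same gesture cluster together, which is the robustness effect the regularization terms aim at.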
Funding: supported by the National Key Research & Development Program of China (Grant No. 2022YFB4703204) and the Project for Young Scientists in Basic Research of the Chinese Academy of Sciences (Grant No. YSBR-034).