Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to inherent complexities. In response to these challenges, we present a novel JSL recognition system that employs a strategic fusion approach, combining joint skeleton-based handcrafted features and pixel-based deep learning features. Our system incorporates two distinct streams: the first stream extracts crucial handcrafted features, emphasizing the capture of hand and body movements within JSL gestures, while the second, a deep learning-based transfer learning stream, captures hierarchical representations of JSL gestures. We then concatenate the critical information of the first stream with the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After the dimensionality of the features was reduced, a feature selection approach and a kernel-based support vector machine (SVM) were used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results demonstrate that the fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
Research on Chinese Sign Language (CSL) provides convenience and support for individuals with hearing impairments to communicate and integrate into society. This article reviews the relevant literature on Chinese Sign Language Recognition (CSLR) in the past 20 years. Hidden Markov Models (HMM), Support Vector Machines (SVM), and Dynamic Time Warping (DTW) were found to be the most commonly employed technologies among traditional identification methods. Benefiting from the rapid development of computer vision and artificial intelligence technology, Convolutional Neural Networks (CNN), 3D-CNN, YOLO, Capsule Network (CapsNet), and various other deep neural networks have sprung up. Deep Neural Networks (DNNs) and their derived models are integral to modern artificial intelligence recognition methods. In addition, technologies that were widely used in the early days have also been integrated and applied to specific hybrid models and customized identification methods. Sign language data collection includes acquiring data from data gloves, data sensors (such as Kinect and Leap Motion), and high-definition photography. Meanwhile, facial expression recognition, complex background processing, and 3D sign language recognition have also attracted research interest among scholars. Due to the uniqueness and complexity of Chinese sign language, accuracy, robustness, real-time performance, and user independence are significant challenges for future sign language recognition research. Additionally, suitable datasets and evaluation criteria are also worth pursuing.
Deaf people and people with hearing difficulties can communicate using sign language (SL), a visual language. Many works on resource-rich languages have been proposed; however, work on low-resource languages is still lacking. The visual signs of Urdu sign language also differ from those of other SLs. This study presents a novel approach to translating Urdu sign language (UrSL) using the UrSL-CNN model, a convolutional neural network (CNN) architecture specifically designed for this purpose. Unlike existing works that primarily focus on languages with rich resources, this study addresses the challenge of translating a sign language with limited resources. We conducted experiments using two datasets containing 1500 and 78,000 images, employing a methodology comprising four modules: data collection, pre-processing, categorization, and prediction. To enhance prediction accuracy, each sign image was transformed into a greyscale image and underwent noise filtering. Comparative analysis with machine learning baseline methods (support vector machine, Gaussian Naive Bayes, random forest, and k-nearest neighbors) on the UrSL alphabets dataset demonstrated the superiority of UrSL-CNN, which achieved an accuracy of 0.95. Additionally, our model exhibited superior performance in Precision, Recall, and F1-score evaluations. This work not only contributes to advancing sign language translation but also holds promise for improving communication accessibility for individuals with hearing impairments.
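The accuracy, Precision, Recall, and F1-score named in the UrSL-CNN abstract above can be illustrated with a minimal sketch. It assumes a multi-class setting with one score triple per class; the labels `A`, `B`, `C` and the predictions are hypothetical and are not taken from the UrSL datasets, and the abstract does not say how the authors averaged their scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels and predictions, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(accuracy(y_true, y_pred))
print(per_class_scores(y_true, y_pred))
```

Reporting the per-class triples alongside overall accuracy shows which signs are confused with each other, which a single accuracy figure hides.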
The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods either fail to mine hand and face information in their visual backbones or rely on expensive and time-consuming external extractors to obtain it. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information about the hands and face at multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments using three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network to focus on essential features of video series. The method considers local details of sign language movements by taking the information on joints and bones as inputs and constructing a spatial-temporal graph to reflect inter-frame relevance and physical connections between nodes. A graph-based multi-head attention mechanism is used with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation modeling is completed via a temporal convolutional network. We adopt BLSTM to learn the long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results in terms of word error rate (1.59%) on the Chinese Sign Language dataset and mean Jaccard Index (65.78%) on the ChaLearn LAP Continuous Gesture Dataset.
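The word error rate quoted in the abstract above is conventionally the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The sketch below assumes that standard definition; the abstract does not spell out the exact scoring protocol, and the gloss sequences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # Before row i is processed, prev[j] is the edit distance between
    # reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical gloss sequences: one substitution plus one insertion.
ref = "TOMORROW RAIN NORTH".split()
hyp = "TOMORROW SNOW NORTH COLD".split()
print(word_error_rate(ref, hyp))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.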
With advancements in computing power and in the overall quality of images captured on everyday cameras, a much wider range of possibilities has opened up in various scenarios. This has several implications for deaf and speech-impaired people, as they have a chance to communicate with a greater number of people much more easily. More than ever before, there is a plethora of information about sign language usage in the real world. Sign languages, and by extension the available datasets, take two forms: isolated sign language and continuous sign language. The main difference between the two is that in isolated sign language the hand signs cover individual letters of the alphabet, whereas in continuous sign language hand signs for entire words are used. This paper explores a novel deep learning architecture that uses recently published large pre-trained image models to quickly and accurately recognize the alphabet of American Sign Language (ASL). The study focuses on isolated sign language to demonstrate that a high level of classification accuracy can be achieved on the data, thereby showing that interpreters can be implemented in the real world. The MobileNetV2 architecture serves as the backbone of this study. It is designed to run on end devices such as mobile phones and to infer signals from images in a relatively short amount of time. With the architecture proposed in this paper, a classification accuracy of 98.77% is achieved on Indian Sign Language (ISL) and American Sign Language (ASL), outperforming the existing state-of-the-art systems.
This study presents a novel approach to automatically translating Arabic Sign Language (ATSL) into spoken Arabic. The proposed solution uses a deep learning-based classification approach and the transfer learning technique to retrain 12 image recognition models. The image-based translation method maps sign language gestures to corresponding letters or words using distance measures and classification as a machine learning technique. The results show that the proposed model is more accurate and faster than traditional image-based models in classifying Arabic-language signs, with a translation accuracy of 93.7%. This research makes a significant contribution to the field of ATSL. It offers a practical solution for improving communication for individuals with special needs, such as the deaf and mute community. This work demonstrates the potential of deep learning techniques in translating sign language into natural language and highlights the importance of ATSL in facilitating communication for individuals with disabilities.
Sign language uses the motion of the arms and hands to communicate with people with hearing disabilities. Several models for sign language detection and classification are available in the literature, and the latest advancements in computer vision make it possible to perform sign and gesture recognition using deep neural networks. This paper introduces an Arabic Sign Language Gesture Classification using Deer Hunting Optimization with Machine Learning (ASLGC-DHOML) model. The presented ASLGC-DHOML technique mainly concentrates on recognising and classifying sign language gestures. The model first pre-processes the input gesture images and generates feature vectors using the densely connected network (DenseNet169) model. For gesture recognition and classification, a multilayer perceptron (MLP) classifier is used. Lastly, the DHO algorithm is used for parameter optimization of the MLP model. The ASLGC-DHOML model was tested experimentally and the outcomes were inspected under distinct aspects. The comparison analysis showed that the ASLGC-DHOML method achieves better gesture classification results than other techniques, with a maximum accuracy of 92.88%.
Sign language is used as a communication medium in the fields of trade and defence, and in deaf-mute communities worldwide. Over the last few decades, research in the domain of sign language translation has grown and become more challenging. This necessitates the development of a Sign Language Translation System (SLTS) to provide effective communication in different research domains. In this paper, a novel Hybrid Adaptive Gaussian Thresholding with Otsu Algorithm (Hybrid-AO) for image segmentation is proposed for the translation of alphabet-level Indian Sign Language (ISLTS) with a 5-layer Convolutional Neural Network (CNN). The focus of this paper is to analyze various image segmentation approaches (Canny Edge Detection, Simple Thresholding, and Hybrid-AO), pooling approaches (Max, Average, and Global Average Pooling), and activation functions (ReLU, Leaky ReLU, and ELU). The 5-layer CNN with Max pooling, Leaky ReLU activation function, and Hybrid-AO (5MXLR-HAO) outperformed the other frameworks. An open-access dataset of ISL alphabets with approximately 31 K images in 26 classes was used to train and test the model. The proposed framework has been developed for translating alphabet-level Indian Sign Language into text. It attains 98.95% training accuracy, 98.05% validation accuracy, 0.0721 training loss, and 0.1021 validation loss, and it outperforms other existing systems.
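The abstract above does not describe how Hybrid-AO combines adaptive Gaussian thresholding with Otsu's algorithm, so the following is only a sketch of the standard global Otsu component, not the authors' hybrid method. It assumes 8-bit greyscale intensities supplied as a flat list; the pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with intensity <= t form one class, the rest the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # summed intensity of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical bimodal image: dark background and a bright hand region.
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50
t = otsu_threshold(pixels)
binary = [1 if value > t else 0 for value in pixels]
print(t, sum(binary))
```

A global threshold like this is sensitive to uneven lighting, which is presumably why the paper pairs it with a locally adaptive method.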
Sign language fills the communication gap for people with hearing and speech impairments. It includes two visual modalities: manual gestures, consisting of hand movements, and non-manual gestures, incorporating body movements such as those of the head, facial expressions, eyes, and shoulder shrugging. Previously, both kinds of gestures have been detected separately; identifying them separately may give better accuracy, but much communicative information is lost. A proper sign language mechanism is needed to detect manual and non-manual gestures so as to convey the appropriate detailed message to others. Our proposed system, the Sign Language Action Transformer Network (SLATN), localizes hand, body, and facial gestures in video sequences. We employ a Transformer-style architecture as a "base network" to extract features from the spatiotemporal domain. The model spontaneously learns to track individual persons and their action context across multiple frames. Furthermore, a "head network" emphasizes hand movement and facial expression simultaneously, which is often crucial to understanding sign language, using its attention mechanism to create tight bounding boxes around classified gestures. The model is then compared with traditional activity recognition methods. It not only runs faster but also achieves better accuracy. The model achieves an overall testing accuracy of 82.66% with considerable computational performance of 94.13 Giga-Floating Point Operations per Second (G-FLOPS). Another contribution is a newly created dataset of Pakistan Sign Language for Manual and Non-Manual (PkSLMNM) gestures.
Sign language is mainly used in communication with people who have hearing disabilities. It is also used to communicate with people with developmental impairments who have limited or no interaction skills. Interaction via sign language is a fruitful means of communication for hearing- and speech-impaired persons. A hand gesture recognition system can help deaf and speech-impaired people by making use of human-computer interfaces (HCI) and convolutional neural networks (CNN) to identify the static signs of Indian Sign Language (ISL). This study introduces a shark smell optimization with deep learning based automated sign language recognition (SSODL-ASLR) model for hearing- and speech-impaired people. The presented SSODL-ASLR technique mainly concentrates on the recognition and classification of sign language produced by deaf and speech-impaired people. The SSODL-ASLR model encompasses a two-stage process, namely sign language detection and sign language classification. In the first stage, the Mask Region-based Convolutional Neural Network (Mask RCNN) model is used for sign language recognition. In the second stage, the SSO algorithm with a soft margin support vector machine (SM-SVM) model is used for sign language classification. To verify the classification performance of the SSODL-ASLR model, a set of simulations was carried out. The results showed the superiority of the SSODL-ASLR model over other techniques.
Hand gestures have been used as a significant mode of communication since the advent of human civilization. Hand gesture recognition (HGRoc) technology is crucial for seamless and error-free human-computer interaction (HCI), and it is pivotal in healthcare and in communication for the deaf community. Despite significant advancements in computer vision-based gesture recognition for language understanding, two considerable challenges persist in this field: (a) only limited and common gestures are considered, and (b) processing multiple channels of information across a network takes a large amount of computational time during discriminative feature extraction. Therefore, a novel hand vision-based convolutional neural network (CNN) model named HVCNNM is proposed, offering several benefits, notably enhanced accuracy, robustness to variations, real-time performance, reduced channels, and scalability. Additionally, such models can be optimized for real-time performance, learn from large amounts of data, and scale to handle complex recognition tasks for efficient human-computer interaction. The proposed model was evaluated on two challenging datasets, namely the Massey University Dataset (MUD) and the American Sign Language (ASL) Alphabet Dataset (ASLAD). On the MUD and ASLAD datasets, HVCNNM achieved scores of 99.23% and 99.00%, respectively. These results demonstrate the effectiveness of CNN as a promising HGRoc approach. The findings suggest that the proposed model has potential roles in applications such as sign language recognition, human-computer interaction, and robotics.
The deaf-mute population constantly feels helpless when others do not understand them and vice versa. To fill this gap, this study implements a CNN-based neural network with the Convolutional Block Attention Module (CBAM) to recognise Malaysian Sign Language (MSL) in videos. This study created 2071 videos for 19 dynamic signs. Two different experiments were conducted for dynamic signs, using CBAM-3DResNet with the 'Within Blocks' and 'Before Classifier' methods. Various metrics, such as accuracy, loss, precision, recall, F1-score, confusion matrix, and training time, were recorded to evaluate the models' efficiency. Results showed that the CBAM-ResNet models performed well in video recognition tasks, with recognition rates of over 90% and little variation. The CBAM-ResNet 'Before Classifier' model is more efficient than the 'Within Blocks' model. All experimental results indicated the efficiency of the CBAM-ResNet 'Before Classifier' model in recognising Malaysian Sign Language and its value for future research.
Among the human users of the Internet of Things, the hearing-impaired are a special group for whom common forms of information expression, such as voice and video, are inaccessible, and most of them have some difficulty in understanding information in text form. The hearing-impaired are accustomed to receiving information expressed in sign language. For this situation, this paper proposes a new form of information expression for the Internet of Things oriented toward the hearing-impaired, based on sign language video synthesis. Under the sign synthesis framework, three modules are necessary: constructing the database, searching for appropriate sign language video units and transition units, and generating interpolated frames. With this method, text information can be transformed into sign language expression for the hearing-impaired.
(Aim) Chinese sign language is an essential tool for hearing-impaired people to live, learn, and communicate in deaf communities. Moreover, Chinese sign language plays a significant role in speech therapy and rehabilitation. Chinese sign language identification can provide convenience for hearing-impaired people and eliminate the communication barrier between the deaf community and the rest of society. As in many areas of biomedical image processing (such as automatic chest radiograph processing and diagnosis of chest radiological images), the rapid development of artificial intelligence, especially deep learning technologies and algorithms, has brought sign language image recognition into a period of rapid growth. This study aims to propose a novel sign language image recognition method based on an optimized convolutional neural network. (Method) Three different combinations of blocks, Conv-BN-ReLU-Pooling, Conv-BN-ReLU, and Conv-BN-ReLU-BN, were employed, together with advanced techniques such as batch normalization, dropout, and Leaky ReLU. We proposed an optimized convolutional neural network, called the CNN-CB method, to identify 1320 sign language images. In total, ten runs were carried out, with the hold-out set chosen randomly for each run. (Results) The results indicate that our CNN-CB method achieved an overall accuracy of 94.88±0.99%. (Conclusion) Our CNN-CB method is superior to thirteen state-of-the-art methods: eight traditional machine learning approaches and five modern convolutional neural network approaches.
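The evaluation protocol in the abstract above (ten runs, each with a random hold-out set, summarised as mean ± standard deviation) can be sketched as follows. The 20% test fraction, the seeds, and the per-run accuracies are all hypothetical, and the use of the sample standard deviation is an assumption; the abstract states only the 1320 images, the ten runs, and the final 94.88±0.99%.

```python
import random
import statistics


def hold_out_split(n_samples, test_fraction, seed):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def summarize(accuracies):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# One fresh random hold-out per run; 0.2 is an assumed test fraction.
splits = [hold_out_split(1320, 0.2, seed) for seed in range(10)]

# Placeholder accuracies standing in for ten trained-and-evaluated models.
run_accuracies = [94.0, 95.5, 93.8, 96.1, 94.9, 95.2, 93.5, 96.0, 94.4, 95.4]
mean_acc, std_acc = summarize(run_accuracies)
print(f"{mean_acc:.2f} ± {std_acc:.2f} %")
```

Repeating the split with different seeds gives a spread that indicates how much the reported accuracy depends on which images happen to be held out.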
This document presents a computer vision system for the automatic recognition of Mexican Sign Language (MSL), based on normalized moments as descriptors invariant to translation and scale transforms, using artificial neural networks as the pattern recognition model. An experimental feature selection was performed to reduce computational costs, because this work focuses on automatic recognition. The computer vision system includes four LED reflectors of 700 lumens each to improve image acquisition quality; this illumination system reduces shadows in each sign of the MSL. MSL contains 27 signs in total, but 6 of them are expressed with movement; this paper presents a framework for the automatic recognition of the 21 static signs of MSL. The proposed system achieved a recognition rate of 93%.
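The translation and scale invariance of normalized moments mentioned above can be checked with a small sketch. It assumes the standard normalized central moment, eta_pq = mu_pq / mu_00^(1 + (p + q) / 2); the abstract does not state which moment orders the authors selected, and the rectangles below are hypothetical stand-ins for segmented hand silhouettes.

```python
def normalized_central_moment(image, p, q):
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2) for a 2-D intensity list."""
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has no foreground mass")
    # Centroid; subtracting it makes the moment translation invariant.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    mu_pq = sum(
        ((x - cx) ** p) * ((y - cy) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )
    # Dividing by a power of the area removes the dependence on scale.
    return mu_pq / m00 ** (1 + (p + q) / 2)


def rectangle(canvas, left, top, width, height):
    """Binary canvas x canvas image containing one filled rectangle."""
    return [
        [1 if left <= x < left + width and top <= y < top + height else 0
         for x in range(canvas)]
        for y in range(canvas)
    ]


original = rectangle(30, 2, 3, 6, 10)
shifted = rectangle(30, 10, 15, 6, 10)  # same shape, translated
doubled = rectangle(30, 1, 1, 12, 20)   # same shape, scaled by 2

for name, img in [("original", original), ("shifted", shifted), ("doubled", doubled)]:
    print(name, normalized_central_moment(img, 2, 0), normalized_central_moment(img, 0, 2))
```

Translation leaves the values unchanged, while scaling changes them only slightly; the small residual comes from sampling the shape on a pixel grid.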
Communication is a basic need of every human being; through it, people learn, express their feelings, and exchange ideas, but deaf people cannot hear or speak. To communicate, they use various hand gestures, known as Sign Language (SL), which they learn at special schools. Because hearing people have generally not taken SL classes, they are unable to sign everyday sentences (e.g., "What are the specifications of this mobile phone?"). A technological solution can help overcome this communication gap and allow hearing people to communicate with deaf people. This paper presents an architecture for an application named Sign4PSL that translates sentences into Pakistan Sign Language (PSL) for deaf people, with visual representation using a virtual signing character. This research aims to develop a generic, independent application that is lightweight and reusable on any platform, including web and mobile, with the ability to perform offline text translation. Sign4PSL relies on a knowledge base that stores both a corpus of PSL words and their coded form in the notation system. Sign4PSL takes English text as input, translates it to PSL through sign language notation, and displays the gestures to the user using a virtual character. The system was tested with deaf students at a special school. The results showed that the students were able to understand the story presented to them appropriately.
Sign language recognition can be considered an effective solution for disabled people to communicate with others. It helps them convey the intended information using sign languages without difficulty. Recent advancements in computer vision and image processing techniques can be leveraged to detect and classify the signs used by disabled people effectively. Metaheuristic optimization algorithms can be designed to fine-tune the hyperparameters used in Deep Learning (DL) models, as these considerably affect the classification results. With this motivation, the current study designs the Optimal Deep Transfer Learning Driven Sign Language Recognition and Classification (ODTL-SLRC) model for disabled people. The aim of the proposed ODTL-SLRC technique is to recognize and classify sign languages used by disabled people. The technique uses the EfficientNet model to generate a collection of useful feature vectors. In addition, the hyperparameters of the EfficientNet model are fine-tuned with the help of the HGSO algorithm. Moreover, the Bidirectional Long Short Term Memory (BiLSTM) technique is employed for sign language classification. The proposed ODTL-SLRC technique was experimentally validated using a benchmark dataset and the results were inspected under several measures. The comparative analysis established the superior performance of the proposed ODTL-SLRC technique over recent approaches in terms of efficiency.
Communication is a basic need of every human being for exchanging thoughts and interacting with society. Hearing people usually converse through spoken languages, whereas deaf people cannot do so. Sign Language (SL) is therefore the medium through which deaf people converse and interact with society. In SL, every word is expressed by a specific gesture, and a gesture consists of a sequence of performed signs. Observers normally watch these signs to tell the difference between the single and multiple gestures used for singular and plural words, respectively. The signs for singular words such as I, eat, drink, and home differ from those for plural words such as school, cars, and players. Special training is required to gain sufficient knowledge and practice so that people can differentiate and understand every gesture or sign appropriately. Much research has been carried out on computer-based solutions that understand a single gesture performed with a single hand. A complete understanding of such communication, however, is possible only if a computer-based SL solution can differentiate these gestures and so cope with the real-world environment. Hence, there is still a demand for a dedicated environment to automate such a communication solution for interacting with deaf people. This research focuses on assisting the deaf community by capturing gestures in video format and then mapping and differentiating them as single or multiple gestures used in words. Finally, these are converted into the respective words or sentences within a reasonable time. This provides a real-time solution for deaf people to communicate and interact with society.
Funding: Supported by the National Philosophy and Social Sciences Foundation (Grant No. 20BTQ065).
Abstract: Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Funding: Supported by the Competitive Research Fund of the University of Aizu, Japan.
Abstract: Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to inherent complexities. In response to these challenges, we present a novel JSL recognition system that employs a strategic fusion approach, combining joint skeleton-based handcrafted features and pixel-based deep learning features. Our system incorporates two distinct streams: the first stream extracts crucial handcrafted features, emphasizing the capture of hand and body movements within JSL gestures, while the second, a deep learning-based transfer learning stream, captures hierarchical representations of JSL gestures. We then concatenate the critical information from the first stream with the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After reducing the dimensionality of the features, a feature selection approach and a kernel-based support vector machine (SVM) are used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results demonstrate that our fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
Funding: Supported by the National Social Science Foundation Annual Project "Research on Evaluation and Improvement Paths of Integrated Development of Disabled Persons" (Grant No. 20BRK029); the National Language Commission's "14th Five-Year Plan" Scientific Research Plan 2023 Project "Domain Digital Language Service Resource Construction and Key Technology Research" (YB145-72); and the National Philosophy and Social Sciences Foundation (Grant No. 20BTQ065).
Abstract: Research on Chinese Sign Language (CSL) provides convenience and support for individuals with hearing impairments to communicate and integrate into society. This article reviews the relevant literature on Chinese Sign Language Recognition (CSLR) in the past 20 years. Hidden Markov Models (HMM), Support Vector Machines (SVM), and Dynamic Time Warping (DTW) were found to be the most commonly employed technologies among traditional identification methods. Benefiting from the rapid development of computer vision and artificial intelligence technology, Convolutional Neural Networks (CNN), 3D-CNN, YOLO, Capsule Network (CapsNet), and various deep neural networks have sprung up. Deep Neural Networks (DNNs) and their derived models are integral to modern artificial intelligence recognition methods. In addition, technologies that were widely used in the early days have also been integrated and applied to specific hybrid models and customized identification methods. Sign language data collection includes acquiring data from data gloves, data sensors (such as Kinect, LeapMotion, etc.), and high-definition photography. Meanwhile, facial expression recognition, complex background processing, and 3D sign language recognition have also attracted research interest among scholars. Due to the uniqueness and complexity of Chinese sign language, accuracy, robustness, real-time performance, and user independence are significant challenges for future sign language recognition research. Additionally, suitable datasets and evaluation criteria are also worth pursuing.
Abstract: Deaf people or people facing hearing issues can communicate using sign language (SL), a visual language. Many works based on resource-rich languages have been proposed; however, work on low-resource languages is still lacking. Unlike other SLs, the visuals of the Urdu language are different. This study presents a novel approach to translating Urdu sign language (UrSL) using the UrSL-CNN model, a convolutional neural network (CNN) architecture specifically designed for this purpose. Unlike existing works that primarily focus on languages with rich resources, this study addresses the challenge of translating a sign language with limited resources. We conducted experiments using two datasets containing 1500 and 78,000 images, employing a methodology comprising four modules: data collection, pre-processing, categorization, and prediction. To enhance prediction accuracy, each sign image was transformed into a greyscale image and underwent noise filtering. Comparative analysis with machine learning baseline methods (support vector machine, Gaussian Naive Bayes, random forest, and the k-nearest neighbors algorithm) on the UrSL alphabets dataset demonstrated the superiority of UrSL-CNN, which achieved an accuracy of 0.95. Additionally, our model exhibited superior performance in Precision, Recall, and F1-score evaluations. This work not only contributes to advancing sign language translation but also holds promise for improving communication accessibility for individuals with hearing impairments.
Funding: Supported by the National Natural Science Foundation of China (62072334).
Abstract: The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information of the hands and face in multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video from different temporal scales. We conduct extensive experiments using three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Funding: Supported by the Key Research & Development Plan Project of Shandong Province, China (No. 2017GGX10127).
Abstract: Continuous sign language recognition (CSLR) is challenging due to the complexity of video background, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network to focus on essential features of video series. The method considers local details of sign language movements by taking the information on joints and bones as inputs and constructing a spatial-temporal graph to reflect inter-frame relevance and physical connections between nodes. A graph-based multi-head attention mechanism is used with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation modeling is completed via a temporal convolutional network. We adopted BLSTM to learn the long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results regarding word error rate (1.59%) on the Chinese Sign Language dataset and the mean Jaccard Index (65.78%) on the ChaLearn LAP Continuous Gesture Dataset.
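The word error rate reported in the abstract above is conventionally computed as the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation script; the function name `word_error_rate` and the example glosses are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Standard WER definition; this is an illustrative sketch, not the
    evaluation code used in the paper summarized above.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted gloss out of four reference glosses.
    print(word_error_rate("I go home today", "I go school today"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.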
Abstract: With advancements in computing power and the overall quality of images captured on everyday cameras, a much wider range of possibilities has opened in various scenarios. This fact has several implications for deaf and dumb people, as they have a chance to communicate with a greater number of people much more easily. More than ever before, there is a plethora of information about sign language usage in the real world. Sign languages, and by extension the datasets available, are of two forms: isolated sign language and continuous sign language. The main difference between the two types is that in isolated sign language, the hand signs cover individual letters of the alphabet, whereas in continuous sign language, hand signs for entire words are used. This paper explores a novel deep learning architecture that uses recently published large pre-trained image models to quickly and accurately recognize the alphabet of the American Sign Language (ASL). The study focuses on isolated sign language to demonstrate that it is possible to achieve a high level of classification accuracy on the data, thereby showing that interpreters can be implemented in the real world. The newly proposed MobileNetV2 architecture serves as the backbone of this study. It is designed to run on end devices like mobile phones and to run inference on images in a relatively short amount of time. With the architecture proposed in this paper, a classification accuracy of 98.77% on the Indian Sign Language (ISL) and American Sign Language (ASL) is achieved, outperforming the existing state-of-the-art systems.
Abstract: This study presents a novel approach to automatically translating Arabic Sign Language (ATSL) into spoken Arabic. The proposed solution uses a deep learning-based classification approach and the transfer learning technique to retrain 12 image recognition models. The image-based translation method maps sign language gestures to corresponding letters or words using distance measures and classification as a machine learning technique. The results show that the proposed model is more accurate and faster than traditional image-based models in classifying Arabic-language signs, with a translation accuracy of 93.7%. This research makes a significant contribution to the field of ATSL. It offers a practical solution for improving communication for individuals with special needs, such as the deaf and mute community. This work demonstrates the potential of deep learning techniques in translating sign language into natural language and highlights the importance of ATSL in facilitating communication for individuals with disabilities.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4310373DSR54.
Abstract: Sign language includes the motion of the arms and hands to communicate with people with hearing disabilities. Several models for sign language detection and classification are available in the literature. The latest advancements in computer vision now make it possible to perform sign/gesture recognition using deep neural networks. This paper introduces an Arabic Sign Language Gesture Classification using Deer Hunting Optimization with Machine Learning (ASLGC-DHOML) model. The presented ASLGC-DHOML technique mainly concentrates on recognising and classifying sign language gestures. The ASLGC-DHOML model first pre-processes the input gesture images and generates feature vectors using the densely connected network (DenseNet169) model. A multilayer perceptron (MLP) classifier is then used to recognize and classify the sign language gestures. Lastly, the DHO algorithm is used for parameter optimization of the MLP model. The experimental results of the ASLGC-DHOML model were tested and the outcomes inspected under distinct aspects. The comparison analysis highlighted that the ASLGC-DHOML method achieved better gesture classification results than other techniques, with a maximum accuracy of 92.88%.
Abstract: Sign language is used as a communication medium in the fields of trade and defence, and in deaf-mute communities worldwide. Over the last few decades, research in the domain of sign language translation has grown and become more challenging. This necessitates the development of a Sign Language Translation System (SLTS) to provide effective communication in different research domains. In this paper, a novel Hybrid Adaptive Gaussian Thresholding with Otsu Algorithm (Hybrid-AO) for image segmentation is proposed for the translation of alphabet-level Indian Sign Language (ISLTS) with a 5-layer Convolution Neural Network (CNN). The focus of this paper is to analyze various image segmentation methods (Canny Edge Detection, Simple Thresholding, and Hybrid-AO), pooling approaches (Max, Average, and Global Average Pooling), and activation functions (ReLU, Leaky ReLU, and ELU). The 5-layer CNN with Max pooling, the Leaky ReLU activation function, and Hybrid-AO (5MXLR-HAO) outperformed the other frameworks. An open-access dataset of ISL alphabets with approximately 31K images in 26 classes was used to train and test the model. The proposed framework has been developed for translating alphabet-level Indian Sign Language into text. It attains 98.95% training accuracy, 98.05% validation accuracy, 0.0721 training loss, and 0.1021 validation loss, and its performance exceeds that of other existing systems.
Abstract: Sign language fills the communication gap for people with hearing and speaking ailments. It includes both visual modalities: manual gestures consisting of movements of the hands, and non-manual gestures incorporating body movements including the head, facial expressions, eyes, shoulder shrugging, etc. Previously, both kinds of gestures have been detected; identifying them separately may give better accuracy, but much communicational information is lost. A proper sign language mechanism is needed to detect manual and non-manual gestures so as to convey the appropriate detailed message to others. Our novel proposed system contributes a Sign Language Action Transformer Network (SLATN), which localizes hand, body, and facial gestures in video sequences. Here we employ a Transformer-style structural design as a "base network" to extract features from the spatiotemporal domain. The model automatically learns to track individual persons and their action context in multiple frames. Furthermore, a "head network" emphasizes hand movement and facial expression simultaneously, which is often crucial to understanding sign language, using its attention mechanism to create tight bounding boxes around classified gestures. The model's performance is then compared with traditional identification methods of activity recognition. It not only works faster but also achieves better accuracy. The model achieves an overall testing accuracy of 82.66% with a very considerable computational performance of 94.13 Giga-Floating Point Operations per Second (G-FLOPS). Another contribution is a newly created dataset of Pakistan Sign Language for Manual and Non-Manual (PkSLMNM) gestures.
Abstract: Sign language is mainly used in communication with people who have hearing disabilities. It is also used to communicate with people having developmental impairments who have some or no interaction skills. Interaction via sign language becomes a fruitful means of communication for hearing and speech impaired persons. A hand gesture recognition system is helpful for deaf and dumb people, making use of human computer interface (HCI) and convolutional neural networks (CNN) for identifying the static signs of Indian Sign Language (ISL). This study introduces a shark smell optimization with deep learning based automated sign language recognition (SSODL-ASLR) model for hearing and speaking impaired people. The presented SSODL-ASLR technique mainly concentrates on the recognition and classification of sign language provided by deaf and dumb people. The SSODL-ASLR model encompasses a two-stage process, namely sign language detection and sign language classification. In the first stage, the Mask Region based Convolution Neural Network (Mask RCNN) model is exploited for sign language recognition. Secondly, the SSO algorithm with a soft margin support vector machine (SM-SVM) model is used for sign language classification. To assess the classification performance of the SSODL-ASLR model, a brief set of simulations was carried out. The results showed the superiority of the SSODL-ASLR model over other techniques.
Funding: Funded by Researchers Supporting Project Number (RSPD2024 R947), King Saud University, Riyadh, Saudi Arabia.
Abstract: Hand gestures have been used as a significant mode of communication since the advent of human civilization. By facilitating human-computer interaction (HCI), hand gesture recognition (HGRoc) technology is crucial for seamless and error-free HCI. HGRoc technology is pivotal in healthcare and communication for the deaf community. Despite significant advancements in computer vision-based gesture recognition for language understanding, two considerable challenges persist in this field: (a) only limited and common gestures are considered, and (b) processing multiple channels of information across a network takes huge computational time during discriminative feature extraction. Therefore, a novel hand vision-based convolutional neural network (CNN) model, named HVCNNM, is proposed; it offers several benefits, notably enhanced accuracy, robustness to variations, real-time performance, reduced channels, and scalability. Additionally, such models can be optimized for real-time performance, learn from large amounts of data, and scale to handle complex recognition tasks for efficient human-computer interaction. The proposed model was evaluated on two challenging datasets, namely the Massey University Dataset (MUD) and the American Sign Language (ASL) Alphabet Dataset (ASLAD). On the MUD and ASLAD datasets, HVCNNM achieved scores of 99.23% and 99.00%, respectively. These results demonstrate the effectiveness of CNN as a promising HGRoc approach. The findings suggest that the proposed model has potential roles in applications such as sign language recognition, human-computer interaction, and robotics.
Abstract: The deaf-mute population constantly feels helpless when others do not understand them and vice versa. To fill this gap, this study implements a CNN-based neural network, the Convolutional Block Attention Module (CBAM), to recognise Malaysian Sign Language (MSL) in videos. This study created 2071 videos for 19 dynamic signs. Two different experiments were conducted for dynamic signs, using CBAM-3DResNet implementing the 'Within Blocks' and 'Before Classifier' methods. Various metrics such as accuracy, loss, precision, recall, F1-score, confusion matrix, and training time were recorded to evaluate the models' efficiency. Results showed that CBAM-ResNet models performed well in video recognition tasks, with recognition rates of over 90% and little variation. The CBAM-ResNet 'Before Classifier' model is more efficient than the 'Within Blocks' CBAM-ResNet models. All experimental results indicated the efficiency of CBAM-ResNet 'Before Classifier' in recognising Malaysian Sign Language and that it merits future research.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60825203, 60973056, 60973057, U0935004); the National Technology Support Project (2007BAH13B01); the Beijing Municipal Natural Science Foundation (4102009); the Scientific Research Common Program of Beijing Municipal Commission of Education (KM200710005023); and PHR (IHLB).
Abstract: Among the human users of the Internet of Things, the hearing-impaired are a special group for whom normal forms of information expression, such as voice and video, are inaccessible, and most of them have some difficulty in understanding information in text form. The hearing-impaired are accustomed to receiving information expressed in sign language. To address this situation, this paper proposes a new form of information expression for the Internet of Things oriented toward the hearing-impaired, based on sign language video synthesis. Under the sign synthesis framework, three modules are necessary: constructing the database, searching for appropriate sign language video units and transition units, and generating interpolated frames. With this method, text information can be transformed into sign language expression for the hearing-impaired.
Funding: Supported by the National Philosophy and Social Sciences Foundation (Grant No. 20BTQ065).
Abstract: (Aim) Chinese sign language is an essential tool for the hearing-impaired to live, learn, and communicate in deaf communities. Moreover, Chinese sign language plays a significant role in speech therapy and rehabilitation. Chinese sign language identification can provide convenience for hearing-impaired people and eliminate the communication barrier between the deaf community and the rest of society. As in many areas of biomedical image processing (such as automatic chest radiograph processing, diagnosis of chest radiological images, etc.), sign language image recognition has flourished with the rapid development of artificial intelligence, especially deep learning technologies and algorithms. This study aims to propose a novel sign language image recognition method based on an optimized convolutional neural network. (Method) Three different combinations of blocks were employed: Conv-BN-ReLU-Pooling, Conv-BN-ReLU, and Conv-BN-ReLU-BN, together with advanced techniques such as batch normalization, dropout, and Leaky ReLU. We proposed an optimized convolutional neural network, called the CNN-CB method, to identify 1320 sign language images. Ten runs in total were carried out, with the hold-out set chosen randomly for each run. (Results) The results indicate that our CNN-CB method achieved an overall accuracy of 94.88±0.99%. (Conclusion) Our CNN-CB method is superior to thirteen state-of-the-art methods: eight traditional machine learning approaches and five modern convolutional neural network approaches.
Abstract: This document presents a computer vision system for the automatic recognition of Mexican Sign Language (MSL), based on normalized moments as descriptors invariant to translation and scale transforms, using artificial neural networks as the pattern recognition model. An experimental feature selection was performed to reduce computational costs, since this work focuses on automatic recognition. The computer vision system includes four LED reflectors of 700 lumens each to improve image acquisition quality; this illumination system reduces shadows in each sign of the MSL. MSL contains 27 signs in total, but 6 of them are expressed with movement; this paper presents a framework for the automatic recognition of the 21 static signs of MSL. The proposed system achieved a recognition rate of 93%.
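The normalized moments mentioned in the abstract above are commonly defined as normalized central moments, η_pq = μ_pq / μ_00^(1 + (p+q)/2), where μ_pq is the central moment taken about the image centroid. The following is a minimal sketch of that textbook definition on a binary image stored as a list of rows; it is an assumption that the paper uses this form, and the function name `normalized_moment` and the toy images are illustrative only.

```python
def normalized_moment(image, p, q):
    """Normalized central moment eta_pq of a 2-D intensity grid.

    Textbook definition: eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2).
    Illustrative sketch; the paper's exact descriptor set is not given.
    """
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has no foreground mass")

    # Centroid of the intensity distribution (x = column, y = row).
    x_c = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_c = sum(y * v for y, row in enumerate(image) for v in row) / m00

    # Central moment about the centroid gives translation invariance.
    mu_pq = sum(
        ((x - x_c) ** p) * ((y - y_c) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )
    # Dividing by a power of the total mass compensates for scale.
    return mu_pq / m00 ** (1 + (p + q) / 2)


def blank(size):
    """Hypothetical helper: an all-zero size x size image."""
    return [[0] * size for _ in range(size)]


if __name__ == "__main__":
    a = blank(6)
    b = blank(6)
    for dy in range(2):
        for dx in range(2):
            a[1 + dy][1 + dx] = 1  # 2x2 square near the top-left
            b[3 + dy][3 + dx] = 1  # same square, shifted
    # The two values agree because the descriptor ignores translation.
    print(normalized_moment(a, 2, 0), normalized_moment(b, 2, 0))
```

Translation invariance is exact here; scale invariance holds exactly only for continuous images and is approximate on a discrete pixel grid.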
Funding: This is ongoing research supported by the Yayasan Universiti Teknologi PETRONAS Grant Scheme, 015LC0029 and 015LC0277.
Abstract: Communication is a basic need of every human being; through it, people can learn, express their feelings, and exchange their ideas, but deaf people cannot listen and speak. For communication, they use various hand gestures, also known as Sign Language (SL), which they learn in special schools. Because normal people have not taken SL classes, they are unable to perform the signs of everyday sentences (e.g., what are the specifications of this mobile phone?). A technological solution can help overcome this communication gap so that normal people can communicate with deaf people. This paper presents an architecture for an application named Sign4PSL that translates sentences into Pakistan Sign Language (PSL) for deaf people, with visual representation using a virtual signing character. This research aims to develop a generic, independent application that is lightweight and reusable on any platform, including web and mobile, with the ability to perform offline text translation. Sign4PSL relies on a knowledge base that stores both a corpus of PSL words and their coded form in the notation system. Sign4PSL takes English language text as input, performs the translation to PSL through sign language notation, and displays gestures to the user using a virtual character. The system was tested on deaf students at a special school. The results showed that the students were able to understand the story presented to them appropriately.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 1/322/42); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R77), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4210118DSR02).
Abstract: Sign language recognition can be considered an effective solution for disabled people to communicate with others. It helps them convey the intended information using sign languages without any challenges. Recent advancements in computer vision and image processing techniques can be leveraged to detect and classify the signs used by disabled people effectively. Metaheuristic optimization algorithms can be designed to fine-tune the hyperparameters used in Deep Learning (DL) models, as these considerably impact the classification results. With this motivation, the current study designs the Optimal Deep Transfer Learning Driven Sign Language Recognition and Classification (ODTL-SLRC) model for disabled people. The aim of the proposed ODTL-SLRC technique is to recognize and classify sign languages used by disabled people. The proposed ODTL-SLRC technique uses the EfficientNet model to generate a collection of useful feature vectors. In addition, the hyperparameters involved in the EfficientNet model are fine-tuned with the help of the HGSO algorithm. Moreover, the Bidirectional Long Short Term Memory (BiLSTM) technique is employed for sign language classification. The proposed ODTL-SLRC technique was experimentally validated using a benchmark dataset and the results were inspected under several measures. The comparative analysis established the superior performance of the proposed ODTL-SLRC technique over recent approaches in terms of efficiency.
Funding: The work presented in this paper is part of ongoing research funded by the Yayasan Universiti Teknologi PETRONAS Grant (015LC0-311 and 015LC0-029).
Abstract: Communication is a basic need of every human being to exchange thoughts and interact with society. Hearing people usually converse through different spoken languages, whereas deaf people cannot do so. Therefore, Sign Language (SL) is the communication medium of such people for their conversation and interaction with society. SL is expressed in terms of a specific gesture for every word, and a gesture consists of a sequence of performed signs. Hearing people normally observe these signs to understand the difference between single and multiple gestures for singular and plural words respectively. The signs for singular words such as I, eat, drink, and home are unlike those for plural words such as school, cars, and players. Special training is required to gain sufficient knowledge and practice so that people can differentiate and understand every gesture/sign appropriately. Much research has been performed on computer-based solutions that understand a single gesture with the help of single-hand enumeration. A complete understanding of such communication is possible only when a computer-based SL solution differentiates these gestures so that it can cope with the real-world environment. Hence, there is still a demand for a specific environment to automate such a communication solution for interacting with these people. This research focuses on facilitating the deaf community by capturing the gestures in video format and then mapping and differentiating them as single or multiple gestures used in words. Finally, these are converted into the respective words/sentences within a reasonable time. This provides a real-time solution for deaf people to communicate and interact with society.