Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to inherent complexities. In response to these challenges, we present a novel JSL recognition system that employs a strategic fusion approach, combining joint skeleton-based handcrafted features and pixel-based deep learning features. Our system incorporates two distinct streams: the first stream extracts crucial handcrafted features, emphasizing the capture of hand and body movements within JSL gestures, while a deep learning-based transfer learning stream captures hierarchical representations of JSL gestures in the second stream. We then concatenate the critical information of the first stream with the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After reducing the dimensionality of the features, a feature selection approach and a kernel-based support vector machine (SVM) are used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results demonstrate that the fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
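The two-stream fusion idea described above can be illustrated with a minimal sketch: handcrafted skeleton-derived features are concatenated with deep pixel-based features, reduced in dimensionality, filtered by feature selection, and classified with a kernel SVM. The synthetic data, array shapes, and parameter values below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of two-stream feature fusion followed by PCA, feature selection and an RBF-SVM.
# The feature arrays are synthetic stand-ins for skeleton-based and CNN-based descriptors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 10

# Stream 1: handcrafted features from skeleton joints (e.g. joint distances/angles).
handcrafted = rng.normal(size=(n_samples, 64))
# Stream 2: deep features from a pretrained CNN backbone (transfer learning).
deep = rng.normal(size=(n_samples, 512))
labels = rng.integers(0, n_classes, size=n_samples)

# Fusion by concatenation, then dimensionality reduction, selection, and a kernel SVM.
fused = np.concatenate([handcrafted, deep], axis=1)
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    SelectKBest(f_classif, k=30),
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
clf.fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```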
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are much longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its capability to model auditory long-distance context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage pre-trains the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
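A minimal sketch of the dual-tower idea is given below: a speech encoder and a text encoder run in parallel, and a cross-attention layer lets acoustic frames attend to the paired transcript so that the acoustic states become semantics-aware. Layer counts, dimensions, and the fusion scheme are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of a speech-text dual-tower encoder with cross-modal attention (PyTorch).
import torch
import torch.nn as nn

class DualTowerEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, vocab_size=5000, n_mels=80):
        super().__init__()
        # Speech tower: project filterbank frames and encode with a Transformer.
        self.speech_proj = nn.Linear(n_mels, d_model)
        self.speech_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        # Text tower: embed tokens and encode with a Transformer.
        self.text_emb = nn.Embedding(vocab_size, d_model)
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        # Cross-modal interaction: speech frames attend to text tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, speech, text_ids):
        s = self.speech_enc(self.speech_proj(speech))        # (B, T_speech, d)
        t = self.text_enc(self.text_emb(text_ids))           # (B, T_text, d)
        aligned, _ = self.cross_attn(query=s, key=t, value=t)
        return s + aligned                                    # semantics-aware acoustic states

model = DualTowerEncoder()
speech = torch.randn(2, 120, 80)              # batch of filterbank sequences
text_ids = torch.randint(0, 5000, (2, 20))    # paired transcripts (pre-training stage)
print(model(speech, text_ids).shape)          # torch.Size([2, 120, 256])
```

In the second, fine-tuning stage described above, the text tower would be dropped or masked so that inference can run on speech alone; that detail is not shown in this sketch.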
Research on Chinese Sign Language (CSL) provides convenience and support for individuals with hearing impairments to communicate and integrate into society. This article reviews the relevant literature on Chinese Sign Language Recognition (CSLR) over the past 20 years. Hidden Markov Models (HMM), Support Vector Machines (SVM), and Dynamic Time Warping (DTW) were found to be the most commonly employed technologies among traditional identification methods. Benefiting from the rapid development of computer vision and artificial intelligence technology, Convolutional Neural Networks (CNN), 3D-CNN, YOLO, Capsule Networks (CapsNet), and various other deep neural networks have sprung up. Deep Neural Networks (DNNs) and their derived models are integral to modern artificial intelligence recognition methods. In addition, technologies that were widely used in the early days have also been integrated and applied in specific hybrid models and customized identification methods. Sign language data collection includes acquiring data from data gloves, sensors (such as Kinect, Leap Motion, etc.), and high-definition photography. Meanwhile, facial expression recognition, complex background processing, and 3D sign language recognition have also attracted research interest among scholars. Due to the uniqueness and complexity of Chinese Sign Language, accuracy, robustness, real-time performance, and user independence are significant challenges for future sign language recognition research. Additionally, suitable datasets and evaluation criteria are also worth pursuing.
The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods either lack the mining of hand and face information in their visual backbones or use expensive and time-consuming external extractors to explore this information. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information of the hands and face at multiple spatial scales, replacing heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments using three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
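The frame-difference idea behind the motion attention module can be sketched as follows: differences between adjacent frames highlight moving regions (hands and face) and are turned into a multi-scale spatial attention map that reweights the backbone features. The kernel sizes, pooling scales, and module layout are assumptions for illustration, not the MSCA-Net implementation.

```python
# Rough sketch of frame-difference-based motion attention over video features (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionAttention(nn.Module):
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):
        # x: (B, T, C, H, W) feature maps of a video clip
        b, t, c, h, w = x.shape
        diff = (x[:, 1:] - x[:, :-1]).abs()            # temporal differences, (B, T-1, C, H, W)
        diff = F.pad(diff, (0, 0, 0, 0, 0, 0, 0, 1))   # pad the time axis back to T frames
        maps = []
        for s in self.scales:                           # gather spatial context at several scales
            d = diff.reshape(b * t, c, h, w)
            d = F.avg_pool2d(d, kernel_size=s, stride=s) if s > 1 else d
            d = F.interpolate(d, size=(h, w), mode="bilinear", align_corners=False)
            maps.append(d)
        attn = torch.sigmoid(self.proj(torch.cat(maps, dim=1)))   # (B*T, C, H, W) attention
        return (x.reshape(b * t, c, h, w) * attn).reshape(b, t, c, h, w)

feat = torch.randn(2, 8, 32, 28, 28)   # toy clip features
print(MotionAttention(32)(feat).shape)  # torch.Size([2, 8, 32, 28, 28])
```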
The development of scientific inquiry and research has yielded numerous benefits in the realm of intelligent traffic control systems, particularly for automatic license plate recognition of vehicles. The design of license plate recognition algorithms has been digitalized through the use of neural networks. There is a growing demand for vehicle surveillance today due to the need for efficient vehicle processing and traffic management, so the design, development, and implementation of a license plate recognition system hold significant social, economic, and academic importance. This study presents contemporary methodologies and empirical findings pertaining to automated license plate recognition. The automatic license plate recognition algorithm focuses on image extraction, character segmentation, and recognition; in our observations, character segmentation proved to be the most challenging step. The license plate recognition project that we designed demonstrated the effectiveness of this method across various observed conditions, particularly in low-light environments such as periods of limited illumination or inclement weather with precipitation. The method was tested on a sample of fifty images, resulting in a 100% accuracy rate. The findings demonstrate the project's ability to effectively determine the optimal outcomes of simulations.
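Since character segmentation is singled out above as the hardest step, here is a minimal sketch of one common way to do it: binarize the plate image and split it into character candidates via connected contours. The synthetic "plate", the rendering call, and the size filters are illustrative assumptions, not the project's method.

```python
# Sketch of contour-based character segmentation on a synthetic plate image (OpenCV).
import cv2
import numpy as np

plate = np.zeros((60, 200), np.uint8)
cv2.putText(plate, "AB123", (10, 45), cv2.FONT_HERSHEY_SIMPLEX, 1.5, 255, 3)  # fake plate text

# Binarize with Otsu's threshold and find external contours of the character blobs.
_, binary = cv2.threshold(plate, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

boxes = [cv2.boundingRect(c) for c in contours]
# Keep plausible character-sized boxes and order them left to right.
chars = sorted((b for b in boxes if b[3] > 20 and b[2] > 5), key=lambda b: b[0])
print("character candidates:", chars)   # each box would be cropped and sent to recognition
```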
A novel traffic sign recognition system is presented in this work. First, color segmentation and a shape classifier based on the signature feature of each region are used to detect traffic signs in input video sequences. Second, each traffic sign color image is preprocessed with gray scaling and normalized to 64×64 pixels, and image features are obtained from a four-level DT-CWT decomposition. Third, 2DICA and a nearest neighbor classifier are combined to recognize the traffic signs. The whole recognition algorithm is implemented for the classification of 50 categories of traffic signs, and its recognition accuracy reaches 90%. Comparing the DT-CWT image representation with well-established representations such as template and Gabor features, and 2DICA with feature selection techniques such as PCA, LPP, and 2DPCA, the results show that the combination of DT-CWT and 2DICA is useful for traffic sign recognition. Experimental results indicate that the proposed algorithm is robust, effective, and accurate.
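The classification stage described above (subspace projection followed by a nearest-neighbor rule) can be sketched roughly as below. sklearn's FastICA on flattened vectors stands in for 2DICA, and random data stands in for the four-level DT-CWT features; both substitutions are assumptions made only to keep the example self-contained.

```python
# Sketch: ICA-based feature reduction plus a 1-nearest-neighbour classifier.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_classes, per_class = 5, 20
X = rng.normal(size=(n_classes * per_class, 64 * 64))   # stand-in for DT-CWT features of 64x64 signs
y = np.repeat(np.arange(n_classes), per_class)
X += y[:, None] * 0.05                                   # make the toy classes separable

ica = FastICA(n_components=20, random_state=0)           # stand-in for 2DICA
Z = ica.fit_transform(X)
clf = KNeighborsClassifier(n_neighbors=1).fit(Z, y)      # nearest-neighbour classifier
print("training accuracy:", clf.score(Z, y))
```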
Road traffic sign recognition is an important task in intelligent transportation systems. Convolutional neural networks (CNNs) have achieved breakthroughs in computer vision tasks and great success in traffic sign classification. This paper presents a road traffic sign recognition algorithm based on a convolutional neural network. In natural scenes, traffic signs are disturbed by factors such as illumination, occlusion, missing parts, and deformation, which decrease recognition accuracy; this paper therefore proposes a model called Improved VGG (IVGG), inspired by the VGG model. The IVGG model has 9 layers; compared with the original VGG model, max-pooling and dropout operations are added after multiple convolutional layers to capture the main features and reduce training time. The paper also adds dropout and Batch Normalization (BN) operations after each fully connected layer to further accelerate model convergence and obtain a better classification effect. The German Traffic Sign Recognition Benchmark (GTSRB) dataset is used in the experiments. The IVGG model enhances the recognition rate and robustness of traffic sign recognition by using data augmentation and transfer learning, and the training time is also greatly reduced.
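A hedged reading of the described architecture is sketched below: VGG-style convolution blocks with max-pooling and dropout, plus batch normalization and dropout after the fully connected layer. The exact layer counts, channel widths, and input size are assumptions, not the IVGG configuration reported in the paper.

```python
# Sketch of a small VGG-like classifier with dropout and BN after the FC layer (PyTorch).
import torch
import torch.nn as nn

class IVGGLike(nn.Module):
    def __init__(self, num_classes=43):          # GTSRB has 43 classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.25),    # max-pooling + dropout after the conv block
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                          # x: (B, 3, 48, 48)
        return self.classifier(self.features(x))

model = IVGGLike()
print(model(torch.randn(4, 3, 48, 48)).shape)      # torch.Size([4, 43])
```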
Recognizing various traffic signs, especially the popular circular traffic signs, is an essential task for implementing advanced driver assistance systems. To recognize circular traffic signs with high accuracy and robustness, a novel approach is proposed that uses a so-called improved constrained binary fast radial symmetry (ICBFRS) detector and a pseudo-Zernike moments based support vector machine (PZM-SVM) classifier. In the detection stage, the scene image containing the traffic signs is converted into the Lab color space for color segmentation. The ICBFRS detector can then efficiently capture the position and scale of sign candidates within the scene by detecting the centers of circles. In the classification stage, once the candidates are cropped out of the image, pseudo-Zernike moments are adopted to represent the features of the extracted pictograms, which are then fed into a support vector machine to classify the different traffic signs. Experimental results under different lighting conditions indicate that the proposed method has a robust detection effect and high classification accuracy.
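The detect-then-classify pipeline can be sketched as follows: Lab color segmentation produces sign candidates, a moment-based descriptor is computed for each cropped pictogram, and the descriptors would feed an SVM. Hu moments from OpenCV stand in for pseudo-Zernike moments, and the thresholds and synthetic image are illustrative assumptions only.

```python
# Sketch: Lab-space color segmentation for candidates plus a moment descriptor (OpenCV).
import cv2
import numpy as np

def find_red_candidates(bgr):
    """Segment reddish regions in Lab space and return their bounding boxes."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    _, a, _ = cv2.split(lab)                       # 'a' channel: green (low) to red (high)
    _, mask = cv2.threshold(a, 150, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]

def extract_moments(patch_gray):
    """Placeholder for pseudo-Zernike moments: rotation-invariant Hu moments instead."""
    hu = cv2.HuMoments(cv2.moments(patch_gray)).flatten()
    return np.sign(hu) * np.log1p(np.abs(hu))      # log-scale for numerical stability

img = np.zeros((200, 200, 3), dtype=np.uint8)
cv2.circle(img, (100, 100), 40, (0, 0, 255), -1)   # synthetic red disc as a stand-in sign
for (x, y, w, h) in find_red_candidates(img):
    patch = cv2.cvtColor(img[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    print("candidate at", (x, y, w, h), "descriptor:", extract_moments(patch)[:3])
# The descriptors would then be fed to an SVM (e.g. sklearn.svm.SVC) for classification.
```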
This paper presents the implementation of an embedded automotive system that detects and recognizes traffic signs within a video stream. In addition, it discusses the recent advances in driver assistance technologies and highlights the safety motivations for smart in-car embedded systems. An algorithm is presented that processes RGB image data, extracts relevant pixels, filters the image, labels prospective traffic signs, and evaluates them against template traffic sign images. A reconfigurable hardware system is described which uses the Virtex-5 Xilinx FPGA and hardware/software co-design tools in order to create an embedded processor and the necessary hardware IP peripherals. The implementation is shown to have robust performance results, both in terms of timing and accuracy.
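The final evaluation-against-templates step can be illustrated with normalized cross-correlation template matching, a standard software analogue of the comparison performed in the embedded pipeline. The synthetic scene, the template, and the acceptance threshold below are illustrative assumptions, not the paper's hardware design.

```python
# Sketch of template matching for a candidate sign region (OpenCV).
import cv2
import numpy as np

rng = np.random.default_rng(0)
template = np.zeros((40, 40, 3), np.uint8)
cv2.circle(template, (20, 20), 18, (0, 0, 255), -1)          # reference "sign" image

scene = rng.integers(0, 256, (200, 300, 3), dtype=np.uint8)  # noisy stand-in scene
scene[60:100, 120:160] = template                             # place the sign in the scene

result = cv2.matchTemplate(cv2.cvtColor(scene, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(template, cv2.COLOR_BGR2GRAY),
                           cv2.TM_CCOEFF_NORMED)              # normalized cross-correlation
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:                                             # assumed acceptance threshold
    print("sign detected at", max_loc, "score", round(max_val, 3))
```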
The infrastructure and construction of roads are crucial for the economic and social development of a region, but traffic-related challenges like accidents and congestion persist. Artificial Intelligence (AI) and Machine Learning (ML) have been used in road infrastructure and construction, particularly with Internet of Things (IoT) devices. Object detection in computer vision also plays a key role in improving road infrastructure and addressing traffic-related problems. This study uses You Only Look Once version 7 (YOLOv7) with a Convolutional Block Attention Module (CBAM), a highly optimized object-detection combination, to detect and identify traffic signs, and analyzes effective combinations of adaptive optimizers such as Adaptive Moment estimation (Adam), Root Mean Squared Propagation (RMSprop), and Stochastic Gradient Descent (SGD) with YOLOv7. Using a portion of the German traffic sign data for training, the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy. The model proposed in this study not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems. The results showed an impressive accuracy of 99.7% when using a batch size of 8 and the Adam optimizer. This high level of accuracy demonstrates the effectiveness of the proposed model for the image classification task of traffic sign recognition.
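For reference, a CBAM block of the kind inserted into detection backbones such as YOLOv7 applies channel attention and then spatial attention to a feature map. The sketch below follows the commonly used CBAM formulation; the reduction ratio and kernel size are standard defaults and are assumptions here, as is how it would be wired into the YOLOv7 backbone.

```python
# Sketch of a Convolutional Block Attention Module: channel then spatial attention (PyTorch).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over globally average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        # Spatial attention: convolution over channel-wise mean and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                             # reweight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                          # reweight spatial locations

feat = torch.randn(1, 64, 40, 40)          # a backbone feature map
print(CBAM(64)(feat).shape)                # torch.Size([1, 64, 40, 40])
```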
This paper covers the analysis and investigation of a lighting automation system for low-traffic long roads. The main objective is to find an optimal solution between an expensive, safe design that uses continuous street lighting at night along the entire road and an inexpensive design that sacrifices safety by relying on vehicle lighting, in order to eliminate the high cost of energy consumption during the night operation of the road. Taking both factors into account, a smart lighting automation system is proposed that applies a pattern recognition technique to vehicle number plates. In this proposal, the road is sectionalized into zones, and based on the pattern recognition technique, the road lighting control system illuminates only the zone that vehicles are passing through. An economic analysis is provided to support the value of this lighting control system design.
The features extracted by principal component analysis (PCA) are the most descriptive, and the features extracted by linear discriminant analysis (LDA) are the most classifiable. In this paper, these two methods are combined, and a PC-LDA approach is used to extract the features of traffic signs. After obtaining binary images of the traffic signs through normalization and binarization, PC-LDA extracts a feature subspace of the traffic sign images with the best description and classification. The extracted features are recognized using a minimum distance classifier. The approach is verified using the MPEG-7 CE Shape-1 Part-B shape library and a traffic sign image library that includes both standard and natural traffic signs. The results show that, when the traffic sign appears in a natural scene, the PC-LDA approach applied to binary images from which shape features are extracted obtains better results.
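The PC-LDA chain can be sketched directly with standard components: project with PCA, project again with LDA, and classify by the minimum distance to class means. The digits dataset and the component counts are stand-ins and assumptions; the paper's binarized traffic sign images are not reproduced here.

```python
# Sketch of PCA -> LDA projection followed by a minimum distance classifier (scikit-learn).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                    # stand-in for binarized sign images
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pca = PCA(n_components=40).fit(X_tr)                   # best-describing subspace
lda = LinearDiscriminantAnalysis().fit(pca.transform(X_tr), y_tr)  # most-classifiable subspace
Z_tr = lda.transform(pca.transform(X_tr))
Z_te = lda.transform(pca.transform(X_te))

# Minimum distance classifier: assign each sample to the nearest class mean.
classes = np.unique(y_tr)
means = np.stack([Z_tr[y_tr == c].mean(axis=0) for c in classes])
dists = ((Z_te[:, None, :] - means[None, :, :]) ** 2).sum(-1)
pred = classes[np.argmin(dists, axis=1)]
print("accuracy:", (pred == y_te).mean())
```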
In the field of traffic sign recognition, traffic signs usually occupy very small areas in the input image. Most object detection algorithms directly reduce the original image to a specific size for the input model during the detection process, which leads to the loss of small object information. Additionally, classification tasks are more sensitive to information loss than localization tasks. This paper proposes a novel traffic sign recognition approach that incorporates a lightweight pre-locator network and a refined classification network. The pre-locator network locates the sub-regions of the traffic signs in the original image, and the refined classification network performs the refined recognition task within these sub-regions. Moreover, an innovative module (named SPP-ST) is proposed, which combines the Spatial Pyramid Pooling (SPP) module and the Swin-Transformer module as a new feature extractor to learn the special spatial information of traffic signs effectively. Experimental results show that the proposed method is superior to state-of-the-art methods (82.1 mAP achieved on 218 categories of the TT100K dataset, an improvement of 19.7 percentage points over the previous method). Both the result analysis and the output visualizations further demonstrate the effectiveness of the proposed method. The source code and datasets of this work are available at https://github.com/DijiesitelaQ/TSOD.
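To make the SPP half of the SPP-ST module concrete, the sketch below shows a standard Spatial Pyramid Pooling block that concatenates max-pooled maps at several receptive fields; the pooling sizes and channel widths are assumptions, and the Swin-Transformer half and its exact combination with SPP are not reproduced here.

```python
# Sketch of a Spatial Pyramid Pooling block (PyTorch).
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, in_ch, out_ch, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, in_ch // 2, 1)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes])
        self.fuse = nn.Conv2d(in_ch // 2 * (len(pool_sizes) + 1), out_ch, 1)

    def forward(self, x):
        x = self.reduce(x)
        # Concatenate the original map with max-pooled maps at several receptive fields.
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

feat = torch.randn(1, 256, 20, 20)
print(SPP(256, 256)(feat).shape)   # torch.Size([1, 256, 20, 20])
```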
Traffic sign recognition (TSR, or Road Sign Recognition, RSR) is one of the Advanced Driver Assistance System (ADAS) functions in modern cars. To address the most important issues, real-time operation and resource efficiency, we propose a highly efficient hardware implementation for TSR. We divide the TSR procedure into two stages, detection and recognition. In the detection stage, under the assumption that most German traffic signs have red or blue colors with circle, triangle, or rectangle shapes, we use a Normalized RGB color transform and Single-Pass Connected Component Labeling (CCL) to find potential traffic signs efficiently. For Single-Pass CCL, our contribution is to eliminate the "merge-stack" operations by recording the connected relations of regions in the scan phase and updating the labels in the iterating phase. In the recognition stage, the Histogram of Oriented Gradients (HOG) is used to generate the descriptors of the signs, and we classify the signs with a Support Vector Machine (SVM). In the HOG module, we analyze the minimum number of bits required under different recognition rates. The proposed method achieves a 96.61% detection rate and a 90.85% recognition rate when tested on the GTSDB dataset. Our hardware implementation reduces the storage of CCL and simplifies the HOG computation. The main CCL storage size is reduced by 20% compared with the most advanced design under typical conditions. Using TSMC 90 nm technology, the proposed design operates at a 105 MHz clock rate and processes 135 fps at an image size of 1360 × 800. The chip size is about 1 mm² and the power consumption is close to 8 mW. Therefore, this work is resource efficient and achieves the real-time requirement.
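The recognition stage (HOG descriptors classified by an SVM) has a direct software analogue, sketched below with scikit-image and scikit-learn. The digits dataset stands in for cropped sign candidates, and the HOG parameters are typical values rather than the fixed-point configuration analyzed in the paper.

```python
# Sketch of HOG feature extraction plus a linear SVM classifier.
import numpy as np
from skimage.feature import hog
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
images = X.reshape(-1, 8, 8)                      # stand-in for cropped sign patches
X_tr, X_te, y_tr, y_te = train_test_split(images, y, random_state=0)

def describe(imgs):
    # Histogram of Oriented Gradients per patch (orientation bins, cell/block layout).
    return np.array([hog(im, orientations=8, pixels_per_cell=(4, 4),
                         cells_per_block=(1, 1)) for im in imgs])

clf = LinearSVC().fit(describe(X_tr), y_tr)       # SVM classifier over HOG descriptors
print("test accuracy:", clf.score(describe(X_te), y_te))
```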
The accurate and efficient classification of Internet traffic is the first and key step toward accurate traffic management, network security, and traffic analysis. The classic ways of identifying flows are either inaccurate or inefficient and are not suitable for real-time online classification. In this paper, after analyzing in detail the distribution of payload signatures among the packets of a flow, we present an early recognition method named Early Recognition Based on Deep Packet Inspection (ERBDPI). The basic concept of ERBDPI is to classify flows based on the payload signatures of their first few packets, so that traffic can be identified at the beginning of a flow connection. We compared the performance of ERBDPI with that of traditional sampling methods both synthetically and using real-world traffic traces. The results show that ERBDPI achieves higher classification accuracy with a lower packet sampling rate, which makes it suitable for accurate real-time classification on high-speed links.
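A toy sketch of the early-recognition idea is shown below: only the payloads of the first few packets of a flow are inspected and matched against known application signatures. The signature byte patterns, flow data, and rule set are invented for illustration; a real system would use measured traces and a much fuller signature table.

```python
# Toy sketch of early flow classification from the payloads of the first N packets.
SIGNATURES = {
    "http": [b"GET ", b"POST ", b"HTTP/1."],
    "tls":  [b"\x16\x03"],            # TLS handshake record header
    "ssh":  [b"SSH-"],
}
FIRST_N = 3                            # packets inspected per flow

def classify_flow(packets):
    """Return the first protocol whose signature appears in the first N payloads."""
    for payload in packets[:FIRST_N]:
        for proto, patterns in SIGNATURES.items():
            if any(payload.startswith(p) or p in payload for p in patterns):
                return proto
    return "unknown"

flows = {
    ("10.0.0.1", 443):  [b"\x16\x03\x01\x02\x00", b"\x14\x03\x03"],
    ("10.0.0.2", 80):   [b"GET /index.html HTTP/1.1\r\n", b"Host: example.com\r\n"],
    ("10.0.0.3", 2222): [b"SSH-2.0-OpenSSH_9.3\r\n"],
}
for key, pkts in flows.items():
    print(key, "->", classify_flow(pkts))
```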
With the progress of deep learning research, convolutional neural networks have become the most important method for feature extraction. How effectively the extracted features are classified and recognized directly affects the performance of the entire network. Traditional processing methods include classification models such as fully connected networks and support vector machines. To address the problem that a traditional convolutional neural network is prone to over-fitting when classifying small samples, a CNN-TWSVM hybrid model is proposed that uses the computationally efficient twin support vector machine (TWSVM) as the CNN classifier, and it is applied to the traffic sign recognition task. To improve the generalization ability of the model, a wavelet kernel function is introduced to handle the nonlinear classification task. The method fine-tunes a network initialized on the ImageNet dataset to the specific domain and uses an inner layer of the network to extract highly abstract features of the traffic sign images. Finally, the TWSVM based on the wavelet kernel function is used to identify the traffic signs, effectively alleviating the over-fitting problem in traffic sign classification. On the GTSRB and BelgiumTS datasets, the validity and generalization ability of the improved model are verified by comparison with different kernel functions and different SVM classifiers.
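A wavelet kernel used with an SVM can be sketched as below. A standard sklearn SVC with a custom kernel stands in for the twin SVM (TWSVM), the digits dataset stands in for features taken from an inner CNN layer, and the Morlet-style kernel form with its scale parameter is a common choice rather than the paper's exact formulation.

```python
# Sketch of a Morlet-style wavelet kernel plugged into an SVM (scikit-learn).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def wavelet_kernel(X, Y, a=4.0):
    """k(x, y) = prod_j cos(1.75 * d_j / a) * exp(-d_j^2 / (2 * a^2)), d = x - y."""
    D = X[:, None, :] - Y[None, :, :]                       # pairwise feature differences
    return np.prod(np.cos(1.75 * D / a) * np.exp(-D ** 2 / (2 * a ** 2)), axis=2)

X, y = load_digits(return_X_y=True)
X, y = X[:400] / 16.0, y[:400]                              # small subset keeps the Gram matrix cheap
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel=wavelet_kernel).fit(X_tr, y_tr)            # stand-in for the wavelet-kernel TWSVM
print("test accuracy:", clf.score(X_te, y_te))
```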
Designing accurate and time-efficient real-time traffic sign recognition systems is a crucial part of developing the intelligent vehicle, which is the main agent in the intelligent transportation system. Traffic sign recognition systems consist of an initial detection phase, in which image regions are segmented by color and fed to the recognition phase. In terms of time consumption, the most challenging process in such systems is the detection phase. The trade-off in previous studies, which proposed different methods for detecting traffic signs, is between accuracy and computation time. Therefore, this paper presents a novel, accurate, and time-efficient color segmentation approach based on logistic regression. We use the RGB color space as the domain to extract the features of our hypothesis, which speeds up our approach since no color conversion is needed. Our trained segmentation classifier was tested on 1000 traffic sign images taken under different lighting conditions. The results show that our approach segmented 974 of these images correctly, in less than one-fifth of the time needed by any other robust segmentation method.
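The core idea, pixel-level color segmentation with logistic regression directly in RGB space, can be sketched minimally as below. The training pixels and the toy test image are synthetic assumptions; a real classifier would be trained on labeled sign and background pixels from annotated images.

```python
# Sketch of logistic-regression color segmentation on raw RGB pixels (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Assumed training pixels: reddish sign pixels vs. everything else.
sign_px = rng.normal(loc=[200, 40, 40], scale=20, size=(500, 3))
background_px = rng.uniform(0, 255, size=(1500, 3))
X = np.clip(np.vstack([sign_px, background_px]), 0, 255)
y = np.r_[np.ones(500), np.zeros(1500)]

clf = LogisticRegression(max_iter=1000).fit(X / 255.0, y)

# Segment a toy image: classify every pixel and keep the sign-colored mask.
img = rng.uniform(0, 255, size=(64, 64, 3))
img[20:40, 20:40] = [210, 30, 30]                         # a reddish square "sign"
mask = clf.predict(img.reshape(-1, 3) / 255.0).reshape(64, 64)
print("segmented sign pixels:", int(mask.sum()))
```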
Traffic sign recognition is an important task in intelligent transportation systems, which can improve road safety and reduce accidents. Algorithms based on deep learning have achieved remarkable results in traffic sign recognition in recent years. In this paper, we build traffic sign recognition algorithms based on ResNet and CNN models, respectively, and evaluate and compare them on public datasets. We first use a dataset of traffic sign images from Kaggle, and then design ResNet-based and CNN-based architectures that can effectively capture the complex features of traffic signs. Our experiments show that the ResNet-based model achieves a recognition accuracy of 99% on the test set, and the CNN-based model achieves a recognition accuracy of 98% on the test set. The proposed approach has the potential to improve traffic safety and can be used in various intelligent transportation systems.
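The ResNet-based classifier idea can be sketched with a torchvision ResNet-18 whose final layer is replaced for the traffic sign label set, trained with cross-entropy. The class count, input size, and the single random-batch training step are assumptions; the paper's exact architectures and training setup are not reproduced.

```python
# Sketch of a ResNet-18 classifier head swap and one training step (PyTorch/torchvision).
import torch
import torch.nn as nn
from torchvision import models

num_classes = 43                            # assumed GTSRB-style label set
model = models.resnet18(weights=None)       # pass pretrained weights here to fine-tune instead
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on a random batch standing in for sign images.
images, labels = torch.randn(8, 3, 64, 64), torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss after one step:", float(loss))
```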
Background: The rapid development of the automobile industry has increased the output and holdings of automobiles year by year, which has brought huge challenges to current traffic management. Method: This paper adopts a traffic sign recognition technique based on a deep convolutional neural network (CNN): step 1, preprocess the collected traffic sign images through gray processing and nearest-neighbor interpolation; step 2, automatically extract image features through convolutional and pooling layers; step 3, recognize traffic signs through fully connected layers and Dropout. Purpose: Artificial intelligence technology is applied to traffic management to better realize intelligent traffic-assisted driving. Results: This paper uses the Adam optimization algorithm when training on the loss value. The average classification accuracy in the experiments is 98.87%. Compared with the traditional gradient descent algorithm, the experimental model converges quickly within a few iteration cycles.
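The preprocessing in step 1 can be sketched directly with OpenCV: grayscale conversion followed by nearest-neighbor resizing before the image enters the CNN. The 32×32 target size and the random stand-in image are assumptions.

```python
# Sketch of the step-1 preprocessing: grayscale conversion and nearest-neighbor resizing.
import numpy as np
import cv2

bgr = (np.random.rand(120, 90, 3) * 255).astype(np.uint8)    # stand-in for a captured sign image
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)                  # gray processing
small = cv2.resize(gray, (32, 32), interpolation=cv2.INTER_NEAREST)  # nearest-neighbor interpolation
x = small.astype(np.float32) / 255.0                          # normalized network input
print(x.shape, x.min(), x.max())
```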