Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have attracted significant attention within the ASR domain due to their adaptability in various deployment scenarios. However, these methods often confront substantial trade-offs, particularly unstable accuracy when the model size is reduced. To address these challenges, this study introduces two crucial empirical findings. Firstly, it proposes incorporating an online distillation mechanism during on-demand pruning training, which holds the promise of maintaining more consistent accuracy levels. Secondly, it proposes using the Mogrifier long short-term memory (LSTM) language model (LM), an advanced iteration of the conventional LSTM LM, as an effective alternative pruning target within the ASR framework. The ASR system is evaluated experimentally with the Mogrifier LSTM LM trained using the suggested joint on-demand pruning and online distillation method. The results show that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning methods. The proposed configuration reduces the parameter count by approximately 39% while minimizing trade-offs.
Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hands-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review paper aims to summarize the field and provide a starting point for those beginning in the vast field of speech signal processing. Since speech recognition has vast potential in various industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers who aim to explore more applications that society can quickly adopt in the coming years.
This study aims to reduce the interference of ambient noise in mobile communication, improve the accuracy and authenticity of information transmitted by sound, and guarantee the accuracy of voice information delivered by mobile communication. First, the principles and techniques of speech enhancement are analyzed, and a fast lateral recursive least square method (FLRLS method) is adopted to process sound data. Then, the convolutional neural network (CNN)-based noise recognition CNN (NR-CNN) algorithm and a speech enhancement model are proposed. Finally, related experiments are designed to verify the performance of the proposed algorithm and model. The experimental results show that the noise classification accuracy of the NR-CNN noise recognition algorithm is higher than 99.82%, and the recall rate and F1 value are also higher than 99.92. The proposed sound enhancement model can effectively enhance the original sound in the case of noise interference. After the CNN is incorporated, the average value of all noisy sound perception quality evaluation system values is improved by over 21% compared with that of the traditional noise reduction method. The proposed algorithm can adapt to a variety of voice environments and can simultaneously perform enhancement and noise reduction on a variety of different types of voice signals, and the processing effect is better than that of traditional sound enhancement models. In addition, the sound distortion index of the proposed speech enhancement model is lower than that of the control group, indicating that adding the CNN is less likely to cause sound signal distortion in various sound environments and shows superior robustness. In summary, the proposed CNN-based speech enhancement model shows significant sound enhancement effects, stable performance, and strong adaptability. This study provides a reference and basis for research applying neural networks in speech enhancement.
Funding: Supported by the Deanship of Scientific Research at Umm Al-Qura University (Grant Code: 22UQU4170008DSR06).
Abstract: Natural language processing technologies have become more widely available in recent years, making them more useful in everyday situations. Machine learning systems that employ accessible datasets and corporate work to serve the whole spectrum of problems addressed in computational linguistics have lately yielded a number of promising breakthroughs. These methods were particularly advantageous for regional languages, as they were provided with cutting-edge language processing tools as soon as the requisite corporate information was generated. Most people today are unconcerned about the importance of reading. Reading aloud, on the other hand, is an effective technique for nourishing feelings as well as a necessary skill in the learning process. This paper proposes a novel approach for speech recognition based on neural networks. The attention mechanism is first utilized to determine the speech accuracy and fluency assessments, with the spectrum map as the feature extraction input. To increase phoneme identification accuracy, the reading-precision assessment, for example, employs a new type of deep speech. It makes use of the exportchapter tool, which provides a corpus, as well as the TensorFlow framework in the experimental setting. The experimental findings reveal that the suggested model can assess spoken speech accuracy and reading fluency more effectively than the old model, and its evaluation model's score outcomes are more accurate.
Funding: Supported by the National Natural Science Foundation of China (No. 61074165 and No. 61273064), the Jilin Provincial Science & Technology Department Key Scientific and Technological Project (No. 20140204034GX), and the Jilin Province Development and Reform Commission Project (No. 2015Y043).
Abstract: Human body posture recognition in wireless body area networks (WBAN) has attracted considerable attention in recent years. Many recognition algorithms have been proposed to recognize human body posture precisely; however, the recognition rate remains relatively low. In this paper, we apply a back propagation (BP) neural network as a classifier to recognize human body posture, where signals are collected from a VG350 acceleration sensor and a posture signal collection system based on WBAN is designed. The human body signal vector magnitude (SVM) and tri-axial acceleration sensor data are used to describe the human body postures. We are able to recognize four postures: Walk, Run, Squat and Sit. Our posture recognition rate is up to 91.67%. Furthermore, we find an implied relationship between hidden layer neurons and the posture recognition rate. The proposed human body posture recognition algorithm lays the foundation for subsequent applications.
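The signal vector magnitude mentioned in this abstract is commonly defined as the Euclidean norm of one tri-axial acceleration sample. The sketch below assumes that definition; the function names are hypothetical and the abstract does not state the authors' exact formula or units.

```python
import math


def signal_vector_magnitude(ax, ay, az):
    """Euclidean norm of one tri-axial acceleration sample (assumed definition)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def svm_series(samples):
    """Map a sequence of (ax, ay, az) tuples to their signal vector magnitudes."""
    return [signal_vector_magnitude(ax, ay, az) for ax, ay, az in samples]


if __name__ == "__main__":
    # Two illustrative samples, not data from the paper.
    print(svm_series([(0.0, 0.0, 1.0), (3.0, 4.0, 0.0)]))
```

A series like this, alongside the raw per-axis readings, could serve as the input features of a classifier; how the paper windows or scales them is not stated.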
Funding: Supported by the National Natural Science Foundation of China (990427) and the "863" Opening Item (001A110-02).
Abstract: Recognition and analysis of dynamic information in population images taken during wheat growth periods can serve as the basis for quantitative diagnosis of wheat growth. This paper designs a recognition system based on a self-learning BP neural network for feature data of wheat population images, such as total green area and leaf area. In addition, some techniques that create favorable conditions for image recognition are discussed: (1) images were collected with a digital camera and auxiliary equipment under natural field conditions; (2) a pixel labeling algorithm was used to segment the images and extract features; (3) a high-pass filter based on the Laplacian was used to strengthen image information. The results showed that the ANN system was suitable for image recognition of wheat population features.
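One common form of Laplacian-based high-pass enhancement subtracts the 4-neighbour Laplacian response from the image. The sketch below assumes that form, a grayscale image stored as nested lists, and an 8-bit value range; the abstract does not give the kernel actually used, and the function names are hypothetical.

```python
def laplacian(image):
    """4-neighbour Laplacian response; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
    return out


def sharpen(image):
    """Subtract the Laplacian to boost edges, clipping to the 0..255 range."""
    lap = laplacian(image)
    return [[min(255, max(0, v - l)) for v, l in zip(row, lap_row)]
            for row, lap_row in zip(image, lap)]


if __name__ == "__main__":
    # Illustrative 3x3 patch with one bright pixel.
    patch = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
    print(sharpen(patch))
```

Flat regions are unchanged because their Laplacian is zero, while isolated bright or dark details are amplified.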
Funding: Supported by Quality and Brand Construction of "Internet + County Characteristic Agricultural Products" (ZY17C06).
Abstract: To solve the problem of misrecognition among rice diseases, this paper studies automatic recognition methods based on a BP (back propagation) neural network for blast, sheath blight and bacterial blight. Mobile terminal equipment was chosen as the image collection tool, and a database of diseased rice leaf images was built using a threshold segmentation method. Characteristic parameters were extracted from color, shape and texture. Furthermore, the parameters were optimized using single-factor variance analysis and the effects of the BP neural network model. The optimization simplifies the BP neural network model without reducing the recognition accuracy. The final model successfully recognized 98%, 96% and 98% of rice blast, sheath blight and bacterial (white leaf) blight, respectively.
Abstract: Speech recognition, or speech to text, includes capturing and digitizing the sound waves, transforming them into basic linguistic units or phonemes, constructing words from phonemes, and contextually analyzing the words to ensure the correct spelling of words that sound the same. Approach: The study examines the possibility of designing a software system using one of the techniques of artificial intelligence applications, neural networks, where the system is able to distinguish the sound signals and neural networks of irregular users. Fixed weights are trained on those forms first, and then the system gives the output match for each of these formats at high speed. The proposed neural network study is based on solutions of speech recognition tasks, detecting signals using angular modulation and detection of modulated techniques.
Funding: National Natural Science Foundation of China (No. 69672007)
Abstract: On the basis of the asymptotic theory of Gersho, the isodistortion principle of vector clustering is discussed, and a competitive and selective learning method (CSL) is proposed that can avoid local optima and gives excellent results when applied to clustering in HMM models. By combining it with parallel, self-organizational hierarchical neural networks (PSHNN) to reclassify the scores of every form output by the HMM, the CSL speech recognition rate is markedly improved.
Abstract: The application of pattern recognition technology enables us to solve various human-computer interaction problems that were difficult to solve before. Handwritten Chinese character recognition, a hot research topic in image pattern recognition, has many applications in people's daily life, and more and more scholars are beginning to study off-line handwritten Chinese character recognition. This paper mainly studies the recognition of handwritten Chinese characters by a BP (Back Propagation) neural network. A handwritten Chinese character recognition model based on a BP neural network is established, and the accuracy and feasibility of the neural network are then verified through a GUI (Graphical User Interface) model built in Matlab. This paper mainly includes the following aspects. Firstly, the preprocessing for handwritten Chinese character recognition is analyzed. Image preprocessing mainly includes six processes: graying, binarization, smoothing and denoising, character segmentation, histogram equalization and normalization. Secondly, through comparative analysis of the results of three different feature extraction methods for handwritten Chinese characters, the most suitable feature extraction method for this paper is found. Finally, the BP neural network is applied to handwritten Chinese character recognition. The establishment, training process and parameter selection of the BP neural network are described in detail. The simulation software platform chosen in this paper is Matlab, and the sample images are used to train the BP neural network to verify the feasibility of Chinese character recognition. A GUI for human-computer interaction is designed in Matlab to show the process and results of handwritten Chinese character recognition, and the experimental results are analyzed.
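The first two preprocessing steps listed in this abstract, graying and binarization, can be sketched as follows. The paper works in Matlab and does not state its formulas, so this Python sketch is illustrative only. The luminance weights (0.299, 0.587, 0.114) and the fixed threshold of 128 are assumptions, and the function names are hypothetical.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to an 8-bit gray level with integer arithmetic.

    Uses the common 0.299/0.587/0.114 luminance weights (an assumption here).
    """
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def binarize(rgb_image, threshold=128):
    """Return 1 for dark (ink) pixels and 0 for light (background) pixels."""
    return [[1 if to_gray(p) < threshold else 0 for p in row]
            for row in rgb_image]


if __name__ == "__main__":
    # One black stroke pixel next to one white paper pixel.
    print(binarize([[(0, 0, 0), (255, 255, 255)]]))
```

A fixed threshold is the simplest choice; an adaptive method could replace it without changing the rest of the pipeline.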
Funding: Taif University Researchers Supporting Project number (TURSP-2020/349), Taif University, Taif, Saudi Arabia.
Abstract: Communication is a significant part of being human and living in the world. There are diverse kinds of languages and variations of them; thus, a person who speaks a language may still be unable to communicate effectively with someone who speaks that language in a different accent. Numerous application fields such as education, mobility, smart systems, security, and health care systems use speech or voice recognition models abundantly. However, various studies focus on Arabic, Asian and English languages while ignoring other significant languages like Marathi, which motivates broader research in regional languages. It is necessary to understand the speech recognition field, in which the major stages are feature extraction and classification. This paper emphasizes developing a speech recognition model for the Marathi language by optimizing a Recurrent Neural Network (RNN). Here, the input signal is preprocessed by smoothing and median filtering. After preprocessing, feature extraction is carried out using MFCC and spectral features to get precise features from the input Marathi speech corpus. The optimized RNN classifier is used for speech recognition after the feature extraction task is completed, where the hidden neurons in the RNN are optimized by the Grasshopper Optimization Algorithm (GOA). Finally, a comparison with conventional techniques shows that the proposed model outperforms most competing models on a benchmark dataset.
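The median filtering step named in this abstract can be sketched for a one-dimensional signal as below. The window size and the edge handling (repeating the first and last samples) are assumptions, since the abstract gives no parameters; `median_filter` is a hypothetical name.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each sample by the median of its neighbourhood.

    Edges are handled by repeating the first and last samples (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        neighbourhood = [signal[min(max(j, 0), n - 1)]
                         for j in range(i - half, i + half + 1)]
        out.append(median(neighbourhood))
    return out


if __name__ == "__main__":
    # An isolated spike is removed while the flat level is preserved.
    print(median_filter([1, 1, 9, 1, 1]))
```

Unlike a moving average, the median discards impulsive outliers entirely rather than spreading them over neighbouring samples.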
Funding: Supported by the Ministerial Level Research Foundation (404040401)
Abstract: Training a neural network to recognize targets requires many samples. These samples are usually collected in a non-systematic way, which can miss or overemphasize some target information. To improve this situation, a new method based on a virtual model and invariant moments is proposed to generate training samples. The method consists of the following steps: use a computer and simulation software to build a virtual model of the target object and then simulate the environment, lighting conditions, camera parameters, etc.; rotate the model by spin and nutation of inclination to obtain an image sequence from a virtual camera; preprocess each image and convert it into a binary image; calculate the invariant moments of each image to obtain a sequence of vectors. The vector sequence, which was proved to be complete, forms the training samples together with the target outputs. The simulation results showed that the proposed method can be used to recognize real targets and effectively improves the accuracy of target recognition when the sampling interval is short enough and the simulated circumstances are close enough to reality.
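The step of computing invariant moments from a binary image can be illustrated with the first two of Hu's moment invariants. This is a sketch under the assumption that Hu-style invariants are meant, since the abstract does not say which invariant moments were used. The function names are hypothetical.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq of a 2-D list of pixel values."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_first_two(img):
    """First two Hu invariants, built from normalized central moments."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    cx = raw_moment(img, 1, 0) / m00  # centroid x
    cy = raw_moment(img, 0, 1) / m00  # centroid y

    def eta(p, q):
        # Central moment about the centroid, normalized for scale.
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(img)
                 for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2)


if __name__ == "__main__":
    shape = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    rotated = [list(r) for r in zip(*shape[::-1])]  # 90-degree rotation
    print(hu_first_two(shape), hu_first_two(rotated))
```

The two printed tuples agree up to floating-point rounding, which is the property that makes such moments useful as rotation-tolerant training features.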
Abstract: The Donggan language, a special variant of Mandarin, is used by the Donggan people in Central Asia. It includes the Gansu dialect and the Shaanxi dialect. This paper proposes a convolutional neural network (CNN) based speech recognition method for the Donggan Shaanxi dialect. A text corpus and a pronunciation dictionary were designed for the Donggan Shaanxi dialect, and the corresponding speech corpus was recorded. The acoustic models of the Donggan Shaanxi dialect were then trained with a CNN. Experimental results demonstrate that the proposed CNN-based method achieves a lower word error rate than the monophone hidden Markov model (HMM) based method, the triphone HMM-based method and the DNN-based method.
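The word error rate used for the comparison above is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. A minimal sketch of that standard definition follows. Whitespace tokenization is an assumption, and the paper's scoring tool is not named in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

For languages written without spaces, the same routine can be applied to characters or syllables by tokenizing differently.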
Abstract: This paper presents a new HMM/MLP hybrid network for speech recognition. By taking advantage of the discriminative training of the MLP, the unreasonable model-correctness assumption underlying ML training in the basic HMM can be overcome, and the discriminative ability and recognition performance can be improved. Experimental results demonstrate that the discriminative ability and recognition performance of the HMM/MLP hybrid are clearly better than those of the normal HMM.
Funding: Supported by the Department of Electrical Engineering at National Chin-Yi University of Technology and by Takming University of Science and Technology, Taiwan.
Abstract: In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail and studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework. It establishes an acoustic model that combines a convolutional neural network (CNN) with CTC to improve the accuracy of speech recognition. The study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals to improve communication and reduce noise and hardware interference. The baseline acoustic model in this study faces challenges such as long training time, a high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of the relevant acoustic model parameters, and the best-performing model is finally selected according to the evaluation index; it reduces the error rate to about 18%, thus improving accuracy. Finally, comparative verification was carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker's speech rate, which further verified the excellent performance of the CTC-CNN_5+BN+Residual model structure. In the experiments, to train and verify the CTC-CNN baseline acoustic model, this study uses the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the word error rate of the acoustic model is 31% on the training set and stable at about 43% on the test set. The experiments also consider surrounding environmental noise. At a noise level of 80–90 dB, the accuracy is 88.18%, the worst among all levels. In contrast, at 40–60 dB, the accuracy is as high as 97.33% owing to less noise pollution.
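Two ideas from this abstract lend themselves to a short illustration: the CTC output rule (collapse repeated labels, then drop blanks) and the word error rate used for evaluation. The sketch below uses greedy per-frame decoding, which is the simplest CTC decoder and not necessarily the one used in the paper; the blank symbol `"_"` and both function names are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse repeated labels, then drop blanks (the CTC many-to-one mapping)."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference, hypothesis):
    """Levenshtein distance between token lists divided by the reference length."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference token
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis token
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1] / len(reference)


# Hypothetical per-frame argmax labels; the blank between the two "l" runs
# is what allows a doubled letter to survive the collapse.
frames = list("hh_e_ll_lo")
decoded = ctc_greedy_decode(frames)
wer = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
```

Here `decoded` is the letter sequence of "hello", and `wer` is 0.5 (one substitution and one deletion against four reference tokens).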
Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00377, Development of Intelligent Analysis and Classification Based Contents Class Categorization Technique to Prevent Imprudent Harmful Media Distribution).
Abstract: Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have attracted significant attention within the ASR domain because of their adaptability to various deployment scenarios. However, these methods often face substantial trade-offs, particularly unstable accuracy when the model size is reduced. To address these challenges, this study introduces two empirical findings. First, it proposes incorporating an online distillation mechanism into on-demand pruning training, which helps maintain more consistent accuracy levels. Second, it proposes using the Mogrifier long short-term memory (LSTM) language model (LM), an advanced variant of the conventional LSTM LM, as an effective alternative pruning target within the ASR framework. The study evaluates an ASR system that employs the Mogrifier LSTM LM trained with the suggested joint on-demand pruning and online distillation method. The results show that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning methods. The proposed configuration reduces the parameter count by approximately 39% while minimizing trade-offs.
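The abstract does not give the distillation objective. As a rough illustration only, the sketch below uses the widely used temperature-scaled formulation, in which a student (for example, a pruned sub-model) is trained on a mix of the ordinary cross-entropy and a KL term toward the teacher's softened distribution. The weighting `alpha`, the `temperature`, and the function names are assumptions, not the paper's settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target, alpha=0.5, temperature=2.0):
    """Mix hard-label cross-entropy with KL(teacher || student) on softened outputs."""
    hard = -math.log(softmax(student_logits)[target])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    # The temperature**2 factor keeps the soft term's gradient scale comparable
    # to the hard term's as the temperature changes.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


# Hypothetical next-token logits over a three-word vocabulary.
matched = distillation_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], target=0)
mismatched = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], target=0)
```

When the student already matches the teacher, the KL term vanishes and only the weighted cross-entropy remains; a mismatched student pays an additional penalty.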
Abstract: Speech recognition systems have become a unique family within human-computer interaction (HCI). Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computing experience. This paper presents a retrospective yet modern view of speech recognition systems. The development of automatic speech recognition (ASR) has seen several milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of modern developments and applications in this domain. This review aims to provide a starting point for those new to the vast field of speech signal processing. Since speech recognition has great potential in industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers exploring further applications that society can quickly adopt in the coming years.
Funding: Supported by the General Project of Philosophy and Social Science Research in Colleges and Universities in Jiangsu Province (2022SJYB0712); the Research Development Fund for Young Teachers of Chengxian College of Southeast University (z0037); and the Special Project of Ideological and Political Education Reform and Research Course (yjgsz2206).
Abstract: This study aims to reduce the interference of ambient noise in mobile communication, improve the accuracy and authenticity of information transmitted by sound, and guarantee the accuracy of voice information delivered by mobile communication. First, the principles and techniques of speech enhancement are analyzed, and a fast lateral recursive least squares (FLRLS) method is adopted to process sound data. Then, a convolutional neural network (CNN) based noise recognition algorithm (NR-CNN) and a speech enhancement model are proposed. Finally, experiments are designed to verify the performance of the proposed algorithm and model. The experimental results show that the noise classification accuracy of the NR-CNN algorithm is higher than 99.82%, and the recall rate and F1 value are also higher than 99.92. The proposed enhancement model can effectively enhance the original sound under noise interference. After the CNN is incorporated, the average perceptual quality evaluation score across all noisy sounds improves by over 21% compared with the traditional noise reduction method. The proposed algorithm adapts to a variety of voice environments and can simultaneously enhance speech and reduce noise for many different types of voice signals, with better results than traditional sound enhancement models. In addition, the sound distortion index of the proposed speech enhancement model is lower than that of the control group, indicating that adding the CNN is less likely to cause sound signal distortion in various sound environments and that the model is more robust. In summary, the proposed CNN-based speech enhancement model shows significant sound enhancement effects, stable performance, and strong adaptability. This study provides a reference and basis for research applying neural networks to speech enhancement.
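The FLRLS variant used in the paper is not described in the abstract. For orientation, the sketch below implements the standard exponentially weighted recursive least squares (RLS) update that such fast variants build on, applied to identifying a short FIR filter. The filter order, the forgetting factor, the initialization constant `delta`, and the name `rls_identify` are all illustrative assumptions.

```python
import random


def rls_identify(inputs, desired, order=2, forgetting=1.0, delta=0.01):
    """Estimate FIR weights w so that w . [x[n], x[n-1], ...] tracks desired[n]."""
    weights = [0.0] * order
    # P approximates the inverse input correlation matrix; start at (1/delta) * I.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(order)] for i in range(order)]
    history = [0.0] * order
    for x_n, d_n in zip(inputs, desired):
        history = [x_n] + history[:-1]
        Px = [sum(P[i][j] * history[j] for j in range(order)) for i in range(order)]
        denom = forgetting + sum(history[i] * Px[i] for i in range(order))
        gain = [v / denom for v in Px]
        # A priori error: desired sample minus the current filter's prediction.
        error = d_n - sum(w * x for w, x in zip(weights, history))
        weights = [w + g * error for w, g in zip(weights, gain)]
        # P <- (P - gain * x^T P) / forgetting; x^T P equals Px^T since P is symmetric.
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(order)]
             for i in range(order)]
    return weights


# Hypothetical noise-free system: d[n] = 0.5 * x[n] - 0.3 * x[n-1].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_weights = [0.5, -0.3]
d = [true_weights[0] * x[n] + true_weights[1] * (x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
estimate = rls_identify(x, d)
```

In a noise-cancellation setting, `inputs` would be a noise reference and `error` the enhanced output; here the same recursion is simply used to recover known filter weights.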