Journal Articles
152 articles found.
1. Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3215-3245.
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
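To make the dual-tower idea above concrete, here is a minimal PyTorch sketch of a speech tower and a text tower bridged by cross-modal attention. The dimensions, layer counts, and module choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualTowerEncoder(nn.Module):
    """Speech and text towers with a cross-modal attention bridge (illustrative)."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.speech_tower = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        self.text_tower = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        # Cross-modal attention: speech frames query text tokens for semantic alignment.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, speech_feats, text_embeds):
        s = self.speech_tower(speech_feats)          # (B, T_speech, d)
        t = self.text_tower(text_embeds)             # (B, T_text, d)
        aligned, _ = self.cross_attn(query=s, key=t, value=t)
        return s + aligned                           # semantics-aware acoustic representation

enc = DualTowerEncoder()
out = enc(torch.randn(2, 100, 256), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 100, 256])
```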
2. Audiovisual speech recognition based on a deep convolutional neural network
Authors: Shashidhar Rudregowda, Sudarshan Patilkulkarni, Vinayakumar Ravi, Gururaj H.L., Moez Krichen. Data Science and Management, 2024, No. 1, pp. 25-34.
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using the mel-frequency cepstral coefficient, and classification was performed using a one-dimensional convolutional neural network. Visual feature extraction uses Dlib, and visual speech is then classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. The training accuracy for visual speech recognition on the Indian English dataset was 77.48% and the test accuracy was 76.19% after 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
Keywords: audiovisual speech recognition; custom dataset; 1D convolutional neural network (CNN); deep CNN (DCNN); long short-term memory (LSTM); lipreading; Dlib; mel-frequency cepstral coefficient (MFCC)
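The audio branch described above (MFCC features into a 1D CNN) fits in a few lines. This sketch assumes librosa and PyTorch; the dummy audio, 13 coefficients, and 10-class head are placeholders rather than the paper's configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# One second of dummy audio standing in for a spoken word.
sr = 16000
y = np.random.randn(sr).astype(np.float32)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, n_frames)

# A small 1D CNN over the MFCC time axis, as in the audio branch.
model = nn.Sequential(
    nn.Conv1d(13, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 10),                                # 10 illustrative word classes
)
logits = model(torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0))
print(logits.shape)  # torch.Size([1, 10])
```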
3. Comparative Study on VQ-Based Efficient Mandarin Speech Recognition Method
Authors: 谢湘, 赵军辉, 匡镜明. Journal of Beijing Institute of Technology (EI, CAS), 2002, No. 3, pp. 266-270.
A VQ-based efficient speech recognition method is introduced, and the key parameters of this method are comparatively studied. The method is especially designed for Mandarin speaker-dependent, small word-set recognition. It has lower complexity and lower resource consumption but a higher ARR (accurate recognition rate) compared with traditional HMM or NN approaches. A large-scale test on the task of recognizing 11 Mandarin digits shows that the WER (word error rate) can reach 3.86%. The method is suitable for embedding in PDAs (personal digital assistants), mobile phones and similar devices to perform voice control such as digit dialing, name dialing, calculating, and voice commanding.
Keywords: speech recognition; vector quantization (VQ); speaker dependent; digits recognition
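A minimal sketch of the VQ recognition scheme, assuming one k-means codebook per word and classification by minimum average quantization distortion; the feature dimension and codebook size are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(frames, size=16):
    """Learn a VQ codebook (cluster centroids) from one word's training frames."""
    return KMeans(n_clusters=size, n_init=10, random_state=0).fit(frames).cluster_centers_

def distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
word_a = rng.normal(0, 1, (200, 12))     # toy training frames for word 'a'
word_b = rng.normal(3, 1, (200, 12))     # toy training frames for word 'b'
books = {"a": train_codebook(word_a), "b": train_codebook(word_b)}

test = rng.normal(3, 1, (50, 12))        # unseen utterance of word 'b'
print(min(books, key=lambda w: distortion(test, books[w])))  # -> 'b'
```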
4. Discriminative tone model training and optimal integration for Mandarin speech recognition
Authors: 黄浩, 朱杰. Journal of Southeast University (English Edition) (EI, CAS), 2007, No. 2, pp. 174-178.
Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training of HMM (hidden Markov model) based tone models is proposed. Then an integration technique for incorporating tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted, aiming at optimal integration of the tone models. The extended Baum-Welch algorithm is applied to find the model-dependent weights used to scale the acoustic scores and tone scores. Experimental results show that tone recognition rates and continuous speech recognition accuracy can be improved by the discriminatively trained tone models. Performance of a large vocabulary continuous Mandarin speech recognition system can be further enhanced by the discriminatively trained weight combinations owing to a better interpolation of the given models.
Keywords: discriminative training; minimum phone error; tone modeling; Mandarin speech recognition
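The integration step amounts to a log-linear combination of acoustic and tone scores per hypothesis. The toy numbers below only illustrate how the weight changes the ranking; in the paper the weights are learned with the extended Baum-Welch algorithm under the minimum phone error criterion, not swept by hand.

```python
import numpy as np

# Hypothetical log scores for three competing hypotheses.
acoustic = np.array([-120.4, -118.9, -121.7])
tone = np.array([-15.2, -17.8, -14.1])

def best_hypothesis(w_tone):
    # Log-linear interpolation of the two knowledge sources.
    return int(np.argmax(acoustic + w_tone * tone))

for w in (0.0, 0.5, 1.0):
    print(w, best_hypothesis(w))   # the winner shifts as tone evidence gains weight
```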
5. Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 5, pp. 1053-1089.
Speech recognition systems have become a unique family of human-computer interaction (HCI). Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computation experience. This paper presents a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to summarize the field and provide a starting point for those entering the vast field of speech signal processing. Since speech recognition has vast potential in industries like telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers who aim to explore more applications that society can quickly adopt in future years of evolution.
Keywords: speech recognition; automatic speech recognition (ASR); mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
6. Speech Recognition via CTC-CNN Model
Authors: Wen-Tsai Sung, Hao-Wei Kang, Sung-Jung Hsiao. Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3833-3858.
In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) acoustic model combined with CTC to improve the accuracy of speech recognition. This study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text, improving communication and reducing noise and hardware interference. The baseline acoustic model in this study faces challenges such as long training time, high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of the relevant parameters of the acoustic model, and finally an excellent model is selected according to the evaluation indexes, which reduces the error rate to about 18% and thus improves the accuracy rate. Comparative verification was then carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker's speech rate, which further verified the excellent performance of the CTCCNN_5+BN+Residual model structure. In the experiments, to train and verify the CTC-CNN baseline acoustic model, this study uses the THCHS-30 and ST-CMDS speech datasets as training data; after 54 epochs of training, the word error rate on the acoustic model training set is 31%, and the word error rate on the test set is stable at about 43%. The experiments also consider ambient noise: at a noise level of 80-90 dB, the accuracy rate is 88.18%, the worst among all levels, whereas at 40-60 dB the accuracy is as high as 97.33% thanks to less noise pollution.
Keywords: artificial intelligence; speech recognition; speech-to-text; convolutional neural network; automatic speech recognition
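A minimal example of the CTC objective at the heart of this model, using PyTorch's built-in nn.CTCLoss; the shapes and label inventory are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

T, B, C, L = 50, 2, 28, 10   # time steps, batch, classes (blank = index 0), label length
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(dim=-1)
targets = torch.randint(1, C, (B, L))                  # label sequences without blanks
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), L, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)    # sums over all valid alignments of labels to frames
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()              # gradients would flow back into the CNN acoustic model
print(loss.item())
```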
7. Adaptive bands filter bank optimized by genetic algorithm for robust speech recognition system (Cited by 5)
Authors: 黄丽霞, G. Evangelista, 张雪英. Journal of Central South University (SCIE, EI, CAS), 2011, No. 5, pp. 1595-1601.
Perceptual auditory filter banks such as the Bark-scale filter bank are widely used as front-end processing in speech recognition systems. However, the problem of designing optimized filter banks that provide higher accuracy in recognition tasks is still open. Based on spectral analysis in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes a genetic algorithm (GA) to optimize the design parameters. The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. The deployment of ABFB together with the zero-crossing peak amplitude (ZCPA) feature as a front-end process for a radial basis function (RBF) system shows significant improvement in robustness compared with the Bark-scale filter bank. In ABFB, several sub-bands are still concentrated toward the lower frequencies, but their exact locations are determined by performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered here, which still provide satisfactory results.
Keywords: perceptual filter banks; Bark scale; speaker-independent speech recognition systems; zero-crossing peak amplitude; genetic algorithm
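The GA loop can be sketched as follows. In the paper, fitness is the recognition performance of the full ZCPA front-end plus RBF recognizer evaluated in the loop; the stand-in objective below exists only to keep the sketch self-contained and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BANDS, POP, GENS = 8, 20, 30

def fitness(centers):
    # Stand-in objective; in the paper this would run the ZCPA + RBF
    # recognizer and return its recognition accuracy.
    return -np.var(np.diff(np.sort(centers)))

pop = rng.uniform(100, 4000, (POP, N_BANDS))      # candidate center frequencies in Hz
for _ in range(GENS):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]             # keep the fitter half
    children = parents[rng.integers(0, len(parents), POP // 2)].copy()
    children += rng.normal(0, 50, children.shape)             # Gaussian mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(p) for p in pop])]
print(np.sort(best).round(1))
```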
8. Improved hidden Markov model for speech recognition and POS tagging (Cited by 4)
Authors: 袁里驰. Journal of Central South University (SCIE, EI, CAS), 2012, No. 2, pp. 511-516.
In order to overcome defects of the classical hidden Markov model (HMM), the Markov family model (MFM), a new statistical model, was proposed. The Markov family model was applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model has higher performance than the hidden Markov model. The precision is enhanced from 94.642% to 96.214% in the part-of-speech tagging experiments, and the word error rate is reduced by 11.9% in the speech recognition experiments with respect to the HMM baseline system.
Keywords: hidden Markov model; Markov family model; speech recognition; part-of-speech tagging
9. An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task (Cited by 3)
Authors: Muhammad Babar Kamal, Arfat Ahmad Khan, Faizan Ahmed Khan, Malik Muhammad Ali Shahid, Chitapong Wechtaisong, Muhammad Daud Kamal, Muhammad Junaid Ali, Peerapong Uthansakul. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 5547-5562.
Deep learning advancements have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). Overall, the RNN works fine with small sequence data but suffers from the gradient vanishing problem on large sequences. Transformer networks have neutralized this issue and have shown state-of-the-art results on sequential and speech-related data. Generally, in speech recognition, the input audio is converted into an image using a Mel-spectrogram to illustrate frequencies and intensities. The image is classified by a machine learning mechanism to generate a classification transcript. However, the audio frequency in the image has low resolution, causing inaccurate predictions. This paper presents a novel end-to-end binary-view transformer-based architecture for speech recognition to cope with the frequency resolution problem. First, the input audio signal is transformed into a 2D image using a Mel-spectrogram. Second, modified universal transformers utilize multi-head attention to derive contextual information and different speech-related features. Moreover, a feedforward neural network is deployed for classification. The proposed system has generated robust results on Google's Speech Commands dataset with an accuracy of 95.16% and minimal loss. The binary-view transformer reduces the likelihood of overfitting by deploying a multi-view mechanism to diversify the input data, while multi-head attention captures multiple contexts from the data's feature map.
Keywords: convolutional neural network; multi-head attention; multi-view; RNN; self-attention; speech recognition; transformer
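A rough sketch of this pipeline: the Mel-spectrogram is treated as a sequence of frequency vectors and classified by a transformer encoder. It assumes torchaudio and PyTorch, uses a plain encoder rather than the paper's binary-view universal transformer, and the 35-class head matches the Google Speech Commands label set.

```python
import torch
import torch.nn as nn
import torchaudio

wave = torch.randn(1, 16000)                        # 1 s of dummy audio at 16 kHz
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)(wave)
frames = mel.squeeze(0).T                           # (time, n_mels) token sequence

proj = nn.Linear(64, 128)                           # embed each spectral frame
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2)
classifier = nn.Linear(128, 35)                     # 35 command classes (assumed)

x = encoder(proj(frames).unsqueeze(0))              # (1, time, 128) with attention context
logits = classifier(x.mean(dim=1))                  # mean-pool over time
print(logits.shape)  # torch.Size([1, 35])
```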
10. Integrated search technique for parameter determination of SVM for speech recognition (Cited by 2)
Authors: Teena Mittal, R.K. Sharma. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 6, pp. 1390-1398.
The support vector machine (SVM) has good application prospects for speech recognition problems; still, optimal parameter selection is a vital issue for it. To improve the learning ability of the SVM, a method for searching the optimal parameters based on the integration of predator-prey optimization (PPO) and the Hooke-Jeeves method has been proposed. In the PPO technique, the population consists of prey and predator particles. The prey particles search for the optimum solution, and the predator always attacks the global best prey particle. The solution obtained by PPO is further improved by applying the Hooke-Jeeves method. The proposed method is applied to recognize isolated words in a Hindi speech database and to recognize words in the benchmark database TI-20 in clean and noisy environments. Recognition rates of 81.5% for the Hindi database and 92.2% for the TI-20 database have been achieved using the proposed technique.
Keywords: support vector machine (SVM); predator prey optimization; speech recognition; mel-frequency cepstral coefficients; wavelet packets; Hooke-Jeeves method
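The Hooke-Jeeves stage can be sketched as a pattern search over (log C, log gamma) scored by cross-validation. The sketch below uses scikit-learn's digits data as a stand-in for speech features, simplifies the search to exploratory moves, and omits the predator-prey stage, which in the paper supplies the starting point.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

X, y = datasets.load_digits(return_X_y=True)        # stand-in for MFCC/wavelet features

def score(log_c, log_g):
    clf = svm.SVC(C=10 ** log_c, gamma=10 ** log_g)
    return cross_val_score(clf, X, y, cv=3).mean()

point, step = np.array([0.0, -3.0]), 1.0            # start at C = 1, gamma = 1e-3
best = score(*point)
while step > 0.1:
    moved = False
    for axis in (0, 1):                             # exploratory moves along each axis
        for delta in (step, -step):
            trial = point.copy()
            trial[axis] += delta
            s = score(*trial)
            if s > best:
                point, best, moved = trial, s, True
    if not moved:
        step /= 2                                   # refine the mesh when stuck
print(point, round(best, 4))
```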
11. Speech Recognition-Based Automated Visual Acuity Testing with Adaptive Mel Filter Bank (Cited by 2)
Authors: Shibli Nisar, Muhammad Asghar Khan, Fahad Algarni, Abdul Wakeel, M. Irfan Uddin, Insaf Ullah. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 2991-3004.
One of the most commonly reported disabilities is vision loss, which can be diagnosed by an ophthalmologist in order to assess the visual system of a patient. This procedure, however, usually requires an appointment with an ophthalmologist, which is both a time-consuming and expensive process. Other issues can also arise, including a lack of appropriate equipment and trained practitioners, especially in rural areas. Centered on a cognitively motivated attribute extraction and speech recognition approach, this paper proposes a novel idea that immediately determines eyesight deficiency. The proposed system uses an adaptive filter bank with weighted mel-frequency cepstral coefficients for feature extraction. The adaptive filter bank implementation is inspired by the principle of spectrum sensing in cognitive radio, which is aware of its environment and adapts to statistical variations in the input stimuli by learning from the environment. Comparative performance evaluation demonstrates the potential of our automated visual acuity test method to achieve results comparable to the clinical ground truth established by the expert ophthalmologist's tests. The overall accuracy achieved by the proposed model compared with the expert ophthalmologist's test is 91.875%. The proposed method potentially offers a second opinion to ophthalmologists and serves as a cost-effective pre-screening test to predict eyesight loss at an early stage.
Keywords: eyesight test; speech recognition; HMM; SVM; feature extraction
12. Mandarin Digits Speech Recognition Using Support Vector Machines (Cited by 2)
Authors: 谢湘, 匡镜明. Journal of Beijing Institute of Technology (EI, CAS), 2005, No. 1, pp. 9-12.
A method of applying the support vector machine (SVM) to speech recognition was proposed, and a speech recognition system for Mandarin digits was built with SVMs. In the system, vectors were linearly extracted from the speech feature sequence to make up time-aligned input patterns for the SVM, and the decisions of several 2-class SVM classifiers were combined to construct an N-class classifier. Four kinds of SVM kernel functions were compared in experiments on speaker-independent recognition of Mandarin digits. The radial basis function kernel achieved the highest accuracy of 99.33%, better than that of the baseline system based on hidden Markov models (HMM) (97.08%). The experiments also show that the SVM can outperform the HMM especially when the training samples are very limited.
Keywords: speech recognition; support vector machine (SVM); kernel function
13. Novel Active Learning Method for Speech Recognition (Cited by 1)
Authors: Liu Gang, Chen Wei, Guo Jun. China Communications (SCIE, CSCD), 2010, No. 5, pp. 29-39.
In speech recognition, acoustic modeling always requires tremendous numbers of transcribed samples, and the transcription becomes intensively time-consuming and costly. In order to aid this labor-intensive process, Active Learning (AL) is adopted for speech recognition, where only the most informative training samples are selected for manual annotation. In this paper, we propose a novel active learning method for Chinese acoustic modeling: methods for initial training set selection based on the Kullback-Leibler divergence (KLD) and for sample evaluation based on multi-level confusion networks are proposed and adopted in our active learning system. Our experiments show that the proposed method achieves satisfying performance.
Keywords: active learning; acoustic model; speech recognition; KLD; confusion network
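A toy illustration of the KLD-based selection idea: candidate utterances whose unit distribution diverges most from what the training pool already covers are treated as most informative. The distributions and utterance names below are invented.

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

pool_dist = np.array([0.5, 0.3, 0.2])          # unit coverage of the current training set
candidates = {
    "utt1": np.array([0.48, 0.32, 0.20]),      # redundant with the pool
    "utt2": np.array([0.10, 0.20, 0.70]),      # covers under-represented units
}
pick = max(candidates, key=lambda u: entropy(candidates[u], pool_dist))
print(pick)  # utt2 - the most informative candidate for annotation
```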
14. Data-Driven Temporal Filtering on Teager Energy Time Trajectory for Robust Speech Recognition (Cited by 1)
Authors: 赵军辉, 谢湘, 匡镜明. Journal of Beijing Institute of Technology (EI, CAS), 2006, No. 2, pp. 195-200.
A data-driven temporal filtering technique is integrated into the time trajectory of the Teager energy operator (TEO) based feature parameter to improve the robustness of speech recognition systems against noise. Three kinds of data-driven temporal filters are investigated with the motivation of alleviating the harmful effects that environmental factors have on speech. The filters include principal component analysis (PCA) based filters, linear discriminant analysis (LDA) based filters, and minimum classification error (MCE) based filters. A detailed comparative analysis of these temporal filtering approaches applied in the Teager energy domain is presented. It is shown that while all of them can improve the recognition performance of the original TEO-based feature parameter in adverse environments, MCE-based temporal filtering provides a lower error rate than any other algorithm as the SNR decreases.
Keywords: robust speech recognition; principal component analysis; linear discriminant analysis; minimum classification error
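The Teager energy operator underlying this feature is a three-sample nonlinear operator, sketched below; for a pure tone it yields an approximately constant value proportional to the squared amplitude.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

n = np.arange(1000)
tone = 0.5 * np.sin(2 * np.pi * 0.05 * n)      # pure tone, A = 0.5
print(teager_energy(tone)[:5].round(6))        # nearly constant: A^2 * sin^2(omega)
```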
15. Relative Contributions of Spectral and Temporal Cues for Speech Recognition in Patients with Sensorineural Hearing Loss (Cited by 1)
Authors: Rebecca Brashears, Katherine Rife. Journal of Otology, 2008, No. 2, pp. 84-91.
The present study was designed to examine speech recognition in patients with sensorineural hearing loss when the temporal and spectral information in the speech signals were co-varied. Four subjects with mild to moderate sensorineural hearing loss were recruited to participate in consonant and vowel recognition tests that used speech stimuli processed through a noise-excited vocoder. The number of channels was varied between 2 and 32, which defined spectral information. The lowpass cutoff frequency of the temporal envelope extractor was varied from 1 to 512 Hz, which defined temporal information. Results indicate that performance varied tremendously across the subjects with sensorineural hearing loss. For consonant recognition, the patterns of relative contributions of spectral and temporal information were similar to those in normal-hearing subjects. The utility of temporal envelope information appeared to be normal in the hearing-impaired listeners. For vowel recognition, which depended predominantly on spectral information, the performance plateau was achieved with as many as 16-24 channels, much higher than expected given that frequency selectivity in patients with sensorineural hearing loss might be compromised. To understand how hearing-impaired listeners utilize spectral and temporal cues for speech recognition, future studies involving a large sample of patients with sensorineural hearing loss will be necessary to elucidate the relationship between frequency selectivity, central processing capability, and speech recognition performance using vocoded signals.
Keywords: spectral; temporal; speech recognition; hearing loss
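A minimal sketch of the noise-excited vocoder used to co-vary the two cues: the channel count sets spectral resolution and the envelope lowpass cutoff sets temporal resolution. The band edges and filter orders are assumptions, not the study's exact processing.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def noise_vocoder(x, fs=16000, n_channels=4, env_cutoff=16.0):
    """Split speech into bands, extract each band's lowpass envelope,
    and use it to modulate band-limited noise."""
    edges = np.logspace(np.log10(100), np.log10(0.9 * fs / 2), n_channels + 1)
    env_lp = butter(4, env_cutoff / (fs / 2), output="sos")        # envelope smoother
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], "bandpass", output="sos")
        band = sosfilt(band_sos, x)
        envelope = sosfiltfilt(env_lp, np.abs(band))               # temporal cue
        carrier = sosfilt(band_sos, np.random.randn(len(x)))       # spectral carrier
        out += envelope * carrier
    return out

print(noise_vocoder(np.random.randn(16000)).shape)   # (16000,)
```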
16. Fuzzy C-Means Clustering Based Phonetic Tied-Mixture HMM in Speech Recognition (Cited by 1)
Authors: 徐向华, 朱杰, 郭强. Journal of Shanghai Jiaotong University (Science) (EI), 2005, No. 1, pp. 16-20.
A fuzzy clustering analysis based phonetic tied-mixture HMM (FPTM) was presented to decrease parameter size and improve the robustness of parameter training. The FPTM was synthesized from state-tied HMMs by a modified fuzzy C-means clustering algorithm. Each Gaussian codebook of the FPTM was built from Gaussian components within the same root node of the phonetic decision tree. Experimental results on large vocabulary Mandarin speech recognition show that, compared with a conventional phonetic tied-mixture HMM and a state-tied HMM with approximately the same number of Gaussian mixtures, the FPTM achieves word error rate reductions of 4.84% and 13.02%, respectively. By combining the two schemes of mixture weight pruning and fuzzy merging of Gaussian centers, a significant reduction in parameter size was achieved with little impact on recognition accuracy.
Keywords: speech recognition; hidden Markov model (HMM); fuzzy C-means (FCM); phonetic decision tree
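A compact implementation of the basic fuzzy C-means updates (soft memberships u and centers v) on toy 2-D data; in the paper the clustered objects are Gaussian components and the algorithm is further modified, which this sketch does not reproduce.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    """Alternate between membership-weighted centers and the FCM membership update."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(X))        # (N, c) soft memberships
    for _ in range(iters):
        um = u ** m
        v = um.T @ X / um.sum(axis=0)[:, None]        # fuzzily weighted centers
        d = np.linalg.norm(X[:, None] - v[None], axis=-1) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)      # u_ik proportional to d_ik^(-2/(m-1))
    return u, v

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(off, 1.0, (50, 2)) for off in (0, 4, 8)])
u, v = fuzzy_c_means(X)
print(np.round(v, 2))    # three centers near (0,0), (4,4), (8,8)
```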
17. Emotional Speech Recognition Based on SVM with GMM Supervector (Cited by 1)
Authors: Chen Yanxiang, Xie Jian. Journal of Electronics (China), 2012, No. 3, pp. 339-344.
Emotion recognition from speech is an important field of research in human-computer interaction. In this letter, the framework of Support Vector Machines (SVM) with a Gaussian Mixture Model (GMM) supervector is introduced for emotional speech recognition. Because of the importance of variance in reflecting the distribution of speech, normalized mean vectors with the potential to exploit information from the variance are adopted to form the GMM supervector. Comparative experiments from five aspects are conducted to study their corresponding effects on system performance. The experimental results, which indicate that the influence of the number of mixtures is strong while the influence of duration is weak, provide a basis for the training set selection of the Universal Background Model (UBM).
Keywords: emotional speech recognition; Support Vector Machines (SVM); Gaussian Mixture Model (GMM) supervector; Universal Background Model (UBM)
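The supervector idea maps a variable-length utterance to a fixed-length vector by stacking per-component statistics from a background GMM; that vector then feeds the SVM. The sketch below uses posterior-weighted means as a simplified stand-in for MAP-adapted means, with invented data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 13))             # pooled 'background' frames
ubm = GaussianMixture(n_components=8, random_state=0).fit(background)

def supervector(utterance, gmm):
    """Stack per-component posterior-weighted mean vectors into one vector."""
    post = gmm.predict_proba(utterance)              # (T, K) responsibilities
    means = post.T @ utterance / (post.sum(axis=0)[:, None] + 1e-8)
    return means.ravel()                             # (K * D,) fixed-length representation

sv = supervector(rng.normal(size=(300, 13)), ubm)
print(sv.shape)   # (104,) -> input feature for an SVM emotion classifier
```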
18. Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-to-End Model (Cited by 1)
Authors: Yue Zhao, Jianjian Yue, Wei Song, Xiaona Xu, Xiali Li, Licheng Wu, Qiang Ji. Journal on Internet of Things, 2019, No. 1, pp. 17-23.
We propose a method using a latent regression Bayesian network (LRBN) to extract shared speech features as the input of an end-to-end speech recognition model. The structure of the LRBN is compact, and its parameter learning is fast. Compared with a convolutional neural network, it has a simpler, more interpretable structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition and demonstrate that the LRBN is helpful in differentiating among multiple-language speech sets.
Keywords: multi-dialect speech recognition; Tibetan language; latent regression Bayesian network; end-to-end model
19. Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions (Cited by 7)
Authors: Veton Z. Kepuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6, pp. 1-9.
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coding Coefficients (LPCC), perceptual linear prediction (PLP), and RASTA-PLP, in noisy conditions, using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGIT human voice dataset corpora, recorded from 208 different adult speakers, in both the training and testing processes. The theoretical basis for the speech processing and classifier procedures is presented, and recognition results are reported in terms of word recognition rate.
Keywords: speech recognition; noisy conditions; feature extraction; mel-frequency cepstral coefficients; linear predictive coding coefficients; perceptual linear prediction; RASTA-PLP; isolated speech; hidden Markov model
20. Comparative Study on Channel Compensation for Robust Speech Recognition
Authors: 赵军辉, 匡镜明, 黄石磊. Journal of Beijing Institute of Technology (EI, CAS), 2003, No. 4, pp. 403-406.
Some channel compensation techniques integrated into the front-end of a speech recognizer for improving channel robustness are described. These techniques include cepstral mean normalization, RASTA processing, and blind equalization. Two standard channel frequency characteristics, G.712 and MIRS, are introduced as channel distortion references, and a Mandarin digit string recognition task is performed to evaluate and compare the performance of these methods. The recognition results show that in the G.712 case blind equalization achieves the best recognition performance, while cepstral mean normalization outperforms the other methods in the MIRS case, where it reaches a word error rate of 3.96%.
Keywords: robustness; speech recognition; cepstral mean normalization; RASTA processing; blind equalization
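Cepstral mean normalization, the strongest method in the MIRS case, is essentially a one-liner: a stationary convolutional channel is additive in the cepstral domain, so subtracting the per-utterance mean of each coefficient cancels it. A small sketch:

```python
import numpy as np

def cmn(cepstra):
    """Subtract the utterance-level mean of each cepstral coefficient."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

frames = np.random.randn(200, 13)                 # (n_frames, n_cepstra)
channel = np.full((1, 13), 0.7)                   # stationary channel offset
print(np.allclose(cmn(frames + channel), cmn(frames)))  # True - bias removed
```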