Funding: Sponsored by the National Natural Science Foundation of China (60372089) and the Basic Research Foundation of Beijing Institute of Technology (BIT-UBF-200301F03)
Abstract: Unequal error protection (UEP) is applied to a distributed speech recognition (DSR) system, and three schemes are proposed. All three schemes are evaluated on a GSM simulation platform for recognizing Mandarin digit strings and compared with the equal error protection (EEP) scheme. Experiments show that UEP protects the data transmitted in a DSR system more effectively, which results in a higher word accuracy rate for the DSR system.
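To illustrate the general idea of UEP as opposed to EEP, the following is a minimal sketch and not one of the three schemes evaluated in the abstract. It assumes a hypothetical split of each frame into "important" and "less important" bits, and it uses a simple repetition code with majority voting (an assumption chosen only for brevity) to give the important bits more redundancy.

```python
def encode_uep(important, other, repeat=3):
    """Repeat each important bit `repeat` times; send the other bits unprotected."""
    protected = [b for bit in important for b in [bit] * repeat]
    return protected + list(other)


def decode_uep(received, n_important, repeat=3):
    """Majority-vote the repeated bits; pass the remaining bits through unchanged."""
    split = n_important * repeat
    important = [
        1 if sum(received[i:i + repeat]) * 2 > repeat else 0
        for i in range(0, split, repeat)
    ]
    return important, list(received[split:])


# Hypothetical frame: three important bits and two less important bits.
important_bits = [1, 0, 1]
other_bits = [0, 1]
sent = encode_uep(important_bits, other_bits)

corrupted = list(sent)
corrupted[1] ^= 1   # one channel error hits a protected bit
corrupted[-1] ^= 1  # one channel error hits an unprotected bit

decoded_important, decoded_other = decode_uep(corrupted, len(important_bits))
```

In this toy case the error on the protected bit is corrected by the majority vote, while the error on the unprotected bit survives. That is the trade-off UEP makes: redundancy is spent where errors hurt recognition most.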
Abstract: Distributed speech recognition (DSR) applications have certain quality of service (QoS) requirements in terms of latency, packet loss rate, etc. To deliver quality-guaranteed DSR applications over wireline or wireless links, some QoS mechanisms should be provided. We put forward an RTP/RSVP transmission scheme with a DSR-specific payload and QoS parameters by modifying the present WAP protocol stack. The simulation results show that this scheme provides adequate network bandwidth to sustain real-time transport of DSR data over either wireline or wireless channels.
Funding: Supported by the National Natural Science Foundation of China (No. 60172048)
Abstract: As a statistical method, the hidden Markov model (HMM) is widely used for speech recognition. To train the HMM more effectively with much less data, the subspace distribution clustering hidden Markov model (SDCHMM), derived from the continuous density hidden Markov model (CDHMM), is introduced. With parameter tying, a new method to train SDCHMMs is described. Compared with the conventional training method, an SDCHMM recognizer trained with the new method achieves higher accuracy and speed. Experimental results show that the SDCHMM recognizer outperforms the CDHMM recognizer on speech recognition of Chinese digits.
Abstract: Speech recognition allows a machine to turn a speech signal into text through a process of identification and understanding. Extracting the features, predicting the maximum likelihood, and generating the models of the input speech signal are considered the most important steps in configuring an automatic speech recognition (ASR) system. In this paper, an automatic Arabic speech recognition system was built using MATLAB. Twenty-four Arabic words with a consonant-vowel consonant-vowel consonant-vowel (CVCVCV) structure were recorded from 19 native Arabic speakers, each speaker uttering each word 3 times (1368 words in total). To test the system, 39 features were extracted by partitioning the speech signal into frames of about 0.25 sec shifted by 0.10 sec. In the back end, the statistical models were generated by separating the features into between 4 and 10 states, each state having 8 Gaussian distributions. The data have a 48 kHz sample rate and 32-bit depth and were saved separately in wave file format. The system was trained on a phonetically rich and balanced Arabic word list (10 speakers * 3 times * 24 words, 720 words in total) and tested using another word list (24 words * 9 speakers * 3 times, 648 words in total). Using different speakers and similar words, the system obtained a very good word recognition accuracy of 92.92% and a word error rate (WER) of 7.08%.
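The counts and frame parameters quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "48 k" means 48,000 samples per second and that WER is simply the complement of word accuracy, which is consistent with the reported figures. The frame-counting helper and the one-second recording length are hypothetical illustrations, not part of the paper.

```python
SPEAKERS_TOTAL = 19
TRAIN_SPEAKERS = 10
TEST_SPEAKERS = 9
WORDS = 24
REPETITIONS = 3

# Corpus sizes stated in the abstract.
total_utterances = SPEAKERS_TOTAL * WORDS * REPETITIONS
train_utterances = TRAIN_SPEAKERS * WORDS * REPETITIONS
test_utterances = TEST_SPEAKERS * WORDS * REPETITIONS

# Assumption: WER is the complement of word accuracy (isolated-word scoring).
accuracy_percent = 92.92
wer_percent = round(100.0 - accuracy_percent, 2)

# Assumption: "48 k" is 48,000 samples per second.
SAMPLE_RATE_HZ = 48_000
frame_length = round(0.25 * SAMPLE_RATE_HZ)  # samples per frame
frame_shift = round(0.10 * SAMPLE_RATE_HZ)   # samples between frame starts


def count_frames(n_samples, length=frame_length, shift=frame_shift):
    """Number of complete frames that fit in a signal of n_samples samples."""
    if n_samples < length:
        return 0
    return 1 + (n_samples - length) // shift


# Hypothetical one-second recording.
frames_in_one_second = count_frames(SAMPLE_RATE_HZ)
```

The training and test counts sum to the stated total of 1368 utterances, and the complement of 92.92% matches the reported 7.08% WER.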
Abstract: In this paper we first analyze the distributed speech recognition (DSR) system and the key factors that affect its performance, and then focus on the relationship between the length of the test speech and the recognition accuracy of the system. Some experimental results are given at the end.
Funding: the National High-Tech Research and Development Program of China (No. 2001AA114071)
Abstract: This work demonstrates the use of the nonlinear time-frequency distribution (NLTFD) of a discrete time energy operator (DTEO), based on amplitude modulation-frequency modulation demodulation techniques, as a feature in speech recognition. The duration distribution-based hidden Markov model in a speaker-independent large-vocabulary Mandarin speech recognition system was reconstructed from the feature vectors in the front-end detection stage. The goal was to improve the performance of the existing system by combining new features with the baseline feature vector. This paper also deals with errors associated with using a pre-emphasis filter in the front-end processing of the present scheme, which increases the noise energy at high frequencies above 4 kHz and in some cases degrades the recognition accuracy. The experimental results show that eliminating the pre-emphasis filter from the pre-processing stage and using NLTFD with compensated DTEO combined with Mel frequency cepstrum components gives a 21.95% reduction in the relative error rate compared to the conventional technique, with 25 candidates used in the test.
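The abstract does not define its DTEO. As a hedged illustration, the sketch below uses the commonly cited Teager-Kaiser form, psi[n] = x[n]^2 - x[n-1] * x[n+1], which is an assumption here and may differ from the compensated operator used in the paper. For a pure sinusoid A*cos(omega*n), this operator returns the constant (A*sin(omega))^2, which is why it is used to track amplitude and frequency modulation. The amplitude and frequency values are hypothetical.

```python
import math


def teager_kaiser_energy(x):
    """Return psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: amplitude 2.0, frequency 0.3 radians per sample.
amplitude = 2.0
omega = 0.3
signal = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_kaiser_energy(signal)

# Closed-form value the operator should return for every interior sample.
expected = (amplitude * math.sin(omega)) ** 2
```

Because the output depends on both amplitude and frequency, higher-frequency components of equal amplitude yield larger operator values.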
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 200/AA/14)
Abstract: This work describes an improved feature extraction algorithm that extracts the peripheral features of a point x(t_i, f_j) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features, using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) for Mandarin automatic speech recognition (MASR) system, as replacements for the dynamic features with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data using the discrete cosine transform (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of the ?dp_t-operator in the time direction and the ?dp_f-operator in the frequency direction. The algorithm achieves a 23.29% improvement in the relative error rate compared with the standard MFCC feature set and the dynamic features in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) based MASR system.
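As a rough illustration of applying a DCT to a 3×3 block of a time-frequency pattern, the sketch below computes a two-dimensional DCT in pure Python. It assumes the orthonormal DCT-II form and a hypothetical block layout (rows as time, columns as frequency). Neither is specified in the abstract, and the code is not the paper's feature extractor.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of a square block, computed as C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# Hypothetical flat 3x3 neighborhood: all energy should land in the DC term.
flat_block = [[1.0, 1.0, 1.0] for _ in range(3)]
coeffs = dct2(flat_block)
```

For a flat block only the top-left (DC) coefficient is nonzero. The remaining coefficients respond to variation along time and frequency, which is the kind of local structure peripheral features aim to capture.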
Funding: Supported by the National High-Technology Development Program of China (No. 2001AA114071)
Abstract: In speech recognition systems, the physiological characteristics of the speech production model cause the voiced sections of the speech signal to have an attenuation of approximately 20 dB per decade. Many speech recognition algorithms have been developed to solve this problem by filtering the input signal with a single-zero high-pass filter. Unfortunately, this technique increases the noise energy at high frequencies above 4 kHz, which in some cases degrades the recognition accuracy. This paper solves the problem using a pre-emphasis filter in the front end of the recognizer. The aim is to develop a modified parameterization approach that takes into account the whole energy zone in the spectrum to improve the performance of the existing baseline recognition system in the acoustic phase. The results show that a large-vocabulary speaker-independent continuous speech recognition system using this approach has a greatly improved recognition rate.
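A single-zero high-pass (pre-emphasis) filter is usually written as y[n] = x[n] - alpha * x[n-1]. The sketch below evaluates its gain at a low and a high frequency to show why it lifts energy, including noise, at the top of the band. The coefficient alpha = 0.97, the 16 kHz sample rate, and the two probe frequencies are assumptions for illustration. The abstract states none of them.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], treating the sample before x[0] as 0."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def gain_db(freq_hz, sample_rate_hz, alpha=0.97):
    """Magnitude response of H(z) = 1 - alpha * z^-1 at freq_hz, in dB."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    h = 1.0 - alpha * cmath.exp(-1j * omega)
    return 20.0 * math.log10(abs(h))


SAMPLE_RATE_HZ = 16_000  # assumed sample rate
low_gain = gain_db(100.0, SAMPLE_RATE_HZ)    # well below 4 kHz
high_gain = gain_db(6000.0, SAMPLE_RATE_HZ)  # above 4 kHz

# A constant (DC-like) input is almost entirely removed after the first sample.
filtered = pre_emphasis([1.0, 1.0, 1.0])
```

Under these assumptions the filter attenuates the 100 Hz component and amplifies the 6 kHz component, which matches the abstract's point that noise energy above 4 kHz is increased.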
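Abstract: First, a system framework that incorporates prosodic information is presented. Then, considering the characteristics of Chinese, problems such as the choice of modeling units and model training in a prosody-dependent speech recognition system are addressed, and a prosody-dependent speech recognition system is built within the multiple-space distribution hidden Markov model (MSD-HMM) framework. Finally, speech recognition experiments verify the effectiveness of the method. On the "863" test set, the method achieves a tonal syllable recognition accuracy of 76.18%.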