The Perception Spectrogram Structure Boundary(PSSB)parameter is proposed for speech endpoint detection as a preprocess of speech or speaker recognition.At first a hearing perception speech enhancement is carried out...The Perception Spectrogram Structure Boundary(PSSB)parameter is proposed for speech endpoint detection as a preprocess of speech or speaker recognition.At first a hearing perception speech enhancement is carried out.Then the two-dimensional enhancement is performed upon the sound spectrogram according to the difference between the determinacy distribution characteristic of speech and the random distribution characteristic of noise.Finally a decision for endpoint was made by the PSSB parameter.Experimental results show that,in a low SNR environment from-10 dB to 10 dB,the algorithm proposed in this paper may achieve higher accuracy than the extant endpoint detection algorithms.The detection accuracy of 75.2%can be reached even in the extremely low SNR at-10 dB.Therefore it is suitable for speech endpoint detection in low-SNRs environment.展开更多
It is an effective approach to learn the influence of environmental parameters, such as additive noise and channel distortions, from training data for robust speech recognition. Most of the previous methods are based ...It is an effective approach to learn the influence of environmental parameters, such as additive noise and channel distortions, from training data for robust speech recognition. Most of the previous methods are based on maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate result. In this paper, a novel discrimina-tive learning method of environmental parameters, which is based on Minimum Classification Error (MCE) criterion, is proposed. In the method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental pa-rameters. Consequently, the clean speech features are estimated from the noisy speech features with the estimated environmental parameters, and then the estimations of clean speech features are utilized in the back-end HMM classifier. Experiments show that the best error rate reduction of 32.1% is obtained, tested on a task of 18 isolated confusion Korean words, relative to a conventional HMM system.展开更多
自适应技术可以用较少的数据来调整声学模型参数,从而达到较好的语音识别效果,它们大多用于自适应有口音的语音。将最大似然线性回归(Maximum Likelihood Linear Regression,MLLR)、最大后验概率(Maximum A Posteriori,MAP)自适应技术...自适应技术可以用较少的数据来调整声学模型参数,从而达到较好的语音识别效果,它们大多用于自适应有口音的语音。将最大似然线性回归(Maximum Likelihood Linear Regression,MLLR)、最大后验概率(Maximum A Posteriori,MAP)自适应技术用在远场噪声混响环境下来分析其在此环境下的识别性能。实验结果表明,仿真条件下,在墙壁反射系数为0.6,各种噪声环境下MAP有最好的自适应性能,在信噪比(Signal-to-Noise Ratio,SNR)分别为5 dB、10 dB、15 dB时,MAP使远场连续语音词错率(Word Error Rate,WER)平均降低了1.51%、12.82%、2.95%。真实条件下,MAP使WER下降幅度最大达到了37.13%。进一步验证了MAP良好的渐进性,且当自适应句数为1 000时,用MAP声学模型自适应方法得到的远场噪声混响连续语音的识别词错率比自适应前平均降低了12.5%。展开更多
基金supported by the National Natural Science Foundation of China.(61071215,61271359,61372146)
文摘The Perception Spectrogram Structure Boundary(PSSB)parameter is proposed for speech endpoint detection as a preprocess of speech or speaker recognition.At first a hearing perception speech enhancement is carried out.Then the two-dimensional enhancement is performed upon the sound spectrogram according to the difference between the determinacy distribution characteristic of speech and the random distribution characteristic of noise.Finally a decision for endpoint was made by the PSSB parameter.Experimental results show that,in a low SNR environment from-10 dB to 10 dB,the algorithm proposed in this paper may achieve higher accuracy than the extant endpoint detection algorithms.The detection accuracy of 75.2%can be reached even in the extremely low SNR at-10 dB.Therefore it is suitable for speech endpoint detection in low-SNRs environment.
基金the '863' High-Tech Programme of China (No. 863-306ZT03-02-3) and partially by the National Natural Science Foundation of China
文摘It is an effective approach to learn the influence of environmental parameters, such as additive noise and channel distortions, from training data for robust speech recognition. Most of the previous methods are based on maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate result. In this paper, a novel discrimina-tive learning method of environmental parameters, which is based on Minimum Classification Error (MCE) criterion, is proposed. In the method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental pa-rameters. Consequently, the clean speech features are estimated from the noisy speech features with the estimated environmental parameters, and then the estimations of clean speech features are utilized in the back-end HMM classifier. Experiments show that the best error rate reduction of 32.1% is obtained, tested on a task of 18 isolated confusion Korean words, relative to a conventional HMM system.