An improved method based on minimum mean square error-short time spectral amplitude (MMSE-STSA) is proposed to cancel background noise in whispered speech. Using the acoustic character of whispered speech, the algor...An improved method based on minimum mean square error-short time spectral amplitude (MMSE-STSA) is proposed to cancel background noise in whispered speech. Using the acoustic character of whispered speech, the algorithm can track the change of non-stationary background noise effectively. Compared with original MMSE-STSA algorithm and method in selectable mode Vo-coder (SMV), the improved algorithm can further suppress the residual noise for low signal-to-noise radio (SNR) and avoid the excessive suppression. Simulations show that under the non-stationary noisy environment, the proposed algorithm can not only get a better performance in enhancement, but also reduce the speech distortion.展开更多
A method of conversion from whispered speech to normal speech using the extended bilinear transformation was proposed. On account of the different deviation degrees of the whisper's formants in different frequency ba...A method of conversion from whispered speech to normal speech using the extended bilinear transformation was proposed. On account of the different deviation degrees of the whisper's formants in different frequency bands, the spectrum of the whispered speech will be processed in the separate partitions of this paper. On the basis of this spectrum, we will establish a conversion function able to usefully convert whispered speech to normal speech. Because of the whisper's non-linear offset in relation to normal speech, this paper introduces an expansion factor in the bilinear transform function making it correspond more closely to the actual conversion demands of whispered speech to normal speech. The introduction of this factor takes the non-linear move of the spectrum and the compression of the formant bandwidth into consideration, thus effectively reducing the spectrum distortion distance in the conversion. The experiment results show that the conversion presented in this paper effectively improves both the sound quality and the intelligibility of whispered speech.展开更多
Directing to the weakness of the present fixed values mapping methods (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed for improving the performance of t...Directing to the weakness of the present fixed values mapping methods (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed for improving the performance of the speech conversion system from Chinese whis- pered speech to normal speech. For the numerous components of UBM, the errors produced by the acoustical probability density statistical model can't be ignored. Thus an effective Gaus- sian mixture components chosen method based on the posterior probability summation of the minimum spectral distortion is developed to optimizing the system performance. The proposed method (method_U) is analyzed and compared using the performance index (PI) based on Itakura-Saito spectral distortion measure. It is shown experimentally that the performance of method_U is more stability for different speakers and different phonemes than that of method_F. The average PI of method_U is better than method_F. It is shown that by selecting effective Gaussian mixture components, the PI of method_U can be further improved 5.11%. Subjective auditory tests also show that the proposed method can improve the definition and intelligibility of conversion speech.展开更多
Whispered speech enhancement using auditory masking model in modified Mel- domain and Speech Absence Probability (SAP) was proposed. In light of the phonation char- acteristic of whisper, we modify the Mel-frequency...Whispered speech enhancement using auditory masking model in modified Mel- domain and Speech Absence Probability (SAP) was proposed. In light of the phonation char- acteristic of whisper, we modify the Mel-frequency Scaling model. Whispered speech is filtered by the proposed model. Meanwhile, the value of masking threshold for each frequency band is dynamically determined by speech absence probability. Then whispered speech enhancement is conducted by adaptively rectifying the spectrum subtraction coefficients using different masking threshold values. Results of objective and subjective tests on the enhanced whispered signal show that compared with other methods; the proposed method can enhance whispered signal with better subjective auditory quality and less distortion by reducing the music noise and background noise under the masking threshold value.展开更多
文摘An improved method based on minimum mean square error-short time spectral amplitude (MMSE-STSA) is proposed to cancel background noise in whispered speech. Using the acoustic character of whispered speech, the algorithm can track the change of non-stationary background noise effectively. Compared with original MMSE-STSA algorithm and method in selectable mode Vo-coder (SMV), the improved algorithm can further suppress the residual noise for low signal-to-noise radio (SNR) and avoid the excessive suppression. Simulations show that under the non-stationary noisy environment, the proposed algorithm can not only get a better performance in enhancement, but also reduce the speech distortion.
基金supported by the National Natural Science Foundation of China(61271359,61071215)Suzhou Science and Technology Development Plan(SYG201001)Key Joint Laboratory of Soochow University and JieMei Biomedical Engineering Instrument
文摘A method of conversion from whispered speech to normal speech using the extended bilinear transformation was proposed. On account of the different deviation degrees of the whisper's formants in different frequency bands, the spectrum of the whispered speech will be processed in the separate partitions of this paper. On the basis of this spectrum, we will establish a conversion function able to usefully convert whispered speech to normal speech. Because of the whisper's non-linear offset in relation to normal speech, this paper introduces an expansion factor in the bilinear transform function making it correspond more closely to the actual conversion demands of whispered speech to normal speech. The introduction of this factor takes the non-linear move of the spectrum and the compression of the formant bandwidth into consideration, thus effectively reducing the spectrum distortion distance in the conversion. The experiment results show that the conversion presented in this paper effectively improves both the sound quality and the intelligibility of whispered speech.
基金supported by the National Natural Science Foundation of China(61071215)the Science and Technology Foundation of Suzhou(SYG201033)the Pre-research Foundation of Soochow University(Q311901111,14317399)
文摘Directing to the weakness of the present fixed values mapping methods (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed for improving the performance of the speech conversion system from Chinese whis- pered speech to normal speech. For the numerous components of UBM, the errors produced by the acoustical probability density statistical model can't be ignored. Thus an effective Gaus- sian mixture components chosen method based on the posterior probability summation of the minimum spectral distortion is developed to optimizing the system performance. The proposed method (method_U) is analyzed and compared using the performance index (PI) based on Itakura-Saito spectral distortion measure. It is shown experimentally that the performance of method_U is more stability for different speakers and different phonemes than that of method_F. The average PI of method_U is better than method_F. It is shown that by selecting effective Gaussian mixture components, the PI of method_U can be further improved 5.11%. Subjective auditory tests also show that the proposed method can improve the definition and intelligibility of conversion speech.
基金supported by the National Natural Science Foundation of China(61071215)the University Natural Science Research Project of Jiangsu Province(05KJB510113)
文摘Whispered speech enhancement using auditory masking model in modified Mel- domain and Speech Absence Probability (SAP) was proposed. In light of the phonation char- acteristic of whisper, we modify the Mel-frequency Scaling model. Whispered speech is filtered by the proposed model. Meanwhile, the value of masking threshold for each frequency band is dynamically determined by speech absence probability. Then whispered speech enhancement is conducted by adaptively rectifying the spectrum subtraction coefficients using different masking threshold values. Results of objective and subjective tests on the enhanced whispered signal show that compared with other methods; the proposed method can enhance whispered signal with better subjective auditory quality and less distortion by reducing the music noise and background noise under the masking threshold value.