Abstract: This paper considers speech enhancement by threshold de-noising in the wavelet domain. The choice of decomposition level is another key factor affecting de-noising performance. The paper proposes a new wavelet-based de-noising scheme that significantly improves enhancement performance in the presence of additive white Gaussian noise. The proposed algorithm adaptively selects the optimal decomposition level of the wavelet transform according to the characteristics of the noisy speech. Experimental results demonstrate that the proposed algorithm outperforms the classical wavelet-based de-noising method and effectively improves the practicality of this kind of technique.
Funding: National Natural Science Foundation of China (No. 60071029)
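To make the idea of threshold de-noising at a chosen decomposition level concrete, here is a minimal sketch. It is not the scheme proposed in the abstract: the Haar wavelet, the soft-threshold rule, the universal threshold, and the function names are all assumptions chosen for illustration, and `levels` is passed in by hand rather than selected adaptively.

```python
import math
import statistics


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar transform.

    Returns (approx, details), where details[0] holds the finest-scale
    coefficients and details[-1] the coarsest.
    """
    root2 = math.sqrt(2.0)
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarse to fine."""
    root2 = math.sqrt(2.0)
    out = list(approx)
    for level in reversed(details):
        rebuilt = []
        for a, d in zip(out, level):
            rebuilt.append((a + d) / root2)
            rebuilt.append((a - d) / root2)
        out = rebuilt
    return out


def soft_threshold(coeff, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    return math.copysign(max(abs(coeff) - threshold, 0.0), coeff)


def denoise(signal, levels):
    """Soft-threshold the detail coefficients of a `levels`-deep Haar transform.

    Assumption: the noise level is estimated from the finest details with the
    median-absolute-deviation rule, and the universal threshold
    sigma * sqrt(2 ln N) is applied at every level.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = haar_dwt(signal, levels)
    sigma = statistics.median([abs(d) for d in details[0]]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    shrunk = [[soft_threshold(c, threshold) for c in level] for level in details]
    return haar_idwt(approx, shrunk)
```

Calling `denoise(noisy, levels)` for several values of `levels` shows how sensitive the result is to the decomposition depth, which is the parameter the abstract proposes to choose automatically.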
Abstract: The perceptual effect of phase information in speech has been studied by subjective auditory tests. When the phase spectrum of the speech is changed while the amplitude spectrum is left unchanged, the tests show that: (1) if the envelope of the reconstructed speech signal is unchanged, there is no distinguishable perceptual difference between the original speech and the reconstructed speech; (2) the perceptual effect of the reconstructed speech depends mainly on the amplitude of the derivative of the additive phase; (3) with td defined as the maximum relative time shift between different frequency components of the reconstructed speech signal, the speech quality is excellent for td < 10 ms, good for 10 ms < td < 20 ms, fair for 20 ms < td < 35 ms, and poor for td > 35 ms.
Funding: Supported by the 973 Project of China (No. 2002CB312102) and the National Natural Science Foundation of China (No. 60272044).
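The quality bands reported for td can be written as a small lookup. This is only a sketch: the function names are hypothetical, the third band is labeled "fair" here, and because the abstract gives strict inequalities, the handling of the exact boundary values (10, 20, and 35 ms) is an assumption that assigns each boundary to the lower-quality band.

```python
def max_relative_shift(delays_ms):
    """td: largest difference in time shift between any two frequency components."""
    return max(delays_ms) - min(delays_ms)


def rate_speech_quality(td_ms):
    """Map td (in milliseconds) to the quality band reported in the abstract.

    Assumption: boundary values fall into the lower-quality band, since the
    abstract does not say where exactly 10, 20, or 35 ms belong.
    """
    if td_ms < 0:
        raise ValueError("td must be non-negative")
    if td_ms < 10:
        return "excellent"
    if td_ms < 20:
        return "good"
    if td_ms < 35:
        return "fair"
    return "poor"
```

For example, per-component delays of 2.0, 14.5, and 7.0 ms give td = 12.5 ms, which falls in the "good" band.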
Abstract: Two gain forms of spectral amplitude subtraction are derived theoretically without neglecting the correlation between the speech and noise spectra within a frame. In the implementation, the constrained gain is expressed as a function of the noncausal a priori SNR (Signal-to-Noise Ratio). The noise and the noncausal a priori SNR are estimated from the multitaper spectrum of the noisy signal, using algorithms modified to suit the multitaper spectrum. Objective evaluations show that, in the case of white Gaussian noise, the proposed method outperforms some methods based on LSA (Log Spectral Amplitude) in terms of MBSD (Modified Bark Spectral Distortion), segmental SNR, and overall SNR. Informal listening tests show that speech reconstructed in this way has little speech distortion and that musical noise is nearly inaudible even at low SNR.
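For orientation, the sketch below shows the commonly presented baseline form of spectral amplitude subtraction expressed as a per-bin gain. It is not either of the two gain forms derived in the paper, which the abstract does not state: this baseline ignores any correlation between speech and noise within a frame, takes the noise magnitude estimate as given, and uses a hypothetical `floor` parameter to keep the gain positive.

```python
def subtraction_gain(noisy_magnitude, noise_magnitude, floor=0.05):
    """Baseline amplitude-subtraction gain for one frequency bin.

    gain = 1 - |N| / |Y|, clamped below by `floor` (an assumed spectral
    floor) so that the enhanced magnitude never becomes negative.
    """
    if noisy_magnitude <= 0.0:
        return floor
    return max(1.0 - noise_magnitude / noisy_magnitude, floor)


def enhance_frame(noisy_magnitudes, noise_magnitudes, floor=0.05):
    """Apply the gain bin by bin; the noisy phase would be reused unchanged."""
    return [
        subtraction_gain(y, n, floor) * y
        for y, n in zip(noisy_magnitudes, noise_magnitudes)
    ]
```

Bins where the noise estimate exceeds the observed magnitude are clamped to the floor; isolated bins that survive this clamping are a usual source of the musical noise that the abstract reports as nearly inaudible for the proposed method.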
Abstract: In this paper, a sinusoidal representation of speech and a cochlear model are used to extract speech parameters, and a speech analysis/synthesis system controlled by the auditory spectrum is developed with the model. Computer simulation shows that speech can be synthesized with only 12 parameters per frame on average. The method has the advantages of few parameters, low complexity, and high-performance speech representation. The synthetic speech has high intelligibility.
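A sinusoidal representation models each frame as a sum of sinusoids, which the following sketch synthesizes directly. The parameter layout (one amplitude, frequency, and phase per component) and the function name are assumptions for illustration; the abstract does not say how its 12 parameters per frame are allocated or how the cochlear model selects them.

```python
import math


def synthesize_frame(amplitudes, freqs_hz, phases, sample_rate, num_samples):
    """Sum of sinusoids: s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k)."""
    return [
        sum(
            a * math.cos(2.0 * math.pi * f * n / sample_rate + p)
            for a, f, p in zip(amplitudes, freqs_hz, phases)
        )
        for n in range(num_samples)
    ]
```

A call such as `synthesize_frame([0.8, 0.3], [200.0, 400.0], [0.0, 0.0], 8000, 160)` produces one 20 ms frame at 8 kHz from two components; a full synthesizer would also interpolate the parameters between frames to avoid discontinuities.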