Funding: National Natural Science Foundation of China (NSFC) (No. 61671075); Major Program of National Natural Science Foundation of China (No. 61631003).
Abstract: To address the musical noise introduced by classical spectral subtraction, a short-time modulation (STM) domain spectral subtraction method has been successfully applied to single-channel speech enhancement. However, because voice activity detection (VAD) is inaccurate, the residual musical noise and the enhancement performance still need further improvement, especially in low signal-to-noise ratio (SNR) scenarios. To address this issue, an improved frame-iterative spectral subtraction in the STM domain (IMModSSub) is proposed. More specifically, exploiting the inter-frame correlation, noise subtraction is applied directly to each frame of the noisy signal in the STM domain. Then, the noisy signal is classified into speech or silence frames based on a predefined threshold on the segmental SNR. With these classification results, a corresponding mask function is developed for the noisy speech after noise subtraction. Finally, exploiting the increased sparsity of the speech signal in the modulation domain, the orthogonal matching pursuit (OMP) technique is applied to the speech frames to improve speech quality and intelligibility. The effectiveness of the proposed method is evaluated with three types of noise: white noise, pink noise, and HF channel noise. The results show that the proposed method outperforms some established baselines at lower SNRs (-5 to +5 dB).
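The frame classification and masking steps described in this abstract can be illustrated with a minimal sketch. It is not the IMModSSub method itself: it works on plain per-frame magnitude spectra rather than in the STM domain, and it omits the frame iteration and the OMP stage. The function names, the 0 dB threshold, the spectral floor, and the silence attenuation factor are all assumptions introduced for illustration.

```python
import math

def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Power spectral subtraction for one frame, with a spectral floor.

    `floor` is an assumed value. It keeps each bin above a small fraction
    of the noisy magnitude, a common way to limit musical noise.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - n * n
        floor_power = (floor * y) ** 2
        cleaned.append(math.sqrt(max(power, floor_power)))
    return cleaned

def frame_snr_db(noisy_mag, noise_mag, eps=1e-12):
    """Rough per-frame SNR estimate in dB from noisy and noise spectra."""
    noisy_power = sum(y * y for y in noisy_mag)
    noise_power = max(sum(n * n for n in noise_mag), eps)
    speech_power = max(noisy_power - noise_power, eps)
    return 10.0 * math.log10(speech_power / noise_power)

def classify_frames(frames, noise_mag, threshold_db=0.0):
    """Label each frame as speech (True) or silence (False)."""
    return [frame_snr_db(f, noise_mag) > threshold_db for f in frames]

def enhance(frames, noise_mag, threshold_db=0.0, silence_gain=0.1):
    """Subtract noise from every frame, then attenuate silence frames.

    The two-level mask (1.0 for speech, `silence_gain` for silence) is a
    hypothetical stand-in for the mask function mentioned in the abstract.
    """
    is_speech = classify_frames(frames, noise_mag, threshold_db)
    enhanced = []
    for frame, speech in zip(frames, is_speech):
        cleaned = spectral_subtract(frame, noise_mag)
        gain = 1.0 if speech else silence_gain
        enhanced.append([gain * c for c in cleaned])
    return enhanced
```

In this sketch the noise spectrum is supplied by the caller; in practice it would have to be estimated, which is where VAD errors at low SNR become a problem.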
Funding: Supported by the 973 Project of China (No. 2002CB312102) and the National Natural Science Foundation of China (No. 60272044).
Abstract: Two gain forms of spectral amplitude subtraction are derived theoretically without neglecting the correlation between the speech and noise spectra within a frame. In the implementation, the constrained gain is expressed as a function of the noncausal a priori SNR (Signal-to-Noise Ratio). The noise and the noncausal a priori SNR are estimated from the multitaper spectrum of the noisy signal, using algorithms modified to suit the multitaper spectrum. Objective evaluations show that, for white Gaussian noise, the proposed method outperforms some methods based on LSA (Log Spectral Amplitude) in terms of MBSD (Modified Bark Spectral Distortion), segmental SNR, and overall SNR. Informal listening tests show that speech reconstructed in this way has little speech distortion and that musical noise is nearly inaudible even at low SNR.
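For context, the textbook subtraction gains that this abstract contrasts itself with can be written as short functions. This is a sketch of the classical forms, which neglect the speech-noise cross term; the gains derived in the paper and its noncausal a priori SNR estimator are not given in the abstract and are not reproduced here. The function names are assumptions.

```python
import math

def amplitude_subtraction_gain(gamma):
    """Classical spectral amplitude subtraction gain, |S| = |Y| - |N|.

    `gamma` is the a posteriori SNR |Y|^2 / E[|N|^2] and must be positive.
    The gain is clamped at zero so the amplitude never goes negative.
    """
    return max(1.0 - 1.0 / math.sqrt(gamma), 0.0)

def power_subtraction_gain(xi):
    """Power subtraction gain written in terms of the a priori SNR `xi`.

    Uses G^2 = 1 - 1/gamma with xi = gamma - 1, so G = sqrt(xi / (1 + xi)).
    """
    return math.sqrt(xi / (1.0 + xi))

def wiener_gain(xi):
    """Wiener gain for the same a priori SNR, shown for comparison."""
    return xi / (1.0 + xi)
```

Writing the gain as a function of the a priori SNR rather than the a posteriori SNR lets a smoothed SNR estimate drive the gain, which is one common route to lower musical noise.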
Abstract: Good voice-band signal classification not only enables the safe application of speech coding techniques and the implementation of a Digital Signal Interpolation (DSI) system, but also facilitates network administration and planning by providing accurate voice-band traffic analysis. A new method is proposed to detect and classify the various voice-band signals present on the General Switched Telephone Network (GSTN). The method combines simple base classifiers through the AdaBoost algorithm. The conventional features for voice-band data classification are combined and optimized by the AdaBoost algorithm and the spectral subtraction method. Experiments show the simplicity, effectiveness, efficiency, and flexibility of the method.
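The idea of combining simple base classifiers through AdaBoost can be sketched as follows. This is a generic two-class AdaBoost with single-feature threshold rules (decision stumps), not the paper's classifier: the abstract does not specify the base classifiers or features, so the stump learner, the function names, and the toy feature vectors are assumptions, and the spectral subtraction step is omitted.

```python
import math

def train_stump(X, y, w):
    """Find the single-feature threshold rule with lowest weighted error.

    Returns (error, feature_index, threshold, sign); labels are +1 / -1.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign

def adaboost(X, y, rounds=5):
    """Train a weighted ensemble of stumps with the AdaBoost update."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clip the error so the stump weight stays finite.
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: two hypothetical features per signal, labels +1 / -1.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 6.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
```

Each stump alone is a weak rule on one feature; the weighted vote is what lets a set of conventional features be combined without hand-tuning their relative importance.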