This paper presents an improved Voice Activity Detection (VAD) algorithm which uses the Signal-to-Noise Ratio (SNR) measure. We assume that noise Power Spectral Density (PSD) in each spectral bin follows a Rayle...This paper presents an improved Voice Activity Detection (VAD) algorithm which uses the Signal-to-Noise Ratio (SNR) measure. We assume that noise Power Spectral Density (PSD) in each spectral bin follows a Rayleigh distribution. Rayleigh distributions with its asymmetric tail characteristics give a better description of the noise PSD distribution than Gaussian distribution. Under this asstlmption, a new threshold updating expression is derived. Since the analytical integral of the false alarm probability, the threshold updating expression can be represented without the inverse complementary error function and low computational complexity is achieved in our system. Experimental results show that the proposed VAD outperforms or at least is comparable with the VAD scheme presented by Davis under several noise environments and has a lower computational complexity.展开更多
The performance of the traditional Voice Activity Detection (VAD) algorithms declines sharply in lower Signal-to-Noise Ratio (SNR) environments. In this paper, a feature weighting likelihood method is proposed for...The performance of the traditional Voice Activity Detection (VAD) algorithms declines sharply in lower Signal-to-Noise Ratio (SNR) environments. In this paper, a feature weighting likelihood method is proposed for noise-robust VAD. The contribution of dynamic features to likelihood score can be increased via the method, which improves consequently the noise robustness of VAD. Divergence based dimension reduction method is proposed for saving computation, which reduces these feature dimensions with smaller divergence value at the cost of degrading the performance a little. Experimental results on Aurora Ⅱ database show that the detection performance in noise environments can remarkably be improved by the proposed method when the model trained in clean data is used to detect speech endpoints. Using weighting likelihood on the dimension-reduced features obtains comparable, even better, performance compared to original full-dimensional feature.展开更多
Echo cancellation plays an important role in current Internet protocol(IP) based voice interactive systems. Voice state detection is an essential part in echo cancellation. It mainly comprises two parts: double tal...Echo cancellation plays an important role in current Internet protocol(IP) based voice interactive systems. Voice state detection is an essential part in echo cancellation. It mainly comprises two parts: double talk detection(DTD) and voice activity detection(VAD). DTD is used to detect doubletalk and prevent filter divergence in the presence of near-end speech, and VAD is used to determine the near-end voice activity and output silence indicator when near-end is silent. However, DTD straightforwardly proceeded may mistakenly declare double talk under double silent condition, coefficients update under the far-end silence condition may lead to filter divergence, and current VAD algorithms may misjudge the residual echo from the near end to be far-end voice. Therefore, a voice detection algorithm combining DTD and far-end VAD is proposed. DTD is implemented when VAD declares far-end speech, filtering and coefficients update will be halted when VAD declares far-end silence, and the far-end VAD adopted is multi-feature VAD based on short-time energy and correlation. The new algorithm can improve the accuracy of DTD, prevent filter divergence, and exclude the circumstance that far-end signal only contains residual echo from near end. Actual test results show that the voice state decision of the new algorithm is accurate, and the performance of echo cancellation is improved.展开更多
介绍DRTD系统中无线列调语音业务的音频回放技术。DRTD系统有线通信基于SIP协议和R T P流进行音频传输,通过混音、加窗语音检测、缓存、格式转换、信令控制等处理流程,将音频流在无线空口上进行传输,并最终在移动终端上实现语音波形回放...介绍DRTD系统中无线列调语音业务的音频回放技术。DRTD系统有线通信基于SIP协议和R T P流进行音频传输,通过混音、加窗语音检测、缓存、格式转换、信令控制等处理流程,将音频流在无线空口上进行传输,并最终在移动终端上实现语音波形回放,从而桥接无线列调中的有线通信和无线通信,为DRTD系统的核心业务提供支撑。展开更多
基金Supported by the National Natural Science Foundation of China (No. 60874060)
文摘This paper presents an improved Voice Activity Detection (VAD) algorithm which uses the Signal-to-Noise Ratio (SNR) measure. We assume that noise Power Spectral Density (PSD) in each spectral bin follows a Rayleigh distribution. Rayleigh distributions with its asymmetric tail characteristics give a better description of the noise PSD distribution than Gaussian distribution. Under this asstlmption, a new threshold updating expression is derived. Since the analytical integral of the false alarm probability, the threshold updating expression can be represented without the inverse complementary error function and low computational complexity is achieved in our system. Experimental results show that the proposed VAD outperforms or at least is comparable with the VAD scheme presented by Davis under several noise environments and has a lower computational complexity.
基金Supported by the National Basic Research Program of China (973 Program) (No.2007CB311104)
文摘The performance of the traditional Voice Activity Detection (VAD) algorithms declines sharply in lower Signal-to-Noise Ratio (SNR) environments. In this paper, a feature weighting likelihood method is proposed for noise-robust VAD. The contribution of dynamic features to likelihood score can be increased via the method, which improves consequently the noise robustness of VAD. Divergence based dimension reduction method is proposed for saving computation, which reduces these feature dimensions with smaller divergence value at the cost of degrading the performance a little. Experimental results on Aurora Ⅱ database show that the detection performance in noise environments can remarkably be improved by the proposed method when the model trained in clean data is used to detect speech endpoints. Using weighting likelihood on the dimension-reduced features obtains comparable, even better, performance compared to original full-dimensional feature.
基金supported by the National Youth Science Fund Project(61501052)the National Natural Science Foundation of China(61271182)
文摘Echo cancellation plays an important role in current Internet protocol(IP) based voice interactive systems. Voice state detection is an essential part in echo cancellation. It mainly comprises two parts: double talk detection(DTD) and voice activity detection(VAD). DTD is used to detect doubletalk and prevent filter divergence in the presence of near-end speech, and VAD is used to determine the near-end voice activity and output silence indicator when near-end is silent. However, DTD straightforwardly proceeded may mistakenly declare double talk under double silent condition, coefficients update under the far-end silence condition may lead to filter divergence, and current VAD algorithms may misjudge the residual echo from the near end to be far-end voice. Therefore, a voice detection algorithm combining DTD and far-end VAD is proposed. DTD is implemented when VAD declares far-end speech, filtering and coefficients update will be halted when VAD declares far-end silence, and the far-end VAD adopted is multi-feature VAD based on short-time energy and correlation. The new algorithm can improve the accuracy of DTD, prevent filter divergence, and exclude the circumstance that far-end signal only contains residual echo from near end. Actual test results show that the voice state decision of the new algorithm is accurate, and the performance of echo cancellation is improved.
文摘介绍DRTD系统中无线列调语音业务的音频回放技术。DRTD系统有线通信基于SIP协议和R T P流进行音频传输,通过混音、加窗语音检测、缓存、格式转换、信令控制等处理流程,将音频流在无线空口上进行传输,并最终在移动终端上实现语音波形回放,从而桥接无线列调中的有线通信和无线通信,为DRTD系统的核心业务提供支撑。