Funding: Supported by the National Natural Science Foundation of China (No. 60874060)
Abstract: This paper presents an improved Voice Activity Detection (VAD) algorithm based on the Signal-to-Noise Ratio (SNR) measure. We assume that the noise Power Spectral Density (PSD) in each spectral bin follows a Rayleigh distribution. With its asymmetric tail, the Rayleigh distribution describes the noise PSD distribution better than the Gaussian distribution does. Under this assumption, a new threshold updating expression is derived. Because the false alarm probability has an analytical integral, the threshold updating expression can be written without the inverse complementary error function, which keeps the computational complexity of the system low. Experimental results show that the proposed VAD outperforms, or is at least comparable with, the VAD scheme presented by Davis under several noise environments, and that it has a lower computational complexity.
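To illustrate why a Rayleigh noise model avoids the inverse complementary error function, the following is a minimal sketch, not the paper's actual threshold updating expression. It assumes a single bin whose noise statistic is Rayleigh distributed with scale `sigma`, so the tail probability is exp(-T^2 / (2 sigma^2)) and the threshold for a target false alarm probability can be solved in closed form. The function names and the maximum-likelihood scale estimate are illustrative assumptions.

```python
import math


def rayleigh_false_alarm(threshold: float, sigma: float) -> float:
    """Tail probability P(X > threshold) of a Rayleigh variable with scale sigma."""
    return math.exp(-threshold ** 2 / (2.0 * sigma ** 2))


def rayleigh_threshold(sigma: float, p_fa: float) -> float:
    """Threshold whose Rayleigh tail probability equals p_fa.

    Closed form: only a logarithm and a square root are needed,
    no inverse complementary error function.
    """
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    return sigma * math.sqrt(-2.0 * math.log(p_fa))


def sigma_from_noise_frames(values) -> float:
    """Maximum-likelihood Rayleigh scale from noise-only observations (assumed estimator)."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / (2.0 * len(values)))
```

A Gaussian noise model would require inverting the error function to obtain the same threshold, which is the cost the abstract says is avoided.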
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2007CB311104)
Abstract: The performance of traditional Voice Activity Detection (VAD) algorithms declines sharply in low Signal-to-Noise Ratio (SNR) environments. In this paper, a feature weighting likelihood method is proposed for noise-robust VAD. The method increases the contribution of dynamic features to the likelihood score, which in turn improves the noise robustness of VAD. A divergence-based dimension reduction method is also proposed to save computation: it removes the feature dimensions with smaller divergence values at the cost of a slight loss in performance. Experimental results on the Aurora Ⅱ database show that the proposed method remarkably improves detection performance in noisy environments when a model trained on clean data is used to detect speech endpoints. Applying the weighting likelihood to the dimension-reduced features gives performance comparable to, or even better than, that of the original full-dimensional features.
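The two ideas in this abstract can be sketched as follows. This is a hedged illustration, not the paper's formulation. It assumes diagonal-covariance Gaussian models for speech and non-speech, applies a per-dimension weight to each dimension's log-likelihood term, and ranks dimensions by the symmetric Kullback-Leibler divergence between the two classes. All function names and the choice of divergence are assumptions.

```python
import math


def weighted_log_likelihood(x, mean, var, weights):
    """Diagonal-Gaussian log-likelihood with a weight on each dimension's term."""
    total = 0.0
    for xd, md, vd, wd in zip(x, mean, var, weights):
        log_pdf = -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        total += wd * log_pdf  # larger weight -> larger contribution to the score
    return total


def symmetric_divergence(m1, v1, m2, v2):
    """Symmetric KL divergence between two univariate Gaussians."""
    return 0.5 * ((v1 / v2 + v2 / v1 - 2.0)
                  + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))


def select_dimensions(speech_mean, speech_var, noise_mean, noise_var, keep):
    """Indices of the `keep` dimensions with the largest class divergence."""
    scores = [
        symmetric_divergence(ms, vs, mn, vn)
        for ms, vs, mn, vn in zip(speech_mean, speech_var, noise_mean, noise_var)
    ]
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return sorted(order[:keep])
```

In this sketch, raising the weights of the dynamic (delta) dimensions increases their share of the score, and dropping low-divergence dimensions shortens the loop in `weighted_log_likelihood`.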
Funding: Supported by the National Youth Science Fund Project (61501052) and the National Natural Science Foundation of China (61271182)
Abstract: Echo cancellation plays an important role in current Internet Protocol (IP) based voice interactive systems, and voice state detection is an essential part of echo cancellation. It mainly comprises two parts: double talk detection (DTD) and voice activity detection (VAD). DTD detects double talk and prevents filter divergence in the presence of near-end speech, while VAD determines near-end voice activity and outputs a silence indicator when the near end is silent. However, DTD applied directly may mistakenly declare double talk when both ends are silent, updating the coefficients while the far end is silent may lead to filter divergence, and current VAD algorithms may misjudge the residual echo from the near end as far-end voice. Therefore, a voice detection algorithm combining DTD and far-end VAD is proposed. DTD is performed when the VAD declares far-end speech, filtering and coefficient updating are halted when the VAD declares far-end silence, and the far-end VAD adopted is a multi-feature VAD based on short-time energy and correlation. The new algorithm can improve the accuracy of DTD, prevent filter divergence, and exclude the case in which the far-end signal contains only residual echo from the near end. Test results show that the voice state decision of the new algorithm is accurate and that echo cancellation performance is improved.
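The control flow described in this abstract can be sketched roughly as below. The thresholds, the lag, and the rule of requiring both features to exceed their thresholds are assumptions made for illustration; the abstract does not give them. Freezing adaptation while still filtering during double talk is the conventional choice and is likewise assumed here.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def normalized_autocorrelation(frame, lag):
    """Autocorrelation at `lag`, normalized by the frame energy."""
    num = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    den = sum(s * s for s in frame)
    return num / den if den > 0.0 else 0.0


def far_end_vad(frame, energy_threshold, corr_threshold, lag=1):
    """Declare far-end speech only if both features exceed their thresholds (assumed rule)."""
    return (short_time_energy(frame) > energy_threshold
            and normalized_autocorrelation(frame, lag) > corr_threshold)


def echo_canceller_actions(far_end_active, double_talk_detector):
    """Decide what the echo canceller does for the current frame.

    `double_talk_detector` is a zero-argument callable, so DTD is only
    evaluated when the far-end VAD has declared speech.
    """
    if not far_end_active:
        # Far-end silence: halt both filtering and coefficient update.
        return {"state": "far-end silence", "filter": False, "adapt": False}
    if double_talk_detector():
        # Double talk: keep filtering but freeze adaptation (assumed behavior).
        return {"state": "double talk", "filter": True, "adapt": False}
    return {"state": "far-end single talk", "filter": True, "adapt": True}
```

Because DTD never runs during far-end silence in this arrangement, it cannot declare double talk when both ends are silent.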
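Abstract: This paper introduces the audio playback technology for the wireless train dispatching voice service in the DRTD system. Wired communication in the DRTD system transmits audio using the SIP protocol and RTP streams. Through a processing chain of mixing, windowed voice detection, buffering, format conversion, and signaling control, the audio stream is transmitted over the wireless air interface and the speech waveform is finally played back on the mobile terminal. This bridges the wired and wireless communication in wireless train dispatching and supports the core services of the DRTD system.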
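Two stages of such a processing chain, mixing and windowed voice detection, might look like the sketch below. The abstract gives no implementation details, so everything here is an assumption: 16-bit signed PCM samples, mixing by summation with saturation, and a mean-energy test on non-overlapping windows. The function names are hypothetical and are not taken from the DRTD system.

```python
def mix_pcm16(frames):
    """Mix equal-length 16-bit PCM frames by summing samples and saturating."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clamp to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def windowed_voice_flags(samples, window, threshold):
    """Flag each non-overlapping window whose mean energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        energy = sum(s * s for s in chunk) / window
        flags.append(energy > threshold)
    return flags
```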
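Abstract: Applying high order statistics (HOS) in the frequency domain, this paper proposes a voice activity detection (VAD) algorithm based on a new HOS feature of the amplitude spectrum. The algorithm uses adjacent frames to obtain statistical information about the current frame, constructs an independent zero-mean Gaussian random sequence from the amplitude spectrum, and obtains the HOS feature by computing the normalized skewness of this sequence. The new feature exploits prior knowledge of the long-term stationarity and disorder of noise, borrows the speech production model to analyze the noise model, and, under reasonable assumptions, extracts the Gaussian information hidden in the amplitude spectrum. Whereas traditional HOS features can only be used for detection in Gaussian or quasi-Gaussian white noise, the amplitude-spectrum HOS therefore extends to all stationary random noise, including colored noise. The new feature also shows many desirable properties: for example, the feature value of stationary noise approaches zero, and noise segments in speech pauses and at the end of speech exhibit negative peaks. These properties make it possible to build a highly robust VAD algorithm suited to different noise types, different signal-to-noise ratios, and random starting points. The paper describes the principle and properties of the new feature in detail and combines it with a decision rule to construct a simple VAD algorithm. Experimental results show that, for stationary noise, the VAD algorithm based on the amplitude-spectrum HOS outperforms algorithms based on traditional features in the combined performance of detection accuracy and robustness.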
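The statistic at the core of this feature, normalized skewness, can be computed as in the sketch below. It shows only the skewness calculation. How the paper builds the zero-mean sequence from the amplitude spectra of adjacent frames is not described in the abstract and is not reproduced here; the function name is an assumption.

```python
def normalized_skewness(sequence):
    """Third central moment divided by the 1.5 power of the second central moment."""
    n = len(sequence)
    mean = sum(sequence) / n
    m2 = sum((v - mean) ** 2 for v in sequence) / n
    m3 = sum((v - mean) ** 3 for v in sequence) / n
    if m2 == 0.0:
        return 0.0  # a constant sequence has no asymmetry
    return m3 / m2 ** 1.5
```

A sequence that is symmetric about its mean, as Gaussian noise is in expectation, gives a value near zero, which matches the stated property that the feature approaches zero for stationary noise.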