Funding: Supported by the National Natural Science Foundation of China (No. 60874060)
Abstract: This paper presents an improved Voice Activity Detection (VAD) algorithm based on the Signal-to-Noise Ratio (SNR) measure. We assume that the noise Power Spectral Density (PSD) in each spectral bin follows a Rayleigh distribution. With its asymmetric tail, the Rayleigh distribution describes the noise PSD distribution better than the Gaussian distribution. Under this assumption, a new threshold updating expression is derived. Because the integral of the false alarm probability has an analytical form, the threshold updating expression can be written without the inverse complementary error function, which gives the system low computational complexity. Experimental results show that the proposed VAD outperforms, or is at least comparable with, the VAD scheme presented by Davis under several noise environments, and has a lower computational complexity.
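The closed-form property mentioned in this abstract can be illustrated with the textbook Rayleigh tail probability, P(X > t) = exp(-t^2 / (2 sigma^2)), which inverts to t = sigma * sqrt(-2 ln P_fa) using only a logarithm and a square root. The sketch below is not the paper's threshold updating expression, which the abstract does not give; the names `sigma` (Rayleigh scale of the noise statistic) and `p_fa` (target false alarm probability) are assumptions made for illustration.

```python
import math


def rayleigh_false_alarm(sigma: float, threshold: float) -> float:
    """Tail probability P(X > threshold) for X ~ Rayleigh(sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-(threshold ** 2) / (2.0 * sigma ** 2))


def rayleigh_threshold(sigma: float, p_fa: float) -> float:
    """Threshold t such that P(X > t) = p_fa for X ~ Rayleigh(sigma).

    The inversion is closed-form, so no inverse complementary error
    function is needed (unlike the Gaussian case).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    return sigma * math.sqrt(-2.0 * math.log(p_fa))


if __name__ == "__main__":
    # Hypothetical values: unit noise scale, 5% false alarm target.
    t = rayleigh_threshold(sigma=1.0, p_fa=0.05)
    print(t, rayleigh_false_alarm(1.0, t))
```

Under this model the threshold scales linearly with `sigma`, so tracking the noise scale is enough to update the threshold.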
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2007CB311104)
Abstract: The performance of traditional Voice Activity Detection (VAD) algorithms declines sharply in low Signal-to-Noise Ratio (SNR) environments. In this paper, a feature weighting likelihood method is proposed for noise-robust VAD. The method increases the contribution of dynamic features to the likelihood score, which in turn improves the noise robustness of VAD. A divergence-based dimension reduction method is also proposed to save computation; it removes the feature dimensions with smaller divergence values at the cost of a slight degradation in performance. Experimental results on the Aurora II database show that the proposed method remarkably improves detection performance in noisy environments when a model trained on clean data is used to detect speech endpoints. Applying the weighting likelihood to the dimension-reduced features gives performance comparable to, and in some cases better than, the original full-dimensional features.
Funding: Supported by the KERI Primary Research Program through the Korea Research Council for Industrial Science & Technology, funded by the Ministry of Science, ICT and Future Planning (No. 15-12-N0101-46)
Abstract: A novel technique is proposed to improve the performance of voice activity detection (VAD) by using deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow the Gaussian probability density function (PDF). The proposed algorithm uses the input signal to calculate the likelihood ratio and employs DBN learning to classify voice activity. Experiments show that the proposed algorithm yields improved results in various noise environments compared to conventional VAD algorithms. Furthermore, the DBN-based algorithm reduces the probability of detection error by [0.7, 2.6] compared to the support vector machine based algorithm.
Funding: Project supported by the Inha University Research Grant; Project (10031764) supported by the Strategic Technology Development Program of the Ministry of Knowledge Economy, Korea
Abstract: In this work, a novel voice activity detection (VAD) algorithm that uses speech absence probability (SAP) based on Teager energy (TE) is proposed for speech enhancement. Rather than the conventional local SAP (LSAP), the proposed method uses an LSAP based on the TE of noisy speech as the feature parameter for VAD in each frequency subband. Results show that the TE operator enhances the ability to discriminate speech from noise and further suppresses noise components. TE-based LSAP therefore provides a better representation of LSAP, resulting in improved VAD for estimating noise power in a speech enhancement algorithm. In addition, the method uses TE-based global SAP (GSAP), derived in each frame, as the weighting parameter for modifying the adopted TE operator and improving its performance. The proposed algorithm was evaluated by objective and subjective quality tests under various environments and was shown to produce better results than the conventional method.
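For readers unfamiliar with the Teager energy operator, a minimal sketch of its standard discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1], follows. It shows only the basic operator; the subband processing, the LSAP and GSAP computations, and the GSAP-weighted modification described in the abstract are not reproduced, and the function name `teager_energy` is an assumption for illustration.

```python
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy, psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output has len(x) - 2 samples because the first and last input
    samples lack a neighbour on one side. Inputs shorter than three
    samples give an empty list.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure sinusoid A*cos(w*n) the operator returns the constant A^2 * sin^2(w), so its output grows with both amplitude and frequency, which is one reason it is used to separate speech from slowly varying noise.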
Abstract: We consider the problem of automated voice activity detection (VAD) in the presence of noise. To this end, we introduce a Sequential Detection of Change Test (SDCT), designed for an independent mixture of Laplacian and Gaussian distributions. We analyse and numerically evaluate the proposed test for various noisy environments. In addition, we address the problem of effectively recognizing the possible presence of cyber exploits in the voice transmission channel. We then introduce a second sequential test, the Cyber Attacks Sequential Detection of Change Test (CA-SDCT), designed to detect the presence of such exploits rapidly and accurately, and we analyse and numerically evaluate it as well. Experimental results and comparisons with other proposed methods are also presented.
Funding: The National Natural Science Foundation of China (No. 12174053, 91938203, 11674057, 11874109); the Fundamental Research Funds for the Central Universities (No. 2242021k30019)
Abstract: To address the poor performance of speech signal detection at low signal-to-noise ratio (SNR), a method is proposed to detect active speech frames based on multi-window time-frequency (T-F) diagrams. First, the T-F diagram of the signal is calculated using a multi-window T-F analysis, and a speech test statistic is constructed from the characteristic difference between the signal and the background noise. Second, dynamic double-threshold processing is used for preliminary detection, and the global double-threshold value is then obtained using K-means clustering. Finally, the detection results are obtained by sequential decision. The experimental results show that the overall performance of the method is better than that of traditional methods under various SNR conditions and background noises. The method also has the advantages of low complexity, strong robustness, and adaptability to multiple languages.
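The double-threshold step can be pictured as a hysteresis rule on a per-frame statistic: a frame sequence enters the speech state when the statistic reaches a high threshold and leaves it only when the statistic falls below a low threshold. The sketch below is a generic version of that rule under stated assumptions; it does not reproduce the paper's test statistic, its dynamic threshold update, or the K-means step, and the names `stats`, `low`, and `high` are hypothetical.

```python
from typing import List, Sequence


def double_threshold_vad(stats: Sequence[float], low: float, high: float) -> List[bool]:
    """Frame-wise speech decisions with two thresholds (hysteresis).

    A run of speech starts when the statistic is >= high and ends when
    it drops below low. Values between the two thresholds keep the
    previous state, which avoids rapid toggling near a single threshold.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    active = False
    decisions = []
    for s in stats:
        if not active and s >= high:
            active = True
        elif active and s < low:
            active = False
        decisions.append(active)
    return decisions
```

With `low=0.3` and `high=0.5`, a frame whose statistic is 0.4 is labelled speech only if the preceding frame was already in the speech state.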
Abstract: In speech signal processing systems, frame-energy based voice activity detection (VAD) can be disturbed by background noise and by the non-stationary character of the frame energy within voice segments. The purpose of this paper is to improve the performance and robustness of VAD by introducing visual information. A data-driven linear transformation is adopted for visual feature extraction, and a general statistical VAD model is designed. Using the general model and a two-stage fusion strategy presented in this paper, a concrete multimodal VAD system is built. Experiments show that, compared to frame-energy based audio VAD, multimodal VAD obtains a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate. With the multimodal method, sentence-breaking errors are almost entirely avoided and frame-detection performance is clearly improved, which demonstrates the effectiveness of the visual modality in VAD.
Funding: Supported by the National Youth Science Fund Project (61501052); the National Natural Science Foundation of China (61271182)
Abstract: Echo cancellation plays an important role in current Internet protocol (IP) based voice interactive systems, and voice state detection is an essential part of echo cancellation. It mainly comprises two parts: double talk detection (DTD) and voice activity detection (VAD). DTD is used to detect double talk and prevent filter divergence in the presence of near-end speech, and VAD is used to determine near-end voice activity and output a silence indicator when the near end is silent. However, a DTD applied directly may mistakenly declare double talk when both ends are silent, coefficient updates during far-end silence may lead to filter divergence, and current VAD algorithms may misjudge the residual echo from the near end as far-end voice. Therefore, a voice detection algorithm combining DTD and far-end VAD is proposed. DTD is run when the VAD declares far-end speech, filtering and coefficient updates are halted when the VAD declares far-end silence, and the far-end VAD is a multi-feature VAD based on short-time energy and correlation. The new algorithm can improve the accuracy of DTD, prevent filter divergence, and exclude the case in which the far-end signal contains only residual echo from the near end. Actual test results show that the voice state decision of the new algorithm is accurate and that echo cancellation performance is improved.
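The gating described in this abstract can be sketched as a small per-frame decision function. This is an illustrative reading of the control flow only: the multi-feature far-end VAD and the DTD themselves are passed in as inputs rather than implemented, the names `VoiceState`, `Control`, and `decide` are hypothetical, and freezing adaptation during double talk is the usual echo-canceller convention, assumed here rather than stated in the abstract.

```python
from enum import Enum
from typing import Callable, NamedTuple


class VoiceState(Enum):
    FAR_END_SILENT = "far_end_silent"
    SINGLE_TALK = "single_talk"
    DOUBLE_TALK = "double_talk"


class Control(NamedTuple):
    state: VoiceState
    run_filter: bool  # whether the adaptive filter processes this frame
    adapt: bool       # whether the filter coefficients are updated


def decide(far_end_active: bool, detect_double_talk: Callable[[], bool]) -> Control:
    """Combine a far-end VAD decision with a double talk detector.

    The DTD is invoked only when the far-end VAD reports speech, so it
    cannot declare double talk while both ends are silent.
    """
    if not far_end_active:
        # Far-end silence: halt filtering and coefficient updates.
        return Control(VoiceState.FAR_END_SILENT, False, False)
    if detect_double_talk():
        # Assumption: keep filtering but freeze adaptation in double talk.
        return Control(VoiceState.DOUBLE_TALK, True, False)
    return Control(VoiceState.SINGLE_TALK, True, True)
```

Passing the DTD as a callable makes the ordering explicit: the detector is never evaluated on frames that the far-end VAD has already marked as silent.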
Abstract: Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor and non-motor symptoms that significantly impact an individual's quality of life. Voice changes have shown promise as early indicators of PD, making voice analysis a valuable tool for early detection and intervention. This study aims to assess and detect the severity of PD through voice analysis using the mobile device voice recordings dataset. The dataset consists of recordings from PD patients at different stages of the disease and from healthy control subjects. A novel approach is employed that incorporates a voice activity detection algorithm for speech segmentation and the wavelet scattering transform for feature extraction. Bayesian optimization is used to tune the hyperparameters of seven commonly used machine learning classifiers for PD severity detection. Among these classifiers, AdaBoost and K-nearest neighbor consistently demonstrated superior performance across various evaluation metrics. Furthermore, a weighted majority voting (WMV) technique that combines the predictions of multiple models improves classification accuracy, reaching a near-perfect accuracy of 98.62%. The results highlight the promising potential of voice analysis in PD diagnosis and monitoring. Integrating advanced signal processing techniques and machine learning models provides reliable and accessible tools for PD assessment, facilitating early intervention and improving patient outcomes. This study contributes to the field by demonstrating the effectiveness of the proposed methodology and the significant role of WMV in enhancing classification accuracy for PD severity detection.
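A minimal sketch of weighted majority voting follows: each classifier's predicted label receives that classifier's weight, and the label with the largest total wins. The abstract does not say how the study chose its weights, so the weights here (for example, validation accuracies) and the function name `weighted_majority_vote` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_majority_vote(predictions: Sequence[Hashable],
                           weights: Sequence[float]) -> Hashable:
    """Return the label with the largest summed weight.

    predictions[i] is the label predicted by classifier i and weights[i]
    is that classifier's (non-negative) weight. If several labels tie,
    the one that appears first in predictions is returned.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("predictions and weights must be non-empty and equal in length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

Weighting can overturn a plain majority: with predictions `["mild", "severe", "severe"]` and weights `[0.9, 0.4, 0.4]`, the single high-weight vote for `"mild"` outweighs the two lower-weight votes for `"severe"`.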
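Abstract: This paper introduces the audio playback technology used for the wireless train dispatching voice service in the DRTD system. Wired communication in the DRTD system carries audio using the SIP protocol and RTP streams. Through a processing pipeline of audio mixing, windowed voice detection, buffering, format conversion, and signaling control, the audio stream is transmitted over the wireless air interface and the voice waveform is finally played back on the mobile terminal. This bridges the wired and wireless communication in wireless train dispatching and supports the core services of the DRTD system.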