To address the poor performance of speech signal detection at low signal-to-noise ratio (SNR), a method is proposed for detecting active speech frames based on multi-window time-frequency (T-F) diagrams. First, the T-F diagram of the signal is computed using multi-window T-F analysis, and a speech test statistic is constructed from the characteristic difference between the signal and the background noise. Second, dynamic double-threshold processing is used for preliminary detection, and global double-threshold values are then obtained by K-means clustering. Finally, the detection results are obtained by sequential decision. Experimental results show that the overall performance of the method is better than that of traditional methods under various SNR conditions and background noises. The method also has the advantages of low complexity, strong robustness, and adaptability to multiple languages.
Funding: National Natural Science Foundation of China (Nos. 12174053, 91938203, 11674057, 11874109); Fundamental Research Funds for the Central Universities (No. 2242021k30019).
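The thresholding stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the test statistic, the threshold formulas, or the sequential decision rule, so everything below is an assumption: the per-frame statistic is taken as a given list of numbers, the two thresholds are placed at hypothetical fractions (`low_frac`, `high_frac`) between the two K-means cluster centers, and a simple hysteresis rule stands in for the sequential decision. The function names are illustrative and do not come from the paper.

```python
def kmeans_1d(values, iters=50):
    """Two-cluster Lloyd's algorithm on scalars; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)  # initialise the centers at the extremes
    for _ in range(iters):
        mid = (lo + hi) / 2.0  # nearest-center boundary for two clusters in 1-D
        a = [v for v in values if v <= mid]
        b = [v for v in values if v > mid]
        if not a or not b:
            break  # degenerate case, e.g. all values identical
        new_lo, new_hi = sum(a) / len(a), sum(b) / len(b)
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi


def double_thresholds(stat, low_frac=0.25, high_frac=0.5):
    """Place two global thresholds between the noise and speech cluster centers.

    low_frac and high_frac are assumed values for illustration only.
    """
    noise_c, speech_c = kmeans_1d(stat)
    span = speech_c - noise_c
    return noise_c + low_frac * span, noise_c + high_frac * span


def detect(stat, low, high):
    """Hysteresis rule (assumed stand-in for the sequential decision):
    a frame turns active at or above `high` and stays active until the
    statistic drops below `low`."""
    active = False
    decisions = []
    for s in stat:
        if s >= high:
            active = True
        elif s < low:
            active = False
        decisions.append(active)
    return decisions


if __name__ == "__main__":
    # Made-up per-frame statistic values, not data from the paper.
    stat = [0.1, 0.2, 0.1, 0.9, 0.5, 1.0, 0.45, 0.1, 0.15]
    low, high = double_thresholds(stat)
    print(detect(stat, low, high))
```

With two thresholds, a frame whose statistic dips between them inside a speech segment keeps its active label, which is the usual motivation for double-threshold detection over a single cut-off.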