

Adaptively Reserved Likelihood Ratio-based Robust Voice Activity Detection with Sub-band Double Features
Abstract: To further improve the accuracy of Voice Activity Detection (VAD) at low Signal-to-Noise Ratios (SNR), this paper presents an adaptively reserved likelihood-ratio robust VAD algorithm based on sub-band double features. The algorithm uses two sub-band features, the normalized maximum autocorrelation function and the normalized average zero-crossing rate, to set the reservation weight of each frequency component's likelihood ratio, and it adaptively estimates the likelihood-ratio reservation threshold from the VAD decisions of a fixed-length preceding interval together with their corresponding sub-band feature parameters. Experimental results show that, compared with the original reserved likelihood-ratio algorithm, the VAD accuracy is improved by 1.2%, 7.2%, and 8.1% under stationary white noise at 10 dB, 0 dB, and -10 dB respectively, and by 1.6% and 3.4% under non-stationary babble noise at 10 dB and 0 dB respectively. When the method is applied to a 2.4 kbps low-bit-rate vocoder, the Perceptual Evaluation of Speech Quality (PESQ) score of the synthesized speech is improved over the original vocoder by 0.098-0.153 under white noise and by 0.157-0.186 under babble noise.
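The abstract describes the weighting scheme only at a high level. The Python sketch below illustrates one possible reading of it and is not the authors' implementation: the uniform FFT-mask filter bank, the weight rule (autocorrelation peak times one minus zero-crossing rate), the threshold rule (mean weight of recently judged noise frames), the 50-frame history window, and all function names are assumptions added here for illustration. The per-sub-band log-likelihood ratios are assumed to come from an external statistical noise model, as in the likelihood-ratio VAD framework the paper extends.

import numpy as np

def subband_signals(frame, n_subbands=4):
    """Split one time-domain frame into band-limited time signals by
    masking groups of FFT bins (a crude uniform sub-band filter bank)."""
    spec = np.fft.rfft(frame)
    edges = np.linspace(0, len(spec), n_subbands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(frame)))
    return bands

def double_features(band):
    """Normalized maximum autocorrelation (periodicity) and normalized
    average zero-crossing rate of one band-limited signal."""
    band = band - band.mean()
    ac = np.correlate(band, band, mode="full")[len(band) - 1:]
    acf_max = np.max(ac[1:]) / (ac[0] + 1e-12)        # peak excluding lag 0
    zcr = np.mean(np.abs(np.diff(np.sign(band)))) / 2.0
    return acf_max, zcr

def reserve_weights(frame, n_subbands=4):
    """Map the double features to reservation weights in [0, 1]: strong
    periodicity and a low zero-crossing rate suggest voiced speech, so
    such sub-bands keep more of their likelihood ratio."""
    feats = [double_features(b) for b in subband_signals(frame, n_subbands)]
    weights = np.array([acf * (1.0 - zcr) for acf, zcr in feats])
    return np.clip(weights, 0.0, 1.0)

def adaptive_threshold(history, default=0.2):
    """Estimate the reservation threshold from a fixed-length window of past
    decisions: here, the mean weight of frames previously judged as noise
    (a hypothetical rule standing in for the paper's estimator)."""
    noise_w = [w for w, was_speech in history if not was_speech]
    return float(np.mean(noise_w)) if noise_w else default

def vad_decision(frame, subband_llr, history, eta=0.0, history_len=50):
    """Weight each sub-band log-likelihood ratio by its reservation weight,
    zero out sub-bands whose weight falls below the adaptive threshold,
    and compare the weighted sum against a global decision threshold eta."""
    weights = reserve_weights(frame, n_subbands=len(subband_llr))
    thr = adaptive_threshold(history)
    weights = np.where(weights >= thr, weights, 0.0)
    is_speech = bool(np.sum(weights * np.asarray(subband_llr)) > eta)
    history.append((float(weights.mean()), is_speech))
    del history[:-history_len]                # keep only the most recent decisions
    return is_speech

In use, a caller would keep one persistent history list across frames, call vad_decision(frame, llr_per_band, history) frame by frame, and tune eta and the default threshold on development data.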
Source: Journal of Electronics &amp; Information Technology (《电子与信息学报》), 2016, No. 11: 2879-2886 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61571192); Guangdong Province Public Welfare Project (2015A010103003); Fundamental Research Funds for the Central Universities, South China University of Technology (2015ZM143).
Keywords: Voice Activity Detection (VAD); Likelihood ratio; Low signal-to-noise ratio; Sub-band zero-crossing rate


