
Speech enhancement method based on convolutional gated recurrent neural network (cited by: 9)
Abstract: To further improve the performance of speech enhancement methods based on deep neural networks, and to address the difficulty of modeling long-term dependencies in noisy speech with convolutional neural networks alone, a speech enhancement method based on a convolutional gated recurrent neural network is proposed. The method first uses a convolutional neural network to extract local features from the noisy speech, and then uses a gated recurrent neural network to correlate these local features across different time periods. By combining the complementary characteristics of the two networks, the method makes fuller use of the contextual information in noisy speech during enhancement. Experimental results show that the method effectively improves speech enhancement performance under unknown noise conditions, and that the enhanced speech has better quality and intelligibility.
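The pipeline the abstract describes (a CNN extracts local spectral features from each frame of the noisy spectrogram, a GRU then correlates those features across time, and the combined representation drives the enhancement) can be sketched in plain NumPy. Everything below, from the layer sizes and random weights to the sigmoid ratio-mask output, is an illustrative assumption for clarity, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_features(spec, kernels):
    """1-D convolution over frequency within each frame: (T, F) -> (T, K*(F-W+1))."""
    T, F = spec.shape
    K, W = kernels.shape
    out = np.zeros((T, K, F - W + 1))
    for t in range(T):
        for k in range(K):
            for i in range(F - W + 1):
                out[t, k, i] = spec[t, i:i + W] @ kernels[k]
    return relu(out.reshape(T, -1))  # flatten local features per frame

def gru(xs, Wz, Uz, Wr, Ur, Wh, Uh):
    """Plain GRU over the frame sequence: (T, D) -> (T, H)."""
    h = np.zeros(Uz.shape[0])
    hs = []
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)             # update gate
        r = sigmoid(Wr @ x + Ur @ h)             # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_cand
        hs.append(h)
    return np.stack(hs)

# Toy noisy magnitude spectrogram: T frames x F frequency bins (hypothetical sizes).
T, F, K, W, H = 8, 16, 4, 5, 10
noisy = np.abs(rng.normal(size=(T, F)))

kernels = rng.normal(size=(K, W)) * 0.1
feats = conv_features(noisy, kernels)            # CNN stage: local spectral features
D = feats.shape[1]

params = [rng.normal(size=s) * 0.1 for s in
          [(H, D), (H, H), (H, D), (H, H), (H, D), (H, H)]]
states = gru(feats, *params)                     # GRU stage: context across time

Wo = rng.normal(size=(F, H)) * 0.1
mask = sigmoid(states @ Wo.T)                    # per-bin ratio mask in (0, 1)
enhanced = noisy * mask                          # masked spectrogram estimate
```

In a trained system the weights would of course be learned, and the masked spectrogram would be combined with the noisy phase and inverted back to a waveform; the sketch only shows how the two network stages hand information to each other.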
Authors: YUAN Wenhao; LOU Yingxi; XIA Bin; SUN Wenzhu (College of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China)
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2019, Issue 4, pp. 13-18 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China Youth Fund projects (61701286, 11704229); Shandong Provincial Natural Science Foundation projects (ZR2015FL003, ZR2017MF047, ZR2017LA011, ZR2017LF004)
Keywords: speech enhancement; deep learning; convolutional neural network; recurrent neural network; local features
