Journal Article

Causal Speech Enhancement Model Based on Deep Neural Network (Cited by: 4)

Abstract: Traditional speech enhancement methods based on Deep Neural Networks (DNN) take non-causal input, so they incur a fixed delay during processing and are unsuitable for applications with strict real-time requirements. To address this problem, this paper studies the issue from the perspective of network structure: the speech enhancement performance of different network structures under different input formats is compared experimentally to find a structure suited to causal input. On this basis, a causal speech enhancement model that fully exploits information from previous frames is built by combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. Experimental results show that the proposed model improves the real-time capability of DNN-based speech enhancement while preserving enhancement performance, achieving PESQ and STOI scores of 2.25 and 0.76, respectively.
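
The delay issue described in the abstract comes from how the input features are framed: with a non-causal (symmetric) context window the network cannot produce its output for frame t until several future frames have arrived, which is the fixed delay mentioned above, while a causal window uses only the current and previous frames. The sketch below is a hedged NumPy illustration of the two input formats; the window sizes and the 257-bin feature dimension are assumptions for illustration, not taken from the paper.

import numpy as np

def frame_context(feats, past, future):
    # Stack each frame with `past` previous and `future` following frames.
    # feats: (time, n_bins) feature matrix, e.g. a log-power spectrum.
    # Returns: (time, (past + 1 + future) * n_bins).
    padded = np.pad(feats, ((past, future), (0, 0)), mode="edge")
    return np.stack(
        [padded[t:t + past + 1 + future].reshape(-1) for t in range(len(feats))]
    )

feats = np.random.randn(100, 257)                    # 100 frames, 257 bins (assumed)
non_causal = frame_context(feats, past=3, future=3)  # needs 3 future frames -> fixed delay
causal     = frame_context(feats, past=6, future=0)  # zero look-ahead -> real-time friendly
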
Authors: YUAN Wenhao (袁文浩); LIANG Chunyan (梁春燕); XIA Bin (夏斌) (School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China)
Source: Computer Engineering (《计算机工程》), 2019, No. 8, pp. 255-259 (5 pages); indexed in CAS, CSCD, and the Peking University Core Journal List
Funding: National Natural Science Foundation of China (61701286, 11704229); Natural Science Foundation of Shandong Province (ZR2015FL003, ZR2017MF047, ZR2017LA011)
Keywords: speech enhancement; causal input; delay; Deep Neural Network (DNN); Convolutional Neural Network (CNN)
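
To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of a causal CNN+LSTM enhancement model: a time-axis convolution padded only on the past side, followed by a unidirectional LSTM, so that the output for each frame depends only on the current and previous frames. All layer sizes, the 257-bin log-power-spectrum features, and the spectral-mapping output are illustrative assumptions, not the exact configuration used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalCNNLSTMEnhancer(nn.Module):
    # Illustrative sizes; not the paper's exact configuration.
    def __init__(self, n_bins=257, cnn_channels=64, lstm_hidden=256, kernel=3):
        super().__init__()
        self.kernel = kernel
        # 1-D convolution along the time axis; padding is applied on the past
        # side only in forward(), which keeps the convolution causal.
        self.conv = nn.Conv1d(n_bins, cnn_channels, kernel_size=kernel)
        # Unidirectional LSTM accumulates information from all previous frames.
        self.lstm = nn.LSTM(cnn_channels, lstm_hidden, batch_first=True)
        self.out = nn.Linear(lstm_hidden, n_bins)

    def forward(self, noisy_feats):
        # noisy_feats: (batch, time, n_bins) noisy log-power spectrum
        x = noisy_feats.transpose(1, 2)                 # (batch, n_bins, time)
        x = F.pad(x, (self.kernel - 1, 0))              # pad past frames only (causal)
        x = torch.relu(self.conv(x)).transpose(1, 2)    # (batch, time, channels)
        h, _ = self.lstm(x)                             # causal recurrence over time
        return self.out(h)                              # enhanced log-power spectrum

# Usage: 100 frames of a 257-bin spectrogram, processed with zero look-ahead.
model = CausalCNNLSTMEnhancer()
enhanced = model(torch.randn(1, 100, 257))              # -> shape (1, 100, 257)
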
