Journal article

Deep learning speech enhancement algorithm based on perceptual related cost function (cited by 4)
Abstract: The mean-squared error (MSE) cost function used by most current deep neural network (DNN)-based speech enhancement algorithms does not fully exploit the perceptual properties of human hearing, and it does not necessarily correlate with speech intelligibility. To address this, an end-to-end DNN-based speech enhancement framework was proposed, in which a perceptually related cost function based on the frequency-weighted segmental SNR was employed as the optimization objective for training the DNN. On this basis, a joint optimization cost function combining the frequency-weighted segmental SNR and the perceptually weighted mean-squared error was proposed for training the DNN, so as to improve the auditory perception of noisy speech. Experimental results show that when a DNN is used to denoise noisy speech, integrating the perceptually related cost function into data-driven model learning significantly improves both speech quality and speech intelligibility.
Authors: FANG Hui-bao; MA Jian-fen; TIAN Yu-ling; ZHANG Chao-xia (College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China; College of Physics and Optoelectronics, Taiyuan University of Technology, Jinzhong 030600, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2020, No. 11, pp. 3212-3217 (6 pages)
Funding: National Natural Science Foundation of China General Program (61472271); Shanxi Provincial Natural Science Foundation General Program (201701D121009); Shanxi Provincial Key R&D Program, High-Tech Field (201803D121057)
Keywords: deep neural networks; speech enhancement; cost function; frequency-weighted segmental SNR; perceptual weighted mean-squared error
