Abstract
Most current speech enhancement algorithms based on deep neural networks (DNNs) are trained with a mean squared error (MSE) cost function. This cost function does not make full use of the properties of human auditory perception, and it does not necessarily correlate with speech intelligibility. To address this, an end-to-end DNN-based speech enhancement framework was proposed, in which the DNN is trained with a perceptually related cost function based on the frequency-weighted segmental SNR (fwSegSNR). Building on this, a joint optimization cost function that combines the fwSegSNR with a perceptually weighted mean squared error was proposed for training the DNN, with the aim of improving how noisy speech is perceived by the human ear. Experimental results show that, when a DNN is used to enhance noisy speech, integrating the perceptually related cost function into data-driven model learning significantly improves both speech quality and speech intelligibility.
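The two cost terms named in the abstract can be illustrated with a small sketch. It assumes the widely used fwSegSNR definition (band weights W = |X|^0.2 on the clean band magnitudes, per-band SNR clipped to [-10, 35] dB, averaged over frames) and operates on precomputed band magnitude spectra given as nested lists. The abstract does not specify the paper's perceptual weighting or how the two terms are combined, so `band_weights`, `alpha` and the form of `joint_cost` are hypothetical stand-ins, not the authors' method. The sketch only evaluates the cost values; a real training loss would be written with differentiable tensor operations.

```python
import math

# Assumed constants from the common fwSegSNR definition (not stated in the abstract).
GAMMA = 0.2                      # band weight exponent: W = |X| ** GAMMA
SNR_MIN, SNR_MAX = -10.0, 35.0   # per-band SNR clipping range in dB
EPS = 1e-10                      # avoids division by zero and log of zero


def fw_seg_snr(clean, enhanced, gamma=GAMMA):
    """Frequency-weighted segmental SNR in dB (higher is better).

    clean, enhanced: lists of frames; each frame is a list of band magnitudes.
    """
    frame_scores = []
    for x_frame, y_frame in zip(clean, enhanced):
        num, den = 0.0, 0.0
        for x, y in zip(x_frame, y_frame):
            w = max(x, 0.0) ** gamma  # weight taken from the clean band magnitude
            snr = 10.0 * math.log10((x * x + EPS) / ((x - y) ** 2 + EPS))
            snr = min(max(snr, SNR_MIN), SNR_MAX)  # clip each band SNR
            num += w * snr
            den += w
        frame_scores.append(num / (den + EPS))  # weighted average over bands
    return sum(frame_scores) / len(frame_scores)  # average over frames


def pw_mse(clean, enhanced, band_weights):
    """Hypothetical perceptually weighted MSE with caller-supplied per-band weights."""
    total, count = 0.0, 0
    for x_frame, y_frame in zip(clean, enhanced):
        for w, x, y in zip(band_weights, x_frame, y_frame):
            total += w * (x - y) ** 2
            count += 1
    return total / count


def joint_cost(clean, enhanced, band_weights, alpha=0.5):
    """Hypothetical joint cost to minimise: alpha * PW-MSE - (1 - alpha) * fwSegSNR.

    fwSegSNR is negated because training minimises the cost while fwSegSNR should rise.
    """
    return (alpha * pw_mse(clean, enhanced, band_weights)
            - (1.0 - alpha) * fw_seg_snr(clean, enhanced))


if __name__ == "__main__":
    clean = [[1.0, 2.0], [0.5, 1.5]]
    rough = [[0.8, 2.5], [0.1, 1.0]]   # a poor estimate of the clean bands
    close = [[0.95, 2.1], [0.45, 1.4]] # a better estimate
    weights = [1.0, 0.5]               # hypothetical per-band weights
    print(joint_cost(clean, rough, weights), joint_cost(clean, close, weights))
```

Running the script prints a lower joint cost for the closer estimate. The `alpha` parameter is needed because the MSE term is in squared-magnitude units while fwSegSNR is in dB, so the two terms have different scales.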
Authors
FANG Hui-bao (房慧保)
MA Jian-fen (马建芬)
TIAN Yu-ling (田玉玲)
ZHANG Chao-xia (张朝霞)
College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China; College of Physics and Optoelectronics, Taiyuan University of Technology, Jinzhong 030600, China
Source
Computer Engineering and Design (《计算机工程与设计》), Peking University core journal
2020, No. 11, pp. 3212-3217 (6 pages)
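Funding
National Natural Science Foundation of China, General Program (61472271)
Natural Science Foundation of Shanxi Province, General Program (201701D121009)
Key Research and Development Program of Shanxi Province (High-Tech Field) (201803D121057)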
Keywords
deep neural networks
speech enhancement
cost function
frequency-weighted segmental SNR
perceptual weighted mean-squared error