Abstract
Traditional deep neural network (DNN) speech enhancement algorithms do not distinguish the different denoising priorities required under different signal-to-noise ratio (SNR) conditions, and predicting phase information is also important for speech enhancement. To address both issues, a two-stage speech enhancement algorithm based on time-frequency mask optimization is proposed. In the first stage, the magnitude spectrum features of the noisy speech are fed into a DNN for training, which predicts the clean speech magnitude spectrum and the noise magnitude spectrum. In the second stage, a gain coefficient is estimated from SNR information to control the trade-off between residual noise and speech distortion; at the same time, the phase deviation between the noisy speech and the clean speech is computed to help predict the speech spectrum. The gain coefficient and the phase deviation are incorporated into the time-frequency mask function to optimize the network training model, so that the clean speech magnitude spectrum is predicted more accurately. Experimental results show that, compared with the algorithm before optimization, speech enhanced by the proposed method improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.22 on average and the Short-Time Objective Intelligibility (STOI) score by 0.027 on average, removing more noise while reducing speech distortion.
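The second-stage mask optimization described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' formula: the abstract does not give the mask equation, so the gain coefficient here is a swapped-in parametric-Wiener-style form snr / (snr + mu) driven by a frame SNR computed from the stage-1 clean and noise magnitude estimates, and the phase deviation enters as the cosine factor used in the phase-sensitive mask. The function names `frame_snr`, `gain_coefficient`, `optimized_mask` and `apply_mask`, and the parameter `mu`, are hypothetical.

```python
import cmath
import math


# Hypothetical sketch of the second-stage mask optimization.
# The gain form and the way phase deviation enters are ASSUMPTIONS,
# not the equations of the original paper.

def frame_snr(clean_mag, noise_mag, eps=1e-12):
    """Linear frame SNR from the stage-1 clean and noise magnitude estimates."""
    speech_power = sum(m * m for m in clean_mag)
    noise_power = sum(m * m for m in noise_mag)
    return speech_power / (noise_power + eps)


def gain_coefficient(snr, mu=1.0):
    """Assumed parametric-Wiener-style gain snr / (snr + mu).

    mu > 1 suppresses more residual noise but distorts speech more;
    mu < 1 preserves speech at the cost of more residual noise.
    """
    return snr / (snr + mu)


def optimized_mask(noisy, clean, gain, eps=1e-12):
    """Per-bin target: gain * |S|/|Y| * cos(phase(Y) - phase(S)), clipped to [0, 1].

    noisy, clean: complex STFT bins Y and S of one frame.
    The cosine of the phase deviation is the phase-sensitive-mask factor.
    """
    mask = []
    for y, s in zip(noisy, clean):
        phase_dev = cmath.phase(y) - cmath.phase(s)
        m = gain * abs(s) / (abs(y) + eps) * math.cos(phase_dev)
        mask.append(min(max(m, 0.0), 1.0))
    return mask


def apply_mask(noisy, mask):
    """Enhanced bins: scale each noisy bin by its mask, keeping the noisy phase."""
    return [m * y for m, y in zip(mask, noisy)]


if __name__ == "__main__":
    snr = frame_snr([5.0, 1.0], [1.0, 1.0])  # (25 + 1) / (1 + 1) = 13
    g = gain_coefficient(snr, mu=1.0)        # about 13 / 14
    mask = optimized_mask([3 + 4j, 1 + 0j], [3 + 4j, 1j], g)
    print(round(g, 4), [round(m, 4) for m in mask])
```

Raising `mu` lowers the gain, trading more speech distortion for less residual noise, which is the balance the gain coefficient is meant to control. Because the clean spectrum is only known during training, a mask of this kind would serve as the DNN training target; at inference the predicted mask would be applied to the noisy bins as in `apply_mask`.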
Authors
ZHENG Li (郑莉), LI Hongyan (李鸿燕)
College of Information and Computer, Taiyuan University of Technology, Yuci 030600, China
Source
Electronic Design Engineering (《电子设计工程》), 2022, No. 4, pp. 17-21 (5 pages)
Funding
Natural Science Foundation of Shanxi Province (201701D121058)
Shanxi Province Research Funding Program for Returned Overseas Scholars (2020-042)