Abstract
Traditional speech enhancement methods based on Deep Neural Networks (DNN) take non-causal input, which introduces a fixed processing delay and makes them unsuitable for applications with strict real-time requirements. To address this problem, this paper approaches it from the perspective of network structure: the speech enhancement performance of different network structures under different input forms is compared experimentally to identify a structure suited to causal input. On this basis, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are combined to build a causal speech enhancement model that makes full use of information from previous frames. Experimental results show that the model improves the real-time performance of DNN-based speech enhancement while maintaining enhancement performance, achieving PESQ and STOI scores of 2.25 and 0.76, respectively.
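The difference between causal and non-causal input can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stack_context` and `lookahead_delay_ms`, the edge-padding rule, the context sizes, and the 16 ms hop are all assumptions chosen for illustration. The sketch only shows that a context window containing future frames forces the system to wait for them, while a window built from previous frames alone adds no look-ahead delay.

```python
def stack_context(frames, past, future=0):
    """Stack each frame with `past` previous and `future` following frames.

    Edges are padded by repeating the first/last frame (an assumption made
    for this sketch). future=0 yields a causal input; future>0 yields a
    non-causal input.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-past, future + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def lookahead_delay_ms(future, hop_ms):
    """Delay caused purely by waiting for `future` look-ahead frames."""
    return future * hop_ms


# Toy "spectrogram": 5 frames with 2 feature bins each (hypothetical data).
frames = [[float(t), float(t) + 0.5] for t in range(5)]

# Causal input: current frame plus the 2 previous frames.
causal = stack_context(frames, past=2, future=0)

# Non-causal input: 2 previous frames, current frame, and 2 future frames.
non_causal = stack_context(frames, past=2, future=2)

# Hypothetical 16 ms frame hop, used only to express the delay in ms.
causal_delay = lookahead_delay_ms(0, 16)
non_causal_delay = lookahead_delay_ms(2, 16)
```

In this toy setup the causal window needs no future frames, whereas the non-causal window must wait two hops before frame `t` can be processed. That waiting time is the fixed delay the abstract refers to.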
Authors
YUAN Wenhao (袁文浩)
LIANG Chunyan (梁春燕)
XIA Bin (夏斌)
School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2019, Issue 8, pp. 255-259 (5 pages)
Funding
National Natural Science Foundation of China (61701286, 11704229)
Natural Science Foundation of Shandong Province (ZR2015FL003, ZR2017MF047, ZR2017LA011)
Keywords
speech enhancement
causal input
delay
Deep Neural Network (DNN)
Convolutional Neural Network (CNN)