Abstract
Neural networks are used to improve the performance and generalization ability of speech enhancement models. A short-time Fourier transform is applied to the speech signal and log-power spectral features are extracted. A convolutional recurrent network (CRN) is fitted to these features, with the ideal ratio mask (IRM) as the regression target. As a model, the CRN is compared with a fully connected neural network and RNNoise; as a training target, the IRM is compared with direct mapping (DM). On noise types not seen during training, the perceptual evaluation of speech quality (PESQ) score improves by 0.55 points on average across the tested signal-to-noise ratios (SNRs).
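The feature and target computation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the 16 kHz sampling rate, 512-sample Hann window, 256-sample hop, the square-root form of the IRM, and all function names (`stft`, `log_power_spectrum`, `ideal_ratio_mask`, `mix_at_snr`) are assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def log_power_spectrum(spec, eps=1e-10):
    """Log-power spectral features; eps avoids log(0) in silent bins."""
    return np.log(np.abs(spec) ** 2 + eps)

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-10):
    """Assumed IRM form: sqrt(|S|^2 / (|S|^2 + |N|^2)), one value per time-frequency bin."""
    s_pow = np.abs(clean_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return np.sqrt(s_pow / (s_pow + n_pow + eps))

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise power ratio equals snr_db; return (noisy, scaled_noise)."""
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return clean + scaled, scaled

# Toy example: a 440 Hz tone stands in for speech, white noise for the interference.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0          # one second at an assumed 16 kHz
clean = np.sin(2 * np.pi * 440.0 * t)
noise = rng.standard_normal(16000)
noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=0.0)

features = log_power_spectrum(stft(noisy))               # network input
mask = ideal_ratio_mask(stft(clean), stft(scaled_noise)) # regression target
```

In a mask-based setup of this kind, the network would take `features` as input and regress `mask`; the enhanced magnitude is then typically obtained by multiplying the predicted mask with the noisy magnitude spectrum. With direct mapping, the target would instead be the clean log-power spectrum itself.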
Author
王金超
WANG Jinchao (College of Communication & Information Engineering, Shanghai University, Shanghai 200444, China)
Source
Microcomputer Applications (《微型电脑应用》)
2021, No. 3, pp. 108-110 (3 pages)
Keywords
neural network
speech enhancement
convolutional recurrent network
ideal ratio mask