
Time-frequency mask estimation-based speech enhancement using deep encoder-decoder neural network

Abstract: A time-frequency mask estimation method for speech enhancement using a deep encoder-decoder neural network is presented. The mask is learned implicitly by the deep encoder-decoder neural network and combined with the time-frequency representation of the noisy speech to learn the nonlinear mapping function between the noisy and target speech. The deep encoder-decoder neural network employs a convolution and deconvolution structure. The convolutional encoder uses the local perception characteristic of convolutional networks to model the typical structural features of noisy speech in the time-frequency domain, so that speech features are extracted and the influence of background noise is suppressed. At the decoder end, the speech signal is reconstructed from the speech features extracted at the encoder end, and the local details are recovered layer by layer. Meanwhile, skip connections are introduced between corresponding encoder and decoder layers to avoid the loss of low-level detail caused by pooling and down-sampling operations. Experiments are conducted on the TIMIT dataset, and the results demonstrate that the proposed method can effectively suppress noise and recover the detailed information of speech.
Source: Chinese Journal of Acoustics (声学学报(英文版)), CSCD, 2021, No. 1, pp. 141-154 (14 pages)
Funding: Supported by the National Natural Science Foundation of China (61471394, 62071484) and the Natural Science Foundation of Jiangsu Province for Excellent Young Scholars (BK20180080).
Keywords: DECODER network NEURAL
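
The data flow described in the abstract (encode with down-sampling, decode back to full resolution, fuse through a skip connection, then apply the estimated mask to the noisy time-frequency representation) can be illustrated with a toy sketch. This is not the authors' network: the learned convolution and deconvolution layers are replaced here by fixed averaging and repetition, and every name and constant (`downsample`, `upsample`, `estimate_mask`, `gain`, `bias`, the example spectrogram) is a hypothetical stand-in chosen for illustration. It assumes non-negative magnitude values and an even number of frequency bins per frame.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1), the range of a ratio-style mask."""
    return 1.0 / (1.0 + math.exp(-x))


def downsample(frame):
    """Encoder stand-in: average adjacent frequency bins (halves resolution)."""
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame), 2)]


def upsample(coarse):
    """Decoder stand-in: repeat each coarse value to restore the resolution."""
    return [v for v in coarse for _ in range(2)]


def estimate_mask(frame, gain=4.0, bias=-4.0):
    """Return one mask value per frequency bin for a single frame.

    gain and bias are hand-picked assumptions, not trained parameters.
    """
    coarse = downsample(frame)      # fine detail is lost here
    restored = upsample(coarse)     # resolution restored, detail still missing
    # Skip connection: add the full-resolution input back in so the
    # per-bin detail removed by down-sampling is available again.
    fused = [r + s for r, s in zip(restored, frame)]
    return [sigmoid(gain * f + bias) for f in fused]


def enhance(noisy_spec):
    """Multiply each noisy magnitude frame by its estimated mask."""
    enhanced = []
    for frame in noisy_spec:
        mask = estimate_mask(frame)
        enhanced.append([m * x for m, x in zip(mask, frame)])
    return enhanced


# Hypothetical 2-frame, 4-bin magnitude spectrogram.
noisy = [[0.1, 0.9, 1.0, 0.2], [0.05, 0.1, 0.8, 0.7]]
enhanced = enhance(noisy)
```

Because each mask value lies strictly between 0 and 1, every enhanced bin is an attenuated copy of the corresponding noisy bin; in this toy setup, bins with more energy receive a mask closer to 1. In a trained model the mask would instead be produced by learned layers and optimized against the target speech.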