
Application of improved U-Net network with attention mechanism in end-to-end speech enhancement
Abstract: An improved U-Net model for end-to-end speech enhancement, the Attention Dilated Convolution U-Net (ADC-U-Net), is proposed. Compared with the baseline U-Net, it adds dilated convolution to reduce the information loss caused by sampling, and it introduces an attention mechanism that draws on more contextual information from the noisy speech to extract deeper and richer features. Unlike traditional speech enhancement methods, the proposed model does not require the three separate steps of feature extraction, feature denoising, and speech reconstruction; it avoids dependence on explicit features and instead obtains implicit features through multi-level, multi-scale learning. The quality and intelligibility of the enhanced speech are evaluated with several subjective and objective metrics. The experimental results show that the proposed algorithm performs well in both noise suppression and adaptability to noise, and that it achieves good speech quality and intelligibility compared with the baseline U-Net and other models.
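The two architectural ideas named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give kernel sizes, dilation rates, or the exact form of the attention mechanism. The helper names `dilated_conv1d`, `receptive_field`, and `attention_gate`, the additive sigmoid gate, and all numeric values below are assumptions chosen only for illustration.

```python
import math


def dilated_conv1d(signal, kernel, dilation=1):
    """Same-length 1-D dilated convolution (cross-correlation), zero padded.

    Kernel taps are spaced `dilation` samples apart, so the layer sees a
    wider context without any downsampling of the signal.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    pad_left = span // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad_left + i * dilation
            if 0 <= idx < len(signal):  # samples outside the signal count as 0
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def attention_gate(skip, gate, w_skip=1.0, w_gate=1.0, bias=0.0):
    """Hypothetical additive attention gate on a skip connection.

    Each skip sample is scaled by a sigmoid coefficient in (0, 1) computed
    from the skip sample and the matching decoder (gate) sample.
    """
    out = []
    for s, g in zip(skip, gate):
        alpha = 1.0 / (1.0 + math.exp(-(w_skip * s + w_gate * g + bias)))
        out.append(alpha * s)
    return out


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    # With dilation 2 the three taps land two samples apart.
    print(dilated_conv1d(impulse, [1, 1, 1], dilation=2))
    # Three stride-1 layers with kernel size 3 and dilations 1, 2, 4.
    print(receptive_field(3, [1, 2, 4]))  # 15
    # A large positive gate value passes the skip sample, a negative one suppresses it.
    print(attention_gate([1.0, 1.0], [10.0, -10.0]))
```

In this sketch, three stacked layers cover 15 input samples while the signal length stays unchanged, which is the sense in which dilation widens context without the information loss of sampling. The gate shows how decoder-side context can reweight encoder features before they are merged.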
Authors: WU Ruiqin; CHEN Xueqin; YU Jie; WANG Lirong; ZHAO Heming (School of Electronic and Information Engineering, Soochow University, Suzhou 215006)
Source: Acta Acustica, 2022, No. 2, pp. 266-275 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61340004).
