
Monaural noisy speech separation combining sparse non-negative matrix factorization and deep attractor network (Cited by: 4)
Abstract  The performance of monaural speech separation methods is limited when the speech mixture is corrupted by background noise. To obtain enhanced separated speech from a noisy mixture, a monaural noisy speech separation algorithm combining Sparse Non-negative Matrix Factorization (SNMF) and a Deep Attractor Network (DANet) is proposed. First, dictionary matrices for speech and for noise are learned by training and used as prior information to decompose the noisy mixture into speech and noise coefficient matrices. Then, exploiting the fact that different source components in the speech coefficient matrix have different similarities in an embedding space, the speech coefficients are projected into a high-dimensional embedding space, where a DANet is trained to force the embeddings of different sources into distinct clusters; the resulting attractor points separate the speech coefficients into per-source coefficient matrices by masking. Finally, the clean separated speech signals are reconstructed from the speech dictionary and the corresponding separated coefficient matrices. Experimental results under various background-noise conditions show that the proposed algorithm suppresses background noise while improving the overall quality of the separated speech, outperforming baseline methods that combine noise/speech separation models.
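The SNMF front end described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionary sizes, the Euclidean multiplicative-update rule with an L1 penalty, and all variable names here are assumptions, since the abstract does not specify these details.

```python
import numpy as np

def snmf_coefficients(V, W, n_iter=500, sparsity=0.01, eps=1e-9):
    """Estimate sparse nonnegative activations H with V ~= W @ H, holding
    the pre-trained concatenated speech/noise dictionary W fixed.
    Uses multiplicative updates for Euclidean SNMF with an L1 penalty on H."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # random nonnegative init
    for _ in range(n_iter):
        # update rule for min ||V - WH||^2 + sparsity * ||H||_1, H >= 0
        H *= (W.T @ V) / (W.T @ (W @ H) + sparsity + eps)
    return H

# Toy example with hypothetical sizes: 4 "speech" and 4 "noise" dictionary atoms
# over 64 frequency bins; V stands in for a noisy-mixture magnitude spectrogram.
rng = np.random.default_rng(1)
W_speech = np.abs(rng.random((64, 4)))
W_noise  = np.abs(rng.random((64, 4)))
W = np.hstack([W_speech, W_noise])        # fixed prior dictionary

H_true = np.abs(rng.random((8, 20)))
V = W @ H_true                            # synthetic mixture spectrogram

H = snmf_coefficients(V, W)
H_sp, H_nz = H[:4], H[4:]                 # split activations by dictionary block
V_speech = W_speech @ H_sp                # reconstructed speech magnitude
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative residual
```

In the paper's pipeline, the speech activations `H_sp` would then be passed to the DANet stage for further separation into individual speakers; here the block-splitting of `H` simply illustrates how fixed speech and noise dictionaries yield separate coefficient matrices from one factorization.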
Authors  GE Wanying; ZHANG Tianqi; FAN Congcong; ZHANG Tian (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source  Acta Acustica (《声学学报》), 2021, No. 1, pp. 55-66 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding  Supported by the National Natural Science Foundation of China (61671095, 61371164, 61702065, 61701067, 61771085), the Chongqing Municipal Key Laboratory of Signal and Information Processing Construction Project (CSTC2009CA2003), the Chongqing Graduate Student Research and Innovation Project (CYS17219), and the Scientific Research Projects of the Chongqing Education Commission (KJ130524, KJ1600427, KJ1600429).
