
Research Review of Single-channel Speech Separation Technology Based on TasNet (cited: 1)
Abstract: Speech separation is a fundamental task in acoustic signal processing with a wide range of applications. Thanks to the development of deep learning, the performance of single-channel speech separation systems has improved significantly in recent years. In particular, with the introduction of a new speech separation method called the time-domain audio separation network (TasNet), research on speech separation has gradually shifted from traditional time-frequency-domain methods to time-domain methods. This paper reviews the research status and prospects of single-channel speech separation technology based on TasNet. After reviewing the traditional time-frequency-domain methods of speech separation, the paper focuses on the TasNet-based Conv-TasNet and DPRNN models and compares the studies that improve on each. Finally, the paper discusses the limitations of current TasNet-based single-channel speech separation models, and outlines future research directions in terms of models, datasets, the number of speakers, and how to handle speech separation in complex scenarios.
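The shift the abstract describes, from STFT-based time-frequency processing to a learned time-domain front end, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the separator here is a random stand-in for the Conv-TasNet/DPRNN networks, and all dimensions (`win=8`, `n_filters=16`) are arbitrary toy values.

```python
import numpy as np

def tasnet_sketch(mixture, n_filters=16, win=8, n_speakers=2, seed=0):
    """Toy sketch of the TasNet encoder-separator-decoder pipeline.

    The encoder replaces the STFT with a learned filterbank: the mixture
    is split into overlapping frames and projected onto basis filters.
    A separator (a random stand-in here for the real Conv-TasNet/DPRNN
    network) estimates one mask per speaker; the decoder reconstructs
    each source by overlap-add with a second basis.
    """
    rng = np.random.default_rng(seed)
    hop = win // 2
    # Frame the time-domain signal with 50% overlap; no STFT involved.
    n_frames = (len(mixture) - win) // hop + 1
    frames = np.stack([mixture[i*hop : i*hop + win] for i in range(n_frames)])
    # Encoder: project frames onto basis filters, keep a non-negative
    # representation (a ReLU-style rectification, as in the original TasNet).
    encoder = rng.standard_normal((win, n_filters))
    rep = np.maximum(frames @ encoder, 0.0)            # (n_frames, n_filters)
    # Separator stand-in: per-speaker masks that sum to 1 (softmax over speakers).
    logits = rng.standard_normal((n_speakers, *rep.shape))
    masks = np.exp(logits) / np.exp(logits).sum(axis=0)
    # Decoder: masked representation back to waveform via overlap-add.
    decoder = rng.standard_normal((n_filters, win))
    sources = np.zeros((n_speakers, len(mixture)))
    for s in range(n_speakers):
        rec = (masks[s] * rep) @ decoder               # (n_frames, win)
        for i in range(n_frames):
            sources[s, i*hop : i*hop + win] += rec[i]
    return sources

mix = np.sin(np.linspace(0, 20, 64))
est = tasnet_sketch(mix)
print(est.shape)  # prints (2, 64): two estimated sources, same length as mixture
```

In the actual models, the encoder, separator, and decoder are trained jointly end to end (typically with a scale-invariant SNR loss), which is what lets the learned filterbank outperform a fixed STFT representation.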
Authors: 陆炜 (LU Wei) and 朱定局 (ZHU Ding-ju), School of Computer Science, South China Normal University, Guangzhou 510631, China
Source: Computer and Modernization (《计算机与现代化》), 2022, No. 11, pp. 119-126
Funding: Supported by the Key Program of the National Natural Science Foundation of China (U18112000)
Keywords: speech separation; TasNet; Conv-TasNet; DPRNN

