Research Review of Single-channel Speech Separation Technology Based on TasNet

Cited by: 1

Abstract: Speech separation is a fundamental task in acoustic signal processing with a wide range of applications. Thanks to advances in deep learning, the performance of single-channel speech separation systems has improved significantly in recent years. In particular, since the introduction of a new speech separation method called the time-domain audio separation network (TasNet), speech separation research has gradually shifted from traditional time-frequency-domain methods to time-domain methods. This paper reviews the current state and future prospects of single-channel speech separation based on TasNet. After reviewing traditional time-frequency-domain speech separation methods, it focuses on two TasNet-based models, Conv-TasNet (fully convolutional TasNet) and DPRNN (dual-path recurrent neural network), and compares the research on improving each of them. Finally, it describes the limitations of current TasNet-based single-channel speech separation models and discusses future research directions in terms of models, datasets, the number of speakers, and speech separation in complex scenarios.
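The central shift the abstract describes is from masking an STFT to separating directly on the waveform. As an illustration only, the following is a minimal pure-Python sketch of the general TasNet layout: a learned 1-D convolutional encoder replaces the STFT, a separator estimates one mask per source over the encoder output, and a linear decoder maps each masked representation back to a waveform by overlap-add. It assumes random (untrained) encoder and decoder bases and a hypothetical `toy_masks` separator that draws softmax masks from random logits instead of running a trained network; the function names, sizes, and the softmax choice are all assumptions, not the implementation of any model reviewed in the paper.

```python
import math
import random


def encode(x, enc_basis, hop):
    """Strided 1-D convolution + ReLU: a stand-in for TasNet's learned encoder.

    Each length-W frame (hop samples apart) is projected onto N basis filters,
    giving a non-negative N-dim vector per frame in place of an STFT column.
    """
    win = len(enc_basis[0])
    # Number of frames needed to cover x; the tail is zero-padded.
    n_frames = max(1, math.ceil((len(x) - win) / hop) + 1)
    padded = list(x) + [0.0] * ((n_frames - 1) * hop + win - len(x))
    frames = []
    for t in range(n_frames):
        seg = padded[t * hop: t * hop + win]
        frames.append([max(0.0, sum(b * s for b, s in zip(f, seg)))
                       for f in enc_basis])
    return frames


def decode(w, dec_basis, hop):
    """Transposed-convolution stand-in: overlap-add of weighted basis signals."""
    win = len(dec_basis[0])
    out = [0.0] * ((len(w) - 1) * hop + win)
    for t, coeffs in enumerate(w):
        for c, basis in zip(coeffs, dec_basis):
            for k, b in enumerate(basis):
                out[t * hop + k] += c * b
    return out


def toy_masks(w, n_src, rng):
    """Hypothetical separator: softmax over random logits, so masks sum to 1.

    A real separator (an LSTM in TasNet, a temporal convolutional network in
    Conv-TasNet, dual-path RNN blocks in DPRNN) would predict masks from w.
    """
    masks = [[[0.0] * len(w[0]) for _ in w] for _ in range(n_src)]
    for t in range(len(w)):
        for n in range(len(w[0])):
            logits = [rng.gauss(0.0, 1.0) for _ in range(n_src)]
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logits]
            z = sum(exps)
            for c in range(n_src):
                masks[c][t][n] = exps[c] / z
    return masks


def separate(x, enc_basis, dec_basis, hop, n_src, seed=0):
    """Encode the mixture, apply one mask per source, decode each estimate."""
    rng = random.Random(seed)
    w = encode(x, enc_basis, hop)
    estimates = []
    for mask in toy_masks(w, n_src, rng):
        masked = [[m * c for m, c in zip(m_row, w_row)]
                  for m_row, w_row in zip(mask, w)]
        estimates.append(decode(masked, dec_basis, hop))
    return estimates


if __name__ == "__main__":
    rng = random.Random(42)
    N, W, HOP = 8, 16, 8  # hypothetical sizes: basis count, window, hop
    enc = [[rng.gauss(0.0, 0.1) for _ in range(W)] for _ in range(N)]
    dec = [[rng.gauss(0.0, 0.1) for _ in range(W)] for _ in range(N)]
    mixture = [math.sin(0.05 * i) + 0.5 * math.sin(0.31 * i) for i in range(100)]
    ests = separate(mixture, enc, dec, HOP, n_src=2)
    print(len(ests), len(ests[0]))  # prints: 2 104
```

Because the decoder is linear and these softmax masks sum to one, the estimates always add back up to `decode(encode(x))`. That is a convenient sanity check for this sketch rather than a property every TasNet variant enforces.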

Authors: LU Wei (陆炜); ZHU Ding-ju (朱定局) (School of Computer Science, South China Normal University, Guangzhou 510631, China)

Affiliation: School of Computer Science, South China Normal University

Source: Computer and Modernization (《计算机与现代化》), 2022, No. 11, pp. 119-126 (8 pages)

Funding: Supported by the Key Program of the National Natural Science Foundation of China (U18112000).

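Keywords: speech separation; time-domain audio separation network (TasNet); fully convolutional time-domain audio separation network (Conv-TasNet); dual-path recurrent neural network (DPRNN)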

CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

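Citing literature (1)

1. ZENG Yuan, LI Jian, MA Ming-xing, PANG Run-jia, HE Bin. Multi-sound-source separation method based on an improved Transformer model[J]. Computer Technology and Development (计算机技术与发展), 2024, 34(5): 60-65.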

1Xi-Yong Yuan,Shao-Gui Deng,Zhi-Qiang Li,Xiao-Mei Han,Xu-Fei Hu.Deep-detection of formation boundary using transient multicomponent electromagnetic logging measurements[J].Petroleum Science,2022,19(3):1085-1098. 被引量：1
2胡传瞻,蒋林,朱筠,谢晓燕,王萍,杨坤.阵列处理器上一种基于DFGSP的分像素插值算法实现[J].计算机应用与软件,2022,39(10):49-53. 被引量：1
3王立羽,傅云飞.基于GPM与ERA5数据的北太平洋冬季风暴路径降水个例分析[J].暴雨灾害,2022,41(5):525-535. 被引量：2
4Chetna Tyagi,Ambika Devi.Alteration of structural, optical and electrical properties of CdSe incorporated polyvinyl pyrrolidone nanocomposite for memory devices[J].Journal of Advanced Dielectrics,2018,8(3):64-78.
5TING WANG,JI-LIANG WU,XU-CHENG ZHANG,YANG SHI,YUE-DE YANG,JIN-LONG XIAO,DA-MING ZHANG,GUAN-SHI QIN,YONG-ZHEN HUANG.Octave-spanning frequency comb generation based on a dual-mode microcavity laser[J].Photonics Research,2022,10(9):2107-2114. 被引量：1
6Emad Ewais,Ragab Mahani,Samy Mostafa,Adel Ahmedx.Doping effect of SrZrO_(3) on KNLN ceramics structure and their dielectric properties[J].Journal of Advanced Dielectrics,2018,8(5):28-34.
7Ning He,Zhentao Li,Changjun Hu,Zilin Chen.In situ synthesis of a spherical covalent organic framework as a stationary phase for capillary electrochromatography[J].Journal of Pharmaceutical Analysis,2022,12(4):610-616. 被引量：1
8Eunice Obamiro,Radhika Trivedi,Nasim Ahmed.Changes in trends of orthopedic services due to the COVID-19 pandemic: A review[J].World Journal of Orthopedics,2022,13(11):955-968.
