
CNN-SVM Gender Combination Classification Based Single-channel Speech Separation
(基于CNN-SVM性别组合分类的单通道语音分离)
Abstract: In practical speech separation, information about the speaker gender combination of the mixed speech is often unknown. If separation is performed directly with a universal model, the separation performance is unsatisfactory. To separate speech more effectively, this paper proposes a gender combination discrimination model based on a convolutional neural network-support vector machine (CNN-SVM), which determines whether the two speakers in the mixture form a male-male, male-female, or female-female combination, so that the separation model trained for the corresponding gender combination can be selected. To compensate for the insufficiency of a traditional single feature in representing gender combination information, a strategy for mining deep fusion features is proposed, so that the classification features carry more information about the gender combination categories. The proposed single-channel speech separation method based on CNN-SVM gender combination classification first uses a CNN to mine deep features from the Mel-frequency cepstral coefficients (MFCC) and filter bank features and fuses these two deep features into the gender combination classification feature; an SVM then recognizes the gender combination of the mixed speech; finally, the deep neural network (DNN) or CNN separation model corresponding to the recognized gender combination is selected to perform the separation. Experimental results show that, compared with traditional single features, the proposed deep fusion feature effectively improves the recognition rate of the gender combination of mixed speech, and the proposed separation method outperforms the universal separation model in perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR).
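As a concrete illustration of the pipeline the abstract describes, the sketch below extracts MFCC and filter-bank features, mines a deep embedding from each with a small CNN, fuses the two embeddings by concatenation, and classifies the gender combination with an SVM before dispatching the mixture to the matching separation model. It is a minimal sketch under assumed settings, not the authors' implementation: the feature dimensions, network layout, embedding size, and the mm_model/mf_model/ff_model separation back-ends are hypothetical.

```python
# Minimal sketch of the classification front-end described in the abstract.
# Feature dimensions, network layout, the 64-dimensional embeddings,
# concatenation-based fusion, and the three separation back-ends
# (mm_model, mf_model, ff_model) are illustrative assumptions.
import librosa
import torch
import torch.nn as nn
from sklearn.svm import SVC

N_MFCC, N_MELS = 13, 40                      # assumed feature dimensions


def frame_features(y, sr):
    """MFCC and log Mel filter-bank (FBank) features of one mixed utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)             # (13, T)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS))     # (40, T)
    return mfcc, fbank


class DeepFeatureCNN(nn.Module):
    """Small CNN used only as a feature extractor (one per input feature type)."""

    def __init__(self, emb_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)))
        self.fc = nn.Linear(16 * 4 * 4, emb_dim)

    def forward(self, x):                          # x: (batch, 1, n_bins, T)
        return self.fc(self.conv(x).flatten(1))    # (batch, emb_dim)


def fused_embedding(mfcc_net, fbank_net, mfcc, fbank):
    """Concatenate the two deep features into one classification vector."""
    with torch.no_grad():
        e1 = mfcc_net(torch.tensor(mfcc, dtype=torch.float32)[None, None])
        e2 = fbank_net(torch.tensor(fbank, dtype=torch.float32)[None, None])
    return torch.cat([e1, e2], dim=1).squeeze(0).numpy()


# Training the SVM on fused embeddings:
#   X_train: (N, 128) fused embeddings, y_train: labels
#   0 = male-male, 1 = male-female, 2 = female-female
# svm = SVC(kernel="rbf")
# svm.fit(X_train, y_train)
#
# At separation time, classify the mixture and hand it to the matching
# gender-combination separation model (hypothetical back-end objects):
# mfcc, fbank = frame_features(y, sr)
# label = svm.predict(fused_embedding(mfcc_net, fbank_net, mfcc, fbank)[None])[0]
# separated = {0: mm_model, 1: mf_model, 2: ff_model}[label].separate(y)
```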
Authors: SUN Linhui (孙林慧), ZHANG Meng (张蒙), LIANG Wenqing (梁文清), College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Source: Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journals, 2022, No. 12, pp. 2519-2531 (13 pages)
Funding: National Natural Science Foundation of China (61901227); China Scholarship Council (202008320043)
Keywords: gender combination recognition; convolutional neural network-support vector machine; single-channel speech separation; deep feature