
CNN-SVM Gender Combination Classification Based Single-channel Speech Separation
(基于CNN-SVM性别组合分类的单通道语音分离)
Abstract: In practical speech separation, information about the speaker gender combination of the mixed speech is often unknown. If separation is performed directly with a universal model, the separation performance is unsatisfactory. To separate speech more effectively, this paper proposes a gender combination discrimination model based on a convolutional neural network-support vector machine (CNN-SVM), which determines whether the two speakers in the mixture form a male-male, male-female, or female-female combination, so that the separation model trained for the corresponding gender combination can be selected. To compensate for the insufficiency of a traditional single feature in representing gender combination information, a strategy for mining deep fusion features is proposed, so that the classification features carry more information about the gender combination categories. The proposed single-channel speech separation method based on CNN-SVM gender combination classification first uses a CNN to mine deep features from the Mel-frequency cepstral coefficients (MFCC) and filter bank features and fuses these two deep features into the gender combination classification feature; an SVM then recognizes the gender combination of the mixed speech; finally, the deep neural network (DNN) or CNN separation model corresponding to the recognized gender combination is selected to perform the separation. Experimental results show that, compared with traditional single features, the proposed deep fusion feature effectively improves the recognition rate of the gender combination of mixed speech, and the proposed separation method outperforms the universal separation model in perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR).
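As a concrete illustration of the pipeline the abstract describes, the sketch below extracts MFCC and filter-bank features, mines a deep embedding from each with a small CNN, fuses the two embeddings by concatenation, and classifies the gender combination with an SVM before dispatching the mixture to the matching separation model. It is a minimal sketch under assumed settings, not the authors' implementation: the feature dimensions, network layout, embedding size, and the mm_model/mf_model/ff_model separation back-ends are hypothetical.

```python
# Minimal sketch of the classification front-end described in the abstract.
# Feature dimensions, network layout, the 64-dimensional embeddings,
# concatenation-based fusion, and the three separation back-ends
# (mm_model, mf_model, ff_model) are illustrative assumptions.
import librosa
import torch
import torch.nn as nn
from sklearn.svm import SVC

N_MFCC, N_MELS = 13, 40                      # assumed feature dimensions


def frame_features(y, sr):
    """MFCC and log Mel filter-bank (FBank) features of one mixed utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)             # (13, T)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS))     # (40, T)
    return mfcc, fbank


class DeepFeatureCNN(nn.Module):
    """Small CNN used only as a feature extractor (one per input feature type)."""

    def __init__(self, emb_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)))
        self.fc = nn.Linear(16 * 4 * 4, emb_dim)

    def forward(self, x):                          # x: (batch, 1, n_bins, T)
        return self.fc(self.conv(x).flatten(1))    # (batch, emb_dim)


def fused_embedding(mfcc_net, fbank_net, mfcc, fbank):
    """Concatenate the two deep features into one classification vector."""
    with torch.no_grad():
        e1 = mfcc_net(torch.tensor(mfcc, dtype=torch.float32)[None, None])
        e2 = fbank_net(torch.tensor(fbank, dtype=torch.float32)[None, None])
    return torch.cat([e1, e2], dim=1).squeeze(0).numpy()


# Training the SVM on fused embeddings:
#   X_train: (N, 128) fused embeddings, y_train: labels
#   0 = male-male, 1 = male-female, 2 = female-female
# svm = SVC(kernel="rbf")
# svm.fit(X_train, y_train)
#
# At separation time, classify the mixture and hand it to the matching
# gender-combination separation model (hypothetical back-end objects):
# mfcc, fbank = frame_features(y, sr)
# label = svm.predict(fused_embedding(mfcc_net, fbank_net, mfcc, fbank)[None])[0]
# separated = {0: mm_model, 1: mf_model, 2: ff_model}[label].separate(y)
```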
Authors: SUN Linhui (孙林慧), ZHANG Meng (张蒙), LIANG Wenqing (梁文清), College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Source: Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journals, 2022, No. 12, pp. 2519-2531 (13 pages)
Funding: National Natural Science Foundation of China (61901227); China Scholarship Council (202008320043)
Keywords: gender combination recognition; convolutional neural network-support vector machine; single-channel speech separation; deep feature