
Clustering-based speech separation method for gated convolutional networks
Abstract: Deep clustering-based speech separation has been shown to resolve the speaker output label permutation problem in mixed speech. However, most existing clustering-based speaker separation methods optimize the embeddings to minimize the reconstruction error of each source. This paper takes the time-domain convolutional network as its base network and designs an improved clustering-based gated convolutional speech separation method, in which stacked gated convolutional networks perform end-to-end deep-clustering source separation in the time domain. The framework applies nonlinear gated activations within the time-domain convolutional network to extract deep features of the speech signal, and it clusters in a high-dimensional feature space to represent and partition those features, which provides long-term speaker representation information for recovering the individual sources. The framework thereby addresses the speaker output label permutation problem and models the long-term dependencies of speech signals. In experiments on the Wall Street Journal dataset, the method achieves a signal-to-distortion ratio (SDR) of 16.72 dB and a scale-invariant signal-to-noise ratio (SI-SNR) of 16.33 dB.
Authors: LUO Yu (罗宇); HU Weiping (胡维平); WU Huanan (吴华楠) (Electronic Engineering, Guangxi Normal University, Guilin 541000, China)
Source: Journal of Applied Acoustics (《应用声学》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 1099-1105 (7 pages)
Funding: National Natural Science Foundation of China (NSFC 61861005).
Keywords: deep clustering; gated convolution; speech separation
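
The abstract reports results in SI-SNR but this record does not give the formula. As a minimal sketch, assuming the commonly used definition (zero-mean signals, projection of the estimate onto the reference), the metric could be computed as follows. The function name `si_snr` and the `eps` stabilizer are illustrative choices, not taken from the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Assumed (commonly used) definition, not the paper's own code.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: the part explained by the target.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]

    # Whatever the projection does not explain is treated as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]

    signal_energy = sum(x * x for x in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that distinguishes SI-SNR from a plain SNR.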
