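A clustering-based gated convolutional network method for speech separation

Original title: 一种基于聚类的门控卷积网络语声分离方法 (published English title: Clustering-based speech separation method for gated convolutional networks)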

Abstract: Deep clustering-based speech separation has been shown to solve the speaker output label permutation problem in mixed speech effectively. However, most existing clustering-based speaker separation methods optimize the embeddings so as to minimize the reconstruction error of each source. Taking the time-domain convolutional network as the base network, this paper designs an improved clustering-based gated convolutional speech separation method that performs end-to-end deep-clustering source separation in the time domain through stacked gated convolutional networks. The framework applies nonlinear gated activations within the time-domain convolutional network to extract deep features of the speech signal. It also clusters in a high-dimensional feature space to represent and partition those features, which provides long-term speaker representation information for recovering the individual sources. The framework thereby solves the speaker output label permutation problem and models the long-term dependencies of speech signals. Experiments on the Wall Street Journal dataset show that the method reaches 16.72 dB in signal-to-distortion ratio (SDR) and 16.33 dB in scale-invariant signal-to-noise ratio (SI-SNR).
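The scale-invariant signal-to-noise ratio reported in the abstract can be made concrete with a short sketch. The function below follows the commonly used definition of the metric: both signals are made zero-mean, the estimate is projected onto the target, and the energy of that projection is compared with the energy of the residual. The name `si_snr`, the `eps` stabilizer, and the toy signals are illustrative assumptions; the record does not describe the authors' own evaluation code.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative helper (hypothetical name), using the common definition:
    project the zero-mean estimate onto the zero-mean target and compare
    the projected energy with the residual energy.
    """
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("estimate and target must be non-empty and equal in length")

    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]  # zero-mean estimate
    t = [x - mean_t for x in target]    # zero-mean target

    # Optimal scaling of the target that best explains the estimate.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    # Toy signals (assumed for illustration): target plus a small error
    # that is orthogonal to the target.
    target = [1.0, -1.0, 1.0, -1.0]
    estimate = [1.1, -0.9, 0.9, -1.1]
    print(round(si_snr(estimate, target), 2))  # about 20 dB
```

Because the target is rescaled before the residual is computed, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is why the metric is usually preferred over plain SNR for separation outputs whose gain is arbitrary.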

Authors: LUO Yu (罗宇); HU Weiping (胡维平); WU Huanan (吴华楠) (Electronic Engineering, Guangxi Normal University, Guilin 541000, China)

Affiliation: School of Electronic Engineering, Guangxi Normal University (广西师范大学电子工程学院)

Source: Journal of Applied Acoustics (《应用声学》), indexed in CSCD and the Peking University Core Journals list; 2023, No. 5, pp. 1099-1105 (7 pages)

Funding: National Natural Science Foundation of China (NSFC 61861005).

Keywords: deep clustering; gated convolution; speech separation

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
