
The Research of Time-Domain Speech Separation with Blind Source Based on Convolutional Neural Network
Cited by: 1
Abstract: Most existing speech separation methods operate on a frequency-domain representation of the mixed signal. These methods suffer from persistent problems, including the decoupling of signal phase and magnitude, the suboptimality of the time-frequency representation for speech separation, and the high latency of computing spectra. To address these problems, this paper re-models the long-term dependencies of the speech signal within the convolution operations of the original convolutional time-domain network (Conv-TasNet). To compensate for the loss of valid data caused by zero padding, the new temporal convolution block replaces zero padding with supplement data while keeping the input and output lengths equal. Using valid data in place of zeros raises the convolution participation rate at both ends of the bottom-level segments, and the 20% overlap between adjacent speech segments is reduced to lower the computational cost. Applied to separating mixtures of two speakers, the improved module yields target speech whose SNR is 0.6% better than that of the original method. Compared with existing time-frequency masking methods, it achieves comparable performance with a model one fifth the size.
Authors: JING Yuan (景源); SUN Hao-yuan (孙浩源) (College of Information, Liaoning University, Shenyang 110036, China)
Source: Journal of Liaoning University: Natural Sciences Edition (《辽宁大学学报(自然科学版)》), CAS, 2021, No. 3, pp. 204-214 (11 pages)
Keywords: speech separation; deep neural network; end-to-end model; temporal convolutional network; time-domain; supplement padding
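The abstract's central idea, padding a convolution with real neighbouring samples rather than zeros, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `conv_same_zero_pad` and `conv_same_supplement_pad` are hypothetical, the kernel length is assumed to be odd, and a single 1-D convolution stands in for the full temporal convolution block.

```python
import numpy as np

def conv_same_zero_pad(segment, kernel):
    """Same-length 1-D convolution of one segment, padded with zeros."""
    pad = len(kernel) // 2                      # assumes an odd kernel length
    padded = np.pad(segment, pad)               # zeros on both sides
    return np.convolve(padded, kernel, mode="valid")

def conv_same_supplement_pad(signal, start, length, kernel):
    """Same-length 1-D convolution of signal[start:start+length], where the
    padding is taken from neighbouring samples of the full signal.
    Zeros are used only where the signal itself runs out."""
    pad = len(kernel) // 2                      # assumes an odd kernel length
    lo, hi = start - pad, start + length + pad
    left_missing = max(0, -lo)                  # samples before the signal start
    right_missing = max(0, hi - len(signal))    # samples past the signal end
    context = signal[max(lo, 0):min(hi, len(signal))]
    padded = np.pad(context, (left_missing, right_missing))
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical toy data: a ramp signal and a 3-tap averaging kernel.
signal = np.arange(10.0)
kernel = np.ones(3) / 3
start, length = 4, 4

zero_out = conv_same_zero_pad(signal[start:start + length], kernel)
supp_out = conv_same_supplement_pad(signal, start, length, kernel)
```

In this toy case both outputs have the same length as the segment and agree in the interior. They differ only at the two ends, where the zero-padded version mixes artificial zeros into the result and the supplement-padded version uses the real adjacent samples.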
