Abstract
Single-channel speech separation mainly relies on recurrent or convolutional neural networks to model speech sequences, but both have difficulty modeling speech sequences that contain long pauses. A dual-path multi-scale multi-layer perceptual hybrid separation network (DPMNet) is proposed to address this problem. A multi-scale context-aware modeling method fuses the input channel features from three different time scales. Compared with conventional methods, the added fully connected layers weaken the interference of noise, and the cross-fusion of convolutional and fully connected layers enlarges the model's receptive field and strengthens its ability to model long sequences. Experiments on Libri2Mix, on its noisy version WHAM!, and on ICSSD, a dataset of real classroom recordings, show that this dual-path multi-scale hybrid perception scheme uses fewer parameters and that DPMNet consistently outperforms other advanced models.
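The idea of fusing features from three time scales can be illustrated with a minimal sketch. This is not the DPMNet implementation: the function names, the pooling windows (1, 2, 4) and the fixed fusion weights are assumptions chosen for illustration, and the fixed weighted sum only stands in for whatever learned fully connected fusion the network uses.

```python
from typing import List, Sequence


def avg_pool(x: Sequence[float], k: int) -> List[float]:
    """Average non-overlapping windows of k frames (the last window may be shorter)."""
    return [sum(x[i:i + k]) / len(x[i:i + k]) for i in range(0, len(x), k)]


def upsample(x: Sequence[float], k: int, length: int) -> List[float]:
    """Repeat each pooled frame k times and trim to the original length."""
    return [x[i // k] for i in range(length)]


def multi_scale_fuse(
    x: Sequence[float],
    scales: Sequence[int] = (1, 2, 4),           # assumed window sizes
    weights: Sequence[float] = (0.5, 0.3, 0.2),  # assumed fixed weights, not learned
) -> List[float]:
    """Build one view of the sequence per time scale, then fuse them per frame."""
    views = [upsample(avg_pool(x, k), k, len(x)) for k in scales]
    # A weighted sum across scales, standing in for a learned fusion layer.
    return [sum(w * v[t] for w, v in zip(weights, views)) for t in range(len(x))]


frames = [1.0, 3.0, 5.0, 7.0]  # toy single-channel feature sequence
fused = multi_scale_fuse(frames)
print(fused)
```

Coarser scales summarize a longer context per frame, so each fused frame carries information from beyond its immediate neighbours, which is the intuition behind using several time scales for long sequences.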
Authors
LIU Xiongtao (刘雄涛)
ZHOU Shumin (周书民)
FANG Jiangxiong (方江雄)
Affiliations: Jiangxi Engineering Research Center of Process and Equipment for New Energy, East China University of Technology, Nanchang 330013, China; School of Electronics and Information Engineering, Taizhou University, Taizhou 318000, China
Source
Modern Information Technology (《现代信息科技》)
2023, Issue 1, pp. 8-13 (6 pages)
Funding
National Natural Science Foundation of China (61966001, 61866001, 62163004, 61866016, 62206195).
Keywords
multi-scale context modeling
hybrid perception
fully connected layer
dual-path network
speech separation