Abstract
Monaural speech enhancement aims to recover clean speech from complex noisy scenes, thereby improving the quality of noise-corrupted speech signals. The problem has been studied for decades. In recent years, convolutional encoder-decoder neural networks have been widely used for speech enhancement. Convolutional models capture the strong temporal correlations of speech and can extract important voiceprint features. However, two challenges remain. First, the skip connection mechanisms widely used in recent state-of-the-art methods introduce noise components when transmitting feature information, which inevitably degrades denoising performance. Second, the widely used standard fixed-shape convolution kernels handle diverse voiceprints inefficiently because of their limited receptive field. To address these concerns, we propose CADNet, a novel end-to-end encoder-decoder network that incorporates a cross-dimensional collaborative attention mechanism and deformable convolution modules. Specifically, we insert cross-dimensional collaborative attention blocks into the skip connections to further improve control over the transmitted speech information. In addition, we introduce a deformable convolution layer after each standard convolution layer to better match the natural characteristics of voiceprints. Experiments on the public TIMIT corpus verify the effectiveness of the proposed architecture in terms of objective speech quality and intelligibility metrics.
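To make the deformable convolution idea concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration only, not CADNet's implementation: the function names `linear_sample` and `deformable_conv1d` are hypothetical, the offsets are passed in by hand (in a trained network they would typically be predicted by an extra convolution layer), and the abstract does not say whether the paper's layers operate in one or two dimensions.

```python
import math


def linear_sample(x, pos):
    """Linearly interpolate x at fractional index pos; out-of-range reads give 0."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(x, weights, offsets):
    """1-D deformable convolution (stride 1, zero padding, same output length).

    x:       list of input samples
    weights: kernel taps (odd length K)
    offsets: offsets[t][k] is the fractional shift of tap k at output step t
             (assumed given here; normally learned from the input)
    """
    half = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            # Regular grid position of tap k, moved by its offset.
            pos = t + (k - half) + offsets[t][k]
            acc += w * linear_sample(x, pos)
        out.append(acc)
    return out


x = [0.0, 1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0]

# Zero offsets reduce to an ordinary fixed-shape convolution.
standard = deformable_conv1d(x, w, [[0.0] * 3 for _ in x])

# Non-zero offsets move every tap half a sample to the right.
shifted = deformable_conv1d(x, w, [[0.5] * 3 for _ in x])
```

With all offsets set to zero the function behaves like a standard convolution, which is why such a layer can be seen as a generalization of the fixed-shape kernel rather than a replacement for it.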
Authors
Kang Hongbo (康宏博)
Feng Yujia (冯雨佳)
Tai Wenxin (台文鑫)
Lan Tian (蓝天)
Wu Zufeng (吴祖峰)
Liu Qiao (刘峤)
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core Journal (北大核心)
2023, Issue 7, pp. 1639-1648 (10 pages)
Funding
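National Natural Science Foundation of China (U19B2028)
National Science and Technology Major Project (2021YFC3330403)
Open Project of the 54th Research Institute of China Electronics Technology Group Corporation (201148)
Open Project of Pangang Group Co., Ltd. (211129)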
Keywords
speech enhancement
self-attention
cross-dimensional collaborative attention
deformable convolution
skip connection