Monaural Speech Enhancement Based on Attention-Gate Dilated Convolution Network (cited by: 7)
Abstract In supervised speech enhancement, contextual information strongly influences the estimation of the target speech. To capture richer global correlation features of speech while keeping the parameter count as small as possible, this paper designs a new convolutional network for speech enhancement. The proposed network consists of three parts: an encoding layer, a transfer layer, and a decoding layer. The encoder and decoder use a proposed Two-Dimensional Asymmetric Dilated Residual (2D-ADR) module, which significantly reduces the number of training parameters and enlarges the receptive field, improving the network's ability to capture contextual information. The transfer layer uses a proposed One-Dimensional Gating Dilated Residual (1D-GDR) module, which combines dilated convolution, residual learning, and a gating mechanism to pass features selectively and capture more temporal correlation information. Eight 1D-GDR modules are stacked with dense skip connections to strengthen the information flow between layers and to provide more gradient propagation paths. Finally, corresponding encoding and decoding layers are joined by skip connections with an attention mechanism, so that the decoding process receives more robust low-level features. In the experiments, different parameter settings and comparison methods are used to verify the effectiveness and robustness of the network. Trained and tested under 28 noise environments, the proposed method achieves better objective and subjective metrics than the compared methods with 1.25×10^6 parameters, showing strong enhancement performance and generalization ability.
Authors ZHANG Tianqi (张天骐); BAI Haojun (柏浩钧); YE Shaopeng (叶绍鹏); LIU Jianxing (刘鉴兴) (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications (CQUPT), Chongqing 400065, China)
Source Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 9, pp. 3277-3288 (12 pages)
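Funding National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085); Construction Project of the Chongqing Municipal Key Laboratory of Signal and Information Processing (CSTC2009CA2003); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0836).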
Keywords Speech enhancement; Dilated convolution; Residual learning; Gate mechanism; Attention mechanism
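
The abstract describes a 1D-GDR module that combines dilated convolution, residual learning, and gating, and says eight such modules are stacked to capture more temporal context. The following is a minimal pure-Python sketch of that general idea on a single-channel sequence, not the authors' implementation. The gating form (feature branch multiplied by a sigmoid gate), the kernel size of 3, the dilation schedule, and all function names are assumptions made for illustration; the abstract does not specify them, and the dense skip connections and attention gates are omitted.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded ('same' length) 1-D dilated cross-correlation.

    x and kernel are lists of floats; the kernel length is assumed odd.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):              # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_residual(x, feat_kernel, gate_kernel, dilation):
    """Hypothetical gated dilated residual block (assumed form, single channel).

    output = x + feature(x) * sigmoid(gate(x)), where both branches are
    dilated convolutions with the same dilation rate.
    """
    feat = dilated_conv1d(x, feat_kernel, dilation)
    gate = dilated_conv1d(x, gate_kernel, dilation)
    return [xi + f * sigmoid(g) for xi, f, g in zip(x, feat, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked dilated convolutions, stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    y = signal
    # Assumed schedule: eight blocks with dilations 1, 2, 4, 8 repeated twice.
    schedule = [1, 2, 4, 8, 1, 2, 4, 8]
    for d in schedule:
        y = gated_dilated_residual(y, [0.2, 0.5, 0.2], [0.1, 0.3, 0.1], d)
    print(len(y), receptive_field(3, schedule))
```

Under the assumed schedule, eight kernel-size-3 blocks cover 61 frames, whereas eight undilated blocks of the same size would cover only 17. This is the sense in which dilation enlarges the receptive field without adding parameters.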