Abstract
Full-band speech enhancement systems based on deep neural networks face two challenges: a high demand for computational resources and the imbalanced distribution of speech information across frequency bands. This paper proposes a lightweight full-band model built on the dual-path convolutional recurrent network with two dedicated strategies: (i) a learnable spectral compression mapping that compresses high-band spectral information more effectively, and (ii) a multi-head attention mechanism that models the global spectral (frequency-domain) pattern more effectively. Experiments validate the efficacy of both strategies and show that the proposed model achieves competitive full-band speech enhancement with only 0.89×10^6 parameters.
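The abstract names the two strategies without detailing them. The following is a minimal pure-Python sketch, under our own assumptions, of what such components can look like; it is not the authors' implementation. The names `band_average_init`, `compress_spectrum`, and `multi_head_freq_attention` are hypothetical, as are the split point, the band count, and the band-averaging initialization. In a real model the compression matrix and the attention projections would be trained jointly with the rest of the network.

```python
# Illustrative sketch only; not the authors' implementation. All names are hypothetical.
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def eye(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def band_average_init(n_high, n_bands):
    """Hypothetical initializer for the compression matrix: band b averages
    a contiguous block of high-frequency bins (a rectangular filterbank).
    In a trained model these weights would then be learned."""
    assert 0 < n_bands <= n_high
    edges = [i * n_high // n_bands for i in range(n_bands + 1)]
    w = [[0.0] * n_high for _ in range(n_bands)]
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        for k in range(lo, hi):
            w[b][k] = 1.0 / (hi - lo)
    return w


def compress_spectrum(frame, split, weights):
    """Keep bins below `split` unchanged and map the high-band bins
    frame[split:] to len(weights) bands via the matrix `weights`."""
    return frame[:split] + matvec(weights, frame[split:])


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def multi_head_freq_attention(x, wq, wk, wv, wo, n_heads):
    """Multi-head self-attention across the frequency axis.

    x: F rows (one per frequency bin/band), each a C-dim feature vector.
    wq, wk, wv, wo: C x C projection matrices (learned in practice).
    Every bin attends to every other bin, giving a global spectral view.
    """
    c = len(x[0])
    assert c % n_heads == 0
    d = c // n_heads
    q = [matvec(wq, row) for row in x]
    k = [matvec(wk, row) for row in x]
    v = [matvec(wv, row) for row in x]
    heads_out = [[0.0] * c for _ in x]
    for h in range(n_heads):
        s = slice(h * d, (h + 1) * d)
        for i, qi in enumerate(q):
            # Scaled dot-product scores of bin i against all bins, head h.
            scores = [sum(a * b for a, b in zip(qi[s], kj[s])) / math.sqrt(d)
                      for kj in k]
            attn = softmax(scores)
            for j, vj in enumerate(v):
                for t, val in enumerate(vj[s]):
                    heads_out[i][h * d + t] += attn[j] * val
    # Concatenated heads pass through the output projection.
    return [matvec(wo, row) for row in heads_out]


if __name__ == "__main__":
    random.seed(0)
    # Illustrative sizes: 16 bins, keep the lowest 8, compress the top 8 into 2 bands.
    frame = [random.random() for _ in range(16)]
    compressed = compress_spectrum(frame, split=8, weights=band_average_init(8, 2))
    print(len(compressed))  # 10 bins instead of 16
    # Lift each compressed bin to a 4-channel feature and attend across bins.
    feats = [[m, m * m, 1.0, -m] for m in compressed]
    out = multi_head_freq_attention(feats, eye(4), eye(4), eye(4), eye(4), n_heads=2)
    print(len(out), len(out[0]))  # 10 4
```

The compression shortens the frequency axis the later layers operate on (here 16 bins become 10), which is one way a full-band model can save computation on the sparsely populated high band, while attention along that axis lets each bin draw on the whole spectrum rather than only neighboring bins.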
Authors
HU Qinwen (胡沁雯)
HOU Zhongshu (侯仲舒)
LE Xiaohuai (乐笑怀)
LU Jing (卢晶)
Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing 210093, China
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journal (北大核心)
2023, No. 2, pp. 274-282 (9 pages)
Funding
National Natural Science Foundation of China (12274221).
Keywords
full-band speech enhancement
deep learning
multi-head attention mechanism