Speech Enhancement Method Based on Complex Spectrum Mapping with Efficient Transformer

Abstract: Convolutional neural networks (CNNs) have performed well in speech enhancement but capture global features poorly, while Transformers, which in recent years have shown an advantage in modeling long-range sequence dependencies, lose local detail and carry a large number of parameters. To exploit the strengths of both and offset their weaknesses, this paper proposes a single-channel speech enhancement network that fuses a novel convolutional module with an efficient Transformer under complex spectrum mapping. The network consists of an encoding layer, a transmission layer, and a dual-branch decoding layer. In the encoder and decoder, a collaborative learning block (CLB) is designed to supervise the interactive information, reducing the parameter count while improving the backbone network's ability to capture complex-valued features. In the transmission layer, a time-frequency spatial attention Transformer module models sub-band and full-band speech information separately, exploiting acoustic characteristics to model local spectral patterns and capture dependencies between harmonics. This module is further combined with a channel attention branch to form a learnable dual-branch attention fusion (DAF) mechanism, which extracts contextual features from the spatial and channel perspectives to strengthen multi-dimensional information transmission. Finally, a Gaussian-weighted progressive network is built on this basis as the intermediate transmission layer: the outputs of stacked DAF modules are combined by weighted summation so that deep features are fully used and decoding becomes more robust. Ablation and comprehensive comparison experiments were conducted on the English VoiceBank-DEMAND dataset, the Chinese THCHS30 corpus, and 115 types of environmental noise. The results show that, with a minimum of only 0.68×10^6 parameters, the proposed method achieves better subjective and objective metrics than most recent related network models, demonstrating strong enhancement performance and generalization ability.
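The abstract describes the transmission layer as a stack of modules whose outputs are combined by a Gaussian-weighted sum. The sketch below illustrates that general idea only. It is not the authors' implementation: the function names, the `center` and `sigma` parameters, the choice to place the Gaussian peak on the deepest block, and the toy feature vectors are all assumptions made for illustration, and the paper's actual weighting scheme is not given in the abstract.

```python
import math


def gaussian_weights(num_blocks, center=None, sigma=1.0):
    """Return normalized Gaussian weights over block indices 0..num_blocks-1.

    `center` and `sigma` are hypothetical parameters; in a trained network
    they could be learnable. By default the peak sits on the deepest block.
    """
    if center is None:
        center = num_blocks - 1
    raw = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
           for i in range(num_blocks)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_fusion(block_outputs, weights):
    """Element-wise weighted sum of equally sized feature vectors."""
    length = len(block_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, block_outputs))
            for k in range(length)]


# Toy stand-ins for the outputs of three stacked blocks (not real DAF modules).
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = gaussian_weights(len(outputs), sigma=1.0)
fused = weighted_fusion(outputs, weights)
```

With this default the deepest block contributes most, while shallower blocks still feed the decoder. That is one plausible reading of "making full use of deep features", not a confirmed design detail.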

Authors: ZHANG Tianqi; LUO Qingyu; ZHANG Huizhi; FANG Rong (School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications (CQUPT), Chongqing 400065, China)

Affiliation: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications

Source: Journal of Signal Processing (indexed in CSCD and the Peking University Core Journals list), 2024, No. 2, pp. 406-416 (11 pages)

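Funding: National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0836); Chongqing Municipal Key Laboratory of Signal and Information Processing Construction Project (CSTC2009CA2003); Scientific Research Project of the Chongqing Municipal Education Commission (KJQN201900601, KJ1600429).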

Keywords: speech enhancement; complex spectral mapping; efficient transformer; lightweight network

Classification number: TN912.35 [Electronics and Telecommunications: Communication and Information Systems]

