A Multi-Branch Lightweight Algorithm for Speech Enhancement


Abstract: In recent years, deep learning networks have demonstrated superior performance in speech enhancement. State-of-the-art algorithms commonly use the convolution-augmented Transformer (Conformer) as the main component of the feature extraction layer. Although Conformer performs very well in speech signal processing, its high computational cost constrains deployment. This paper proposes an efficient speech enhancement algorithm called the Multi-Branch Two-Stage Net (MBTS-Net). The network consists of an encoder, a time-frequency feature extraction layer, and a decoder. To reduce computational cost, a Unidimensional Activation Free (UAF) module is designed on a minimalist principle, drawing on the classical Transformer structure. To extract spectral information flexibly and efficiently, MBTS-Net replaces the two-stage Conformer module in the time-frequency feature extraction layer with a multi-branch two-stage UAF module. Learnable residual connections are introduced into the two-stage network so that it can learn the temporal and frequency features of speech more flexibly. Experimental results show that, compared with current mainstream speech enhancement algorithms, MBTS-Net has fewer parameters and achieves higher objective speech quality scores, clearly improving the quality of degraded speech.
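The two-stage idea with learnable residual connections can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the residual formula, so the common form y = alpha * x + F(x) is used, with alpha standing in for a trainable scalar; `smooth_1d` is a placeholder 1-D operation and is not the paper's UAF module; all function names are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]  # feat[t][f]: time frames x frequency bins


def smooth_1d(seq: List[float]) -> List[float]:
    """Placeholder 1-D module (NOT the paper's UAF): 3-tap moving average
    with edge clamping, so the output has the same length as the input."""
    n = len(seq)
    return [
        (seq[max(i - 1, 0)] + seq[i] + seq[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap the time and frequency axes."""
    return [list(col) for col in zip(*m)]


def residual_stage(feat: Matrix,
                   module: Callable[[List[float]], List[float]],
                   alpha: float) -> Matrix:
    """Apply `module` to every row and add the input scaled by `alpha`.

    Assumed form: y = alpha * x + module(x). In a trained network alpha
    would be a learnable parameter; here it is a plain float.
    """
    return [
        [alpha * x + y for x, y in zip(row, module(row))]
        for row in feat
    ]


def two_stage_block(feat: Matrix,
                    alpha_time: float,
                    alpha_freq: float,
                    module: Callable[[List[float]], List[float]] = smooth_1d
                    ) -> Matrix:
    """Two-stage processing: first along time, then along frequency."""
    # Stage 1: each frequency bin is treated as a sequence over frames.
    by_freq = residual_stage(transpose(feat), module, alpha_time)
    feat = transpose(by_freq)
    # Stage 2: each frame is treated as a sequence over frequency bins.
    return residual_stage(feat, module, alpha_freq)


if __name__ == "__main__":
    spec = [[1.0] * 4 for _ in range(3)]  # 3 frames, 4 bins
    print(two_stage_block(spec, alpha_time=1.0, alpha_freq=1.0))
```

Setting an alpha to 0 removes the skip path for that stage, while larger values let the input dominate; making the scale trainable lets the network choose this balance separately for the time stage and the frequency stage.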

Authors: LIN Zizhen; GAO Yong (College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China)

Affiliation: College of Electronics and Information Engineering, Sichuan University

Source: Communications Technology, 2024, No. 7, pp. 660-665 (6 pages)

Keywords: lightweight; multi-branch; two-stage; speech enhancement; voiceprint

Classification number: TP37 [Automation and Computer Technology: Computer System Architecture]

