Abstract
In recent years, deep learning networks have demonstrated superior performance in speech enhancement. State-of-the-art algorithms commonly use the Conformer (convolution-augmented Transformer) as the main component of the feature extraction layer. Although the Conformer performs very well in speech signal processing, its computational cost places significant constraints on deployment. This paper proposes an efficient speech enhancement algorithm called the Multi-Branch Two-Stage Net (MBTS-Net). The network consists of an encoder, a time-frequency feature extraction layer, and a decoder. To reduce the computational cost, a Unidimensional Activation Free (UAF) module is designed on a minimalist principle, drawing on the classical Transformer structure. To extract spectral information flexibly and efficiently, MBTS-Net replaces the two-stage Conformer module in the time-frequency feature extraction layer with a multi-branch two-stage UAF module. Learnable residual connections are introduced into the two-stage network so that it can learn the temporal and frequency features of speech more flexibly. Experimental results show that, compared with current mainstream speech enhancement algorithms, MBTS-Net has fewer parameters and achieves higher scores on objective speech quality metrics, significantly improving the quality of degraded speech.
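The abstract does not specify what form the learnable residual connection takes. As an illustration only, the sketch below assumes one common form, y = f(x) + alpha * x, in which the skip path is scaled by a trainable scalar alpha instead of being fixed at 1. The class name `LearnableResidual`, the toy `branch` function, and the mean-squared-error training step are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


class LearnableResidual:
    """Residual connection with a trainable skip weight (assumed form).

    Computes y = branch(x) + alpha * x, where alpha is a learnable scalar.
    This is a generic illustration, not the MBTS-Net implementation.
    """

    def __init__(self, branch: Callable[[Vector], Vector], alpha: float = 1.0) -> None:
        self.branch = branch  # stands in for a feature-extraction block
        self.alpha = alpha    # learnable weight on the skip path

    def forward(self, x: Vector) -> Vector:
        fx = self.branch(x)
        return [f + self.alpha * xi for f, xi in zip(fx, x)]

    def sgd_step(self, x: Vector, target: Vector, lr: float = 0.1) -> float:
        """One gradient step on alpha under a mean-squared-error loss."""
        y = self.forward(x)
        n = len(y)
        loss = sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / n
        # dL/dy_i = 2 * (y_i - t_i) / n, and dy_i/dalpha = x_i
        grad_alpha = sum(2.0 * (yi - ti) / n * xi for yi, ti, xi in zip(y, target, x))
        self.alpha -= lr * grad_alpha
        return loss


def toy_branch(x: Vector) -> Vector:
    """Hypothetical fixed branch: halves every input element."""
    return [0.5 * v for v in x]
```

With alpha fixed at 1 this reduces to an ordinary residual connection. Training alpha lets the layer decide how much of its input to pass through unchanged, which is one way to read the added flexibility the abstract describes.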
Authors
LIN Zizhen (林子震); GAO Yong (高勇)
College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China
Source
Communications Technology (《通信技术》)
2024, No. 7, pp. 660-665 (6 pages)
Keywords
lightweight
multi-branch
two-stage
speech enhancement
voiceprint