Abstract
To address the poor robustness of voiceprint recognition under background noise, this paper proposes an improved TDNN-based method for noisy voiceprint recognition (i-TDNN). The algorithm first extracts the Mel spectrogram of the speaker's audio. A self-attention mechanism (SE) makes the network focus on the more important features, and residual connections (Res) compensate for the feature information lost between the Mel-spectrum input and the output layer, which alleviates neural network degradation to some extent. Multi-layer feature aggregation (MFA) densely connects the output features to produce the input for attentive statistics pooling, which finally yields a highly robust voiceprint embedding. Experiments on the noisy AISHELL-ASR0009 (Aishell-1) dataset show that, compared with Base-TDNN, i-TDNN improves recognition accuracy by 16.63%, verifying the robustness of the algorithm under background noise.
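The pipeline named in the abstract (SE gating, residual connection, multi-layer feature aggregation, pooling) can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the function names, weight shapes, and toy input are hypothetical, the TDNN convolutions are omitted, and plain (non-attentive) statistics pooling stands in for the attentive statistics pooling mentioned in the abstract.

```python
import math
import random


def se_res_block(x, w1, w2):
    """SE gate plus residual on a C x T feature map (list of channel rows).

    w1 (R x C) and w2 (C x R) are hypothetical bottleneck weights.
    The TDNN convolutions of a real block are omitted in this sketch.
    """
    # Squeeze: average every channel over time.
    z = [sum(row) / len(row) for row in x]
    # Excite: bottleneck layer with ReLU, then one sigmoid gate per channel.
    h = [max(0.0, sum(w * v for w, v in zip(w_row, z))) for w_row in w1]
    gate = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(w_row, h))))
            for w_row in w2]
    # Rescale each channel by its gate and add the input back (residual).
    return [[v + g * v for v in row] for row, g in zip(x, gate)]


def stats_pool(x):
    """Concatenate the per-channel mean and standard deviation over time."""
    means = [sum(row) / len(row) for row in x]
    stds = [math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
            for row, m in zip(x, means)]
    return means + stds


def embed(mel, weights):
    """Run stacked SE-Res blocks, aggregate all block outputs, then pool."""
    feats, x = [], mel
    for w1, w2 in weights:
        x = se_res_block(x, w1, w2)
        feats.append(x)
    # Multi-layer feature aggregation: stack the channels of every block.
    aggregated = [row for f in feats for row in f]
    return stats_pool(aggregated)


# Toy input: 4 "Mel" channels, 10 frames, 3 blocks with bottleneck size 2.
random.seed(0)
C, T, R = 4, 10, 2
mel = [[random.gauss(0, 1) for _ in range(T)] for _ in range(C)]
weights = [([[random.gauss(0, 0.1) for _ in range(C)] for _ in range(R)],
            [[random.gauss(0, 0.1) for _ in range(R)] for _ in range(C)])
           for _ in range(3)]
embedding = embed(mel, weights)
print(len(embedding))  # 24 = 2 statistics x (3 blocks x 4 channels)
```

Aggregating the outputs of all blocks, rather than only the last one, lets the pooled embedding draw on shallow as well as deep features, which is the role the abstract assigns to MFA.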
Authors
伍雄
陈为真
WU Xiong; CHEN Weizhen (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430048, China)
Source
《长江信息通信》
2023, No. 2, pp. 27-30 (4 pages)
Changjiang Information & Communications
Funding
Science and Technology Project of the Hubei Provincial Department of Education (B2020061).