
End-to-end Speaker Recognition Model for Joint Supervision Based on Attention Mechanism
Abstract: The application of deep learning models in biometrics has pushed speaker recognition into a new stage. Early deep learning models for speaker recognition were mainly deep neural networks (DNNs). They improved recognition performance to some extent, but both their training speed and their recognition accuracy leave room for improvement. Building on local feature extraction, this paper adopts a convolutional neural network (CNN), which is less complex to train, and uses skip connections to address the vanishing-gradient problem that arises during CNN training as the number of convolutional layers grows. During training, the model also aggregates each utterance from frame level to utterance level using an attention mechanism, and it is trained under joint supervision of softmax loss and center loss. Together, these measures substantially improve the performance of CNNs for speaker recognition.
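The abstract names two training-time components without giving their formulas: attention-based aggregation from frame-level features to a single utterance-level vector, and joint supervision with softmax loss plus center loss. The pure-Python sketch below illustrates both under stated assumptions. It assumes a single-layer additive attention scorer, e_t = vᵀ tanh(W h_t + b), and the common center-loss form L = L_softmax + λ · ½‖x − c_y‖². The attention form, the value of λ, and every function name here are hypothetical illustrations, not the authors' code. The CNN with skip connections that would produce the frame features is not shown, and the center-update step is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attentive_pool(frames, W, b, v):
    """Aggregate frame-level features h_t into one utterance-level vector.

    Assumed (hypothetical) additive attention:
      e_t = v . tanh(W h_t + b),  alpha = softmax(e),  u = sum_t alpha_t * h_t
    Returns (u, alpha).
    """
    scores = []
    for h in frames:
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(frames[0])
    u = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    return u, alpha


def softmax_ce(logits, label):
    """Softmax cross-entropy loss for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(embedding, label, centers):
    """Half the squared distance between an embedding and its class center."""
    c = centers[label]
    return 0.5 * sum((x - ci) ** 2 for x, ci in zip(embedding, c))


def joint_loss(embedding, label, W_cls, centers, lam=0.01):
    """Joint supervision: L = L_softmax + lam * L_center (lam is an assumed value)."""
    logits = matvec(W_cls, embedding)
    return softmax_ce(logits, label) + lam * center_loss(embedding, label, centers)


if __name__ == "__main__":
    # Toy example: 3 frames of 2-dim features for one utterance, 2 speakers.
    frames = [[0.2, 1.0], [0.8, -0.5], [0.1, 0.3]]
    W = [[0.5, -0.3], [0.1, 0.4]]
    b = [0.0, 0.1]
    v = [1.0, -1.0]
    u, alpha = attentive_pool(frames, W, b, v)
    W_cls = [[1.0, 0.0], [0.0, 1.0]]
    centers = [[0.3, 0.4], [-0.2, 0.1]]
    print("attention weights:", alpha)
    print("utterance vector:", u)
    print("joint loss:", joint_loss(u, 0, W_cls, centers, lam=0.1))
```

The softmax term pushes embeddings of different speakers apart, while the center term pulls each embedding toward its own speaker's center, which is the usual motivation for combining the two losses.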
Authors: Shi Wanglei (史王雷), Feng Shuang (冯爽) (Key Laboratory of Intelligent Financial Media of Ministry of Education, Communication University of China, Beijing 100024, China)
Source: Information & Computer (《信息与电脑》), 2020, No. 4, pp. 145-147 (3 pages)
Keywords: speaker recognition; convolutional neural network; aggregation; joint supervision
