Abstract
The application of deep learning models to biometrics has pushed speaker recognition into a new stage. Early deep learning models for speaker recognition were mainly deep neural networks (DNNs), which improved recognition performance to some extent but left room for improvement in both training speed and recognition accuracy. Building on local feature extraction, this paper adopts a convolutional neural network (CNN), whose training is less complex, and uses skip connections to address the vanishing-gradient problem that arises during training as the number of convolutional layers grows. During training, the features of each utterance are aggregated from the frame level to the segment (utterance) level with an attention mechanism, and the model is trained under joint supervision of softmax loss and center loss. Together, these measures substantially improve the performance of the CNN for speaker recognition.
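The three ingredients named in the abstract (skip connections, attention-based frame-to-utterance aggregation, and joint softmax/center-loss supervision) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the single scoring vector `attn_vector`, and the weighting factor `lam` are assumptions introduced here for illustration, and the abstract does not state the actual network layout or hyperparameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_block(x, transform):
    """Skip connection: add the input back onto the transformed output.

    `transform` stands in for the convolutional layers (hypothetical here)
    and must return a vector of the same length as `x`.
    """
    return [xi + ti for xi, ti in zip(x, transform(x))]


def attentive_pooling(frames, attn_vector):
    """Aggregate T frame-level vectors into one utterance-level vector.

    Each frame is scored by a dot product with `attn_vector` (an assumed
    learned parameter); the softmax of the scores gives the frame weights.
    """
    scores = [sum(f_d * a_d for f_d, a_d in zip(f, attn_vector)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def joint_loss(embedding, logits, label, centers, lam=0.01):
    """Softmax cross-entropy plus `lam` times the center loss, one utterance.

    The center loss is half the squared distance between the embedding and
    the center of its speaker class; `lam` is an assumed value.
    """
    cross_entropy = -math.log(softmax(logits)[label])
    center = centers[label]
    center_term = 0.5 * sum((e - c) ** 2 for e, c in zip(embedding, center))
    return cross_entropy + lam * center_term
```

In a real system the class centers and the attention parameters would be learned together with the CNN weights; here they are plain lists so that the arithmetic of each step stays visible.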
Authors
史王雷
冯爽
Shi Wanglei; Feng Shuang (Key Laboratory of Intelligent Financial Media of Ministry of Education, Communication University of China, Beijing 100024, China)
Source
《信息与电脑》
2020, Issue 4, pp. 145-147 (3 pages)
Information & Computer
Keywords
speaker recognition
convolutional neural network
aggregation
joint supervision