
Research on a Speaker Recognition Model Based on Deep Neural Networks (基于深度神经网络的说话人识别模型研究) Cited by: 3
Abstract: Traditional speaker recognition commonly relies on the Gaussian mixture model (GMM) and the GMM-UBM model. Because these models are highly sensitive to noise and place requirements on utterance length, they demand a high-quality speaker database. As traditional machine learning algorithms, GMM and GMM-UBM are also shallow, incomplete learners: the recognition rate drops sharply as the number of speakers to be recognized grows, and model robustness is relatively poor. They further suffer from long training times and difficult convergence, which limits the practical application of speaker recognition. Deep neural networks (DNNs) are strongly nonlinear and have good pattern-classification ability, place low requirements on the quality and length of the speech signal, and tolerate noise well, so the paper introduces deep neural networks into speaker recognition.
Authors: LI Hao (李浩); BAO Hong (鲍鸿); ZHANG Jing (张晶) (School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Institute of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510420, China)
Source: Computer and Information Technology (《电脑与信息技术》), 2018, No. 5, pp. 1-3, 8 (4 pages in total)
Funding: Ministry of Education Humanities and Social Sciences Project (project No. 17YJCZH242)
Keywords: speaker recognition; Gaussian mixture model; robustness; deep neural networks











