Abstract
In traditional speaker recognition, the Gaussian mixture model (GMM) and the GMM-UBM model (a GMM combined with a universal background model) are widely used. However, because GMM and GMM-UBM models are highly sensitive to noise and require speech segments of a certain minimum length, they place high demands on the quality of the speaker database. Moreover, these traditional machine-learning methods (GMM, GMM-UBM) perform shallow and incomplete learning: the recognition rate drops sharply as the number of speakers to be identified grows, and model robustness is relatively poor. They also suffer from long training times and difficult convergence, which limits the practical application of speaker recognition. Deep neural networks (DNNs) have strong nonlinear modeling capability and good pattern-classification ability, are less demanding about the quality and length of the speech signal, and tolerate noise better. This paper therefore introduces deep neural networks into speaker recognition.
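The abstract names the GMM and GMM-UBM baselines without giving their scoring rules. As a point of reference only, the sketch below shows how such scoring is commonly done, assuming diagonal covariances, pre-extracted frame-level features (e.g. MFCCs), and hand-set toy parameters. The function names are hypothetical, model training (EM, MAP adaptation) is omitted, and this illustrates the baseline rather than the paper's DNN method.

```python
import numpy as np


def diag_gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under a diagonal-covariance GMM.

    frames:    (T, D) frame-level feature vectors (assumed, e.g. MFCCs)
    weights:   (K,)   mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariance entries, all > 0
    """
    frames = np.atleast_2d(np.asarray(frames, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    d = frames.shape[1]
    # Gaussian log-density for every frame/component pair, shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)
    log_gauss = -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)
    # log sum_k w_k N(x_t | mu_k, Sigma_k), computed stably via log-sum-exp
    weighted = log_gauss + np.log(weights)[None, :]
    peak = weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return float(per_frame.mean())


def identify_speaker(frames, speaker_models):
    """Closed-set identification: pick the speaker whose GMM scores highest."""
    scores = {name: diag_gmm_log_likelihood(frames, *model)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get)


def gmm_ubm_llr(frames, speaker_model, ubm):
    """Verification score: log p(X | speaker) - log p(X | UBM), averaged per frame.

    The claimed identity is accepted if the score exceeds a tuned threshold.
    """
    return (diag_gmm_log_likelihood(frames, *speaker_model)
            - diag_gmm_log_likelihood(frames, *ubm))


if __name__ == "__main__":
    # Hand-set toy models in a 2-D feature space (hypothetical, no training).
    models = {
        "A": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
        "B": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    ubm = ([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[4.0, 4.0], [4.0, 4.0]])
    test_frames = np.array([[0.1, -0.2], [-0.1, 0.0], [0.05, 0.1]])
    print(identify_speaker(test_frames, models))            # A
    print(gmm_ubm_llr(test_frames, models["A"], ubm) > 0)   # True
```

Scores are averaged over frames so that utterances of different lengths remain comparable; in a real GMM-UBM system the speaker models are typically MAP-adapted from the UBM rather than set by hand.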
Authors
LI Hao; BAO Hong; ZHANG Jing (School of Automation, Guangdong University of Technology, Guangzhou 510006, China; Institute of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510420, China)
Source
Computer and Information Technology (《电脑与信息技术》), 2018, No. 5, pp. 1-3, 8 (4 pages)
Funding
Humanities and Social Sciences Research Project of the Ministry of Education of China (Project No. 17YJCZH242)