Abstract
Speaker identification has broad application prospects in many fields. This paper first studies the application of two basic deep neural network models, the deep belief network (DBN) and the stacked denoising autoencoder (SDA), to speaker identification. Through layer-wise unsupervised pre-training followed by supervised fine-tuning, deep neural networks avoid the tendency of back-propagation to become trapped in local minima. Experiments show that, once the number of neurons exceeds a certain threshold, the deep models outperform an ordinary BP network, and that their performance improves as the network grows. Because large deep networks take a long time to train, the paper proposes replacing the traditional sigmoid-type activation functions with rectified linear units (ReLU) in the deep models for speaker identification. Experimental results show that the improved deep models reduce average training time by 35% and the average misidentification rate by 8.3%.
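The activation-function substitution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architecture or code, and the function names below (`sigmoid`, `relu`, `chained_gradient`) are hypothetical names chosen for illustration. The sketch only shows one well-known difference between the two activations: the sigmoid derivative never exceeds 0.25, so a product of such derivatives across layers shrinks quickly, whereas the ReLU derivative is exactly 1 for positive inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to stay numerically stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical activation derivatives.

    Simplifying assumption for illustration: every layer sees the same
    pre-activation and the weight terms of the chain rule are ignored.
    """
    return grad_fn(pre_activation) ** depth
```

Calling `chained_gradient(sigmoid_grad, 2.0, 5)` and `chained_gradient(relu_grad, 2.0, 5)` compares the two activations at a depth of five layers under the stated simplification. ReLU also needs only a comparison rather than an exponential per unit, although the abstract does not say which factor accounts for the reported 35% reduction in training time.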
Authors
赵艳
吕亮
赵力
ZHAO Yan1, LU Liang2, ZHAO Li2 (1. School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167, China; 2. School of Information Science and Engineering, Southeast University, Nanjing 210096, China)
Source
《电子器件》
CAS
PKU Core Journal (北大核心)
2017, No. 5, pp. 1229-1233 (5 pages)
Chinese Journal of Electron Devices
Funding
National Natural Science Foundation of China (61301219)
Nanjing Institute of Technology university-level project (YKJ201107)
Qing Lan Project (2014)
Keywords
speaker identification
stacked denoising autoencoders
deep belief network
rectified linear unit (ReLU)
rectifier neural network