Speaker recognition based on convolutional neural network  Cited by: 12
Abstract Speech is a time-varying signal that is strongly affected by the individual speaker and by the environment, so some preprocessing of the raw speech signal is needed to improve the speaker recognition rate. A speaker recognition algorithm based on a Convolutional Neural Network (CNN) is proposed. The algorithm uses the two main CNN operations, convolution and down-sampling, to preprocess the speech signal, constructing both one-dimensional and two-dimensional convolution operations. Mel Frequency Cepstrum Coefficient (MFCC) features are then extracted from the preprocessed signal, and the classical Gaussian Mixture Model-Universal Background Model (GMM-UBM) method is used to build the speaker models. Experiments on a self-built database and the TIMIT standard database show that, compared with the classical method that applies MFCC features and a GMM-UBM classifier directly to the signal, the proposed algorithm improves the recognition rate by 8% to 15% and reduces both time and space complexity.
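The two preprocessing operations named in the abstract, convolution and down-sampling, can be illustrated on a one-dimensional signal. The following is a minimal sketch only: the function names, the smoothing kernel, and the choice of mean pooling with a factor of 2 are assumptions made for illustration, since the record does not state the kernels or pooling scheme the authors used.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution: the flipped kernel slides over the signal
    and only positions where it fully overlaps the signal are kept."""
    k = len(kernel)
    if k == 0 or len(signal) < k:
        return []
    flipped = list(reversed(kernel))
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def downsample_mean(signal: Sequence[float], factor: int) -> List[float]:
    """Down-sample by averaging non-overlapping windows of `factor` samples.
    Trailing samples that do not fill a whole window are dropped."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def preprocess(signal: Sequence[float], kernel: Sequence[float], factor: int) -> List[float]:
    """Convolution followed by down-sampling (hypothetical helper)."""
    return downsample_mean(conv1d_valid(signal, kernel), factor)


if __name__ == "__main__":
    toy_signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    smoothing_kernel = [0.5, 0.5]  # assumed kernel, not taken from the paper
    print(preprocess(toy_signal, smoothing_kernel, 2))  # [1.0, 3.0, 5.0]
```

In the pipeline the abstract describes, the output of such a stage would be passed to MFCC feature extraction; the shorter sequence after down-sampling is one plausible source of the reported reduction in time and space complexity.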
Authors Hu Qing (胡青), Liu Benyong (刘本永)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2016, Issue A01, pp. 79-81, 200 (4 pages in total)
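Funding National Natural Science Foundation of China (60862003); International Cooperation Project of the Ministry of Science and Technology (2009DFR10530); Guizhou Province Industrial Science and Technology Research Project (黔科合GY字(2010)2054); Doctoral Program Foundation of Higher Education Institutions, Ministry of Education (20095201110002); Guizhou University Graduate Innovation Fund (2015081)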
Keywords Convolutional Neural Network (CNN); speaker recognition; Universal Background Model (UBM); Mel Frequency Cepstrum Coefficient (MFCC); preprocessing
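The abstract names a GMM-UBM classifier as the back end. A common way to score a test utterance with such a system is the average per-frame log-likelihood ratio between a speaker GMM and the UBM. The sketch below assumes diagonal-covariance mixtures with hand-set toy parameters; the function names and the `Component` type are hypothetical, and the record does not confirm that this is the authors' exact scoring rule or how their models were trained.

```python
import math
from typing import Sequence, Tuple

# One mixture component: (weight, mean vector, diagonal variance vector).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """log p(x | GMM), using log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(
    frames: Sequence[Sequence[float]],
    speaker_gmm: Sequence[Component],
    ubm: Sequence[Component],
) -> float:
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    if not frames:
        raise ValueError("at least one feature frame is required")
    total = sum(
        gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)


if __name__ == "__main__":
    # Toy 1-D "features"; real systems would use MFCC vectors per frame.
    ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
    speaker = [(1.0, [4.0], [1.0])]
    print(llr_score([[4.0], [4.1]], speaker, ubm) > 0)   # frames match the speaker
    print(llr_score([[0.0], [-0.1]], speaker, ubm) < 0)  # frames do not match
```

A positive score favors the claimed speaker and a negative score favors the background population; for identification among several enrolled speakers, the speaker model with the highest score would be chosen.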