
Research on Improving Phoneme Recognition Rate Based on Subspace Gaussian Mixture Model and Deep Neural Network Combination

Cited by: 1
Abstract: To reduce the phoneme recognition error rate of acoustic features in a speech recognition system and improve system performance, this paper proposes a feature extraction method that combines a subspace Gaussian mixture model (SGMM) with a deep neural network (DNN). The parameter scale of the SGMM is analyzed and, after its computational complexity has been reduced, the SGMM is connected in series with the DNN to further improve the phoneme recognition rate. Speech data that have undergone a nonlinear feature transformation are fed into the model, the best configuration of the DNN structure is identified, and a network model with more reliable learning and training is built for feature extraction. System performance is judged by comparing phoneme recognition error rates. Simulation experiments show that the features extracted with this system clearly outperform those of traditional acoustic models.
Authors: JIA Bingbing; CAO Hui; QIN Chijie (School of Physics and Information Technology, Shaanxi Normal University, Xi'an 710119, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2019, No. 24, pp. 117-121, 127 (6 pages)
Funding: National Natural Science Foundation of China (No. 1202020368, No. 11074159, No. 11374199)
Keywords: acoustic feature; phoneme recognition; subspace Gaussian mixture model; deep neural network
