
Research on Speaker Aware Training Method Based on Improved i-vector
Abstract: Speaker aware training based on the identity vector (i-vector) extracts the i-vector from MFCC input features, but the relatively poor robustness of MFCC limits the recognition performance of this training method. To address this, an improved i-vector based speaker aware training method is proposed. A low-dimensional feature extraction method based on SVD is designed, and the features it extracts replace MFCC as the input for extracting i-vectors with better representational ability. Experimental results show that on the Vystadial_cz (Czech) corpus, compared with the DNN-HMM speech recognition system and the original i-vector based speaker aware training method, the proposed method improves recognition performance by 1.62% and 1.52% respectively; on the WSJ corpus, the improvements are 3.9% and 1.48% respectively.
Authors: LIANG Yulong; QU Dan; QIU Zeyu (School of Information and Systems Engineering, PLA Information Engineering University, Zhengzhou 450002, China)
Source: Computer Engineering (《计算机工程》), 2018, Issue 5, pp. 262-267 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61673395, 61403415); Natural Science Foundation of Henan Province (162300410331)
Keywords: speaker aware training; i-vector; Deep Neural Network (DNN); Singular Value Matrix Decomposition (SVMD); bottleneck feature
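
For orientation, speaker aware training with i-vectors is commonly implemented by appending one fixed-length speaker i-vector to every acoustic frame before the frame enters the DNN. The sketch below shows only that input-construction step on toy data. It assumes this standard concatenation scheme; the function name `append_ivector`, the dimensions, and the values are hypothetical. It does not reproduce the SVD-based feature extraction proposed in the paper, whose details are not given in the abstract.

```python
from typing import List, Sequence


def append_ivector(frames: Sequence[Sequence[float]],
                   ivector: Sequence[float]) -> List[List[float]]:
    """Append the same speaker-level i-vector to every acoustic frame.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return [list(frame) + list(ivector) for frame in frames]


# Toy utterance: 3 frames of 4-dimensional acoustic features (made-up values).
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Toy 2-dimensional i-vector for the speaker (real i-vectors are much longer).
ivector = [-1.0, 2.0]

# Each DNN input frame now carries acoustic features plus speaker information.
dnn_input = append_ivector(frames, ivector)
print(len(dnn_input), len(dnn_input[0]))  # prints "3 6"
```

Because the i-vector is constant across all frames of a speaker, the network can use it as a speaker-dependent offset while the acoustic part of the input still varies frame by frame.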
