
Acoustic modeling approach of multi-stream feature incorporated convolutional neural network for low-resource speech recognition (cited by 7)
Abstract: To address the problem of insufficiently trained Convolutional Neural Network (CNN) acoustic model parameters in speech recognition tasks with low-resource training data, a method was proposed that uses multi-stream features to improve CNN acoustic model performance in low-resource speech recognition. First, to exploit as much acoustic information as possible from the limited training data, several different types of features were extracted from the training data. Second, a convolutional subnetwork was built for each feature type, forming a parallel structure that regularizes the probability distributions of the multiple feature streams. Fully connected layers were then added on top of the parallel convolutional subnetworks to fuse the multi-stream features, yielding a new CNN acoustic model. Finally, a low-resource speech recognition system was built on this acoustic model. Experimental results show that the parallel convolutional subnetworks make the different feature spaces more similar, and that the method improves recognition accuracy by 3.27% and 2.08% relative to the traditional multi-feature splicing approach and the single-feature CNN baseline, respectively. When multilingual training is introduced, the method remains applicable, with relative accuracy improvements of 5.73% and 4.57%, respectively.
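The abstract describes a parallel architecture: one convolutional subnetwork per feature stream, with fully connected layers on top to fuse the streams. The following is a minimal NumPy sketch of such a forward pass; the stream names, kernel width, filter counts, layer widths, and pooling choice are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_valid(x, w):
    """Valid 1-D convolution over the time axis, followed by ReLU."""
    T, F = x.shape            # frames x feature dims
    K, _, n_filt = w.shape    # kernel width x feature dims x filters
    out = np.empty((T - K + 1, n_filt))
    for t in range(T - K + 1):
        window = x[t:t + K]   # (K, F) slice of frames
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def subnetwork_forward(x, w):
    """One convolutional subnetwork per stream, max-pooled over time."""
    h = conv1d_valid(x, w)
    return h.max(axis=0)      # -> (n_filt,)

# Three hypothetical feature streams for the same 20-frame utterance
streams = {
    "mfcc": rng.standard_normal((20, 13)),
    "fbank": rng.standard_normal((20, 40)),
    "plp": rng.standard_normal((20, 13)),
}

# Independent convolution weights per stream: kernel width 5, 8 filters
weights = {name: rng.standard_normal((5, x.shape[1], 8)) * 0.1
           for name, x in streams.items()}

# Parallel subnetworks, then fusion: concatenate + fully connected layer
pooled = [subnetwork_forward(x, weights[name]) for name, x in streams.items()]
fused = np.concatenate(pooled)              # (3 streams * 8 filters,) = (24,)
W_fc = rng.standard_normal((24, 10)) * 0.1  # e.g. 10 tied-state targets
logits = fused @ W_fc
posteriors = np.exp(logits) / np.exp(logits).sum()  # softmax over states

print(fused.shape, posteriors.shape)        # (24,) (10,)
```

Each stream keeps its own convolutional weights, so streams with different dimensionalities (13-dim MFCC vs. 40-dim filterbank) can be handled without frame-level splicing; only the pooled subnetwork outputs are concatenated before the shared fully connected layer.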
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2016, No. 9, pp. 2609-2615 (7 pages)
Funding: National Natural Science Foundation of China (61175017, 61403415)
Keywords: low-resource speech recognition; Convolutional Neural Network (CNN); feature normalization; multi-stream feature
