Abstract
Convolutional neural networks (CNNs), which have been successful at achieving translation invariance in many image-processing tasks, are investigated here for continuous speech recognition. Compared with the deep neural networks (DNNs) now widely used in speech recognition, CNNs can reduce model size significantly while achieving equal or better accuracy. This paper analyzes in depth how different structures of the convolutional and pooling layers affect recognition performance, and compares CNNs against standard DNN models. Experiments on the standard TIMIT corpus and on a large-vocabulary, speaker-independent conversational telephone speech corpus show that CNNs markedly reduce model size while outperforming DNNs in both recognition accuracy and generalization ability.
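The abstract's claim that CNNs shrink model size rests on weight sharing: a convolutional filter reuses the same weights at every input position, whereas a fully connected DNN layer has a separate weight per input-output pair. A minimal sketch of the parameter arithmetic makes this concrete; the layer sizes below are hypothetical illustrations, not figures from the paper.

```python
# Parameter-count comparison: fully connected (DNN-style) layer vs. a
# 1-D convolutional layer with weight sharing. All sizes are hypothetical.

def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer: every input unit
    connects to every output unit with its own weight."""
    return n_in * n_out + n_out

def conv1d_params(filter_width: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a 1-D convolutional layer: each filter's
    weights are shared across all positions of the input, so the count
    is independent of the input length."""
    return filter_width * in_channels * n_filters + n_filters

# Hypothetical acoustic-model input: 40 filterbank features x 11 context
# frames = 440 input values per example.
n_in = 40 * 11

dnn = dense_params(n_in, 1024)      # one wide fully connected hidden layer
cnn = conv1d_params(8, 11, 128)     # 128 shared filters of width 8

print(dnn, cnn)  # the shared-weight layer needs far fewer parameters
```

The exact ratio depends on the chosen widths, but the structural point holds regardless: the dense layer's cost scales with input size times output size, while the convolutional layer's cost scales only with filter size times filter count, which is what allows the model compression the abstract reports.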
Published in: Chinese Journal of Engineering (《工程科学学报》)
Indexed by: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2015, No. 9, pp. 1212-1217 (6 pages)
Funding
National Natural Science Foundation of China (11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National High Technology Research and Development Program of China (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)
Keywords
convolutional neural networks; continuous speech recognition; weight sharing; pooling; generalization