END-TO-END SPEECH RECOGNITION BASED ON DEEP CONVOLUTION NEURAL NETWORK

Abstract Convolutional neural networks (CNNs) are among the most popular models for speech recognition; their convolutional structure provides translation invariance for speech signals in both the time and frequency domains. However, CNNs alone have limited capacity for modeling speech signals. To address this, the Connectionist Temporal Classification (CTC) criterion is applied to the CNN structure to build an end-to-end convolutional neural network (CTC-CNN) model. In addition, residual blocks are introduced to form a new end-to-end deep convolutional neural network (CTC-DCNN) model, which is further optimized with the maxout activation function. Experiments on the TIMIT and Thchs-30 speech corpora show that, in Chinese and English recognition, the proposed model improves accuracy over existing CNN models by about 4.7% and 6.3%, respectively.
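
The abstract names two building blocks whose general definitions are standard: the maxout activation and CTC output decoding. The sketch below illustrates those general definitions only, and is not the authors' implementation. The abstract does not give the layer sizes, the maxout group size, or the label inventory, so the group size `k`, the blank index `BLANK = 0`, and the function names `maxout` and `ctc_greedy_decode` are all hypothetical choices made for illustration. Greedy (best-path) decoding is used here as the simplest CTC decoder; the paper may use a different one.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def maxout(pre_activations: Sequence[float], k: int) -> List[float]:
    """Maxout: split the inputs into consecutive groups of k, keep each group's maximum."""
    if k <= 0 or len(pre_activations) % k != 0:
        raise ValueError("input length must be a multiple of a positive k")
    return [max(pre_activations[i:i + k]) for i in range(0, len(pre_activations), k)]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeated labels, drop blanks."""
    # Pick the highest-scoring label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Collapse runs of identical labels into a single label.
    merged = [label for label, _ in groupby(best_path)]
    # Remove the blank label, which only separates repeated outputs.
    return [label for label in merged if label != blank]
```

For example, a per-frame best path of `1, 1, blank, 1, 2` collapses to the label sequence `1, 1, 2`: the blank between the two runs of `1` is what lets CTC emit the same label twice in a row.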
Authors Liu Juanhong (刘娟宏); Hu Yu (胡彧); Huang Heyu (黄鹤宇) (College of Physics and Optoelectronics, Taiyuan University of Technology, Jinzhong 030600, Shanxi, China)
Source Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2020, No. 4, pp. 192-196 (5 pages)
Keywords Speech recognition; Convolutional neural network; Maxout activation function; End-to-end