Abstract
Deep Speech is an end-to-end speech recognition system that replaces traditional feature extraction with deep learning: it extracts features directly from spectrograms computed from waveform files and generates the corresponding text. The system uses a recurrent neural network built from gated recurrent units (GRUs) to learn speech information with temporal dependencies, and uses CTC (connectionist temporal classification) both to map inputs to outputs and to update the network's model parameters. Combining this approach with a language model corrects misspelled words, yields better recognition results, and makes the system simpler to use.
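The abstract does not spell out how CTC output is turned into text. As a rough illustration only, the sketch below shows the standard greedy (best-path) CTC decoding rule: take the most likely label in each frame, merge consecutive repeats, then drop the blank label. The function name, the `_` blank symbol, the toy alphabet, and the per-frame probabilities are all assumptions made for this example; they are not taken from the paper, and the paper's system may decode differently (for instance with a language model in the loop).

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely label for each time step.
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    # Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two identical labels keeps both.
    return "".join(label for label in collapsed if label != BLANK)


# Toy example: 6 frames over the assumed alphabet [blank, a, c, t].
alphabet = [BLANK, "a", "c", "t"]
frame_probs = [
    [0.1, 0.1, 0.7, 0.1],  # c
    [0.1, 0.1, 0.7, 0.1],  # c (repeat, merged)
    [0.6, 0.2, 0.1, 0.1],  # blank
    [0.1, 0.7, 0.1, 0.1],  # a
    [0.1, 0.1, 0.1, 0.7],  # t
    [0.1, 0.1, 0.1, 0.7],  # t (repeat, merged)
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
print(decoded)
```

A language model, as mentioned in the abstract, would typically be applied on top of or in place of this greedy step to correct spelling errors in the decoded string.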
Source
Computer & Digital Engineering (《计算机与数字工程》)
2017, Issue 8, pp. 1620-1624 (5 pages)
Keywords
speech recognition
deep learning
recurrent neural network
CTC
gated recurrent unit
stochastic gradient descent
language model