An End-to-End Mandarin Speech Recognition Method Based on CNN/CTC (一种基于CNN/CTC的端到端普通话语音识别方法) Cited by: 3
Abstract To achieve relatively high accuracy in offline Mandarin speech recognition, we propose an acoustic model for a speech recognition system based on a deep fully convolutional neural network (CNN). The model takes the spectrogram of the acoustic signal as input, and its structure follows the VGG model. At the output end, the model is combined with connectionist temporal classification (CTC), which allows the entire model to be trained end to end and converts the acoustic signal directly into a Mandarin Pinyin sequence. The language model is a Maximum Entropy Markov Model that converts Pinyin sequences into Chinese text. Experiments show that the method achieves 80.82% accuracy on the test set.
Authors PAN Yuecheng (潘粤成); LIU Zhuo (刘卓); PAN Wenhao (潘文豪); CAI Dianlun (蔡典仑); WEI Zhengsong (韦政松) (School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China; School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510641, China)
Source Modern Information Technology (《现代信息科技》), 2020, No. 5, pp. 65-68 (4 pages)
Funding National Undergraduate Innovation and Entrepreneurship Training Program (201910561167).
Keywords convolutional neural network; Chinese speech recognition; connectionist temporal classification; end-to-end system
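
The abstract states that the CNN output is combined with CTC to produce a Pinyin sequence, but it does not say how the output is decoded. As a minimal sketch, assuming greedy (best-path) decoding, the usual CTC collapse rule can be illustrated as follows: take the best label per frame, merge consecutive repeats, then drop the blank label. The vocabulary, the blank index, and the frame scores below are hypothetical and are not taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to Pinyin tokens.
    return [vocab[label] for label in collapsed if label != blank]


# Hypothetical toy vocabulary: index 0 is the blank, the rest are Pinyin tokens.
VOCAB = ["<blank>", "ni3", "hao3"]

# Hypothetical per-frame scores, one row per time step.
frames = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
    [0.60, 0.10, 0.30],  # blank
]

print(ctc_greedy_decode(frames, VOCAB))  # ['ni3', 'hao3']
```

Because the blank separates frames, the same token can still appear twice in a row in the output when a blank falls between its two occurrences. A beam search with a language model is a common alternative to this greedy rule; the abstract does not indicate which one the authors used.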