
Application of Hybrid CTC/Attention Model in Mandarin Recognition
Abstract: End-to-end speech recognition models based on connectionist temporal classification (CTC) have a simple structure and align input and output sequences automatically, but their recognition accuracy leaves room for improvement. This paper adds an attention mechanism to form a hybrid CTC/Attention end-to-end model trained with multi-task learning, combining the alignment advantage of CTC with the context-modeling advantage of attention. Experiments on Mandarin Chinese recognition use 80-dimensional FBank features plus 3-dimensional pitch features as the acoustic features and a VGG-bidirectional long short-term memory (VGG-BiLSTM) network as the encoder. Compared with the CTC-based end-to-end model, the hybrid model reduces the character error rate by about 6.1%; connecting an external language model reduces it by a further 0.3%. The character error rate is also substantially lower than that of the traditional baseline model.
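The abstract names two quantities without defining them: the multi-task training objective and the character error rate. The sketch below illustrates both under stated assumptions. The interpolation weight `lam` and the linear form `lam * L_ctc + (1 - lam) * L_att` follow the formulation commonly used for hybrid CTC/Attention models; the abstract does not give the weight or the exact objective used in the paper. The function names are hypothetical, and the character error rate is computed in the usual way, as edit distance divided by reference length.

```python
def hybrid_loss(ctc_loss, att_loss, lam=0.5):
    """Interpolate the two task losses (assumed form; lam is a hypothetical weight)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Character errors (substitutions, deletions, insertions) per reference character."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six in the reference.
    print(character_error_rate("今天天气很好", "今天天汽很好"))
    # Weighted sum of two made-up loss values.
    print(hybrid_loss(2.0, 4.0, lam=0.3))
```

Because Mandarin text is scored per character rather than per word, passing plain strings is enough here; no word segmentation is needed.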
Authors: XU Hong-kui; ZHANG Zi-feng; LU Jiang-kun; ZHOU Jun-jie; HU Wen-ye; JIANG Tong-tong (School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China)
Source: Computer and Modernization, 2022, No. 8, pp. 1-6 (6 pages)
Funding: Shandong Province Major Science and Technology Innovation Project (2019JZZY010120); Shandong Province Key Research and Development Program (2019GSF111054).
Keywords: speech recognition; connectionist temporal classification; attention mechanism; end-to-end
