
Speech Recognition Based on Attention Mechanism and Spectrogram Feature Extraction (cited by: 1)
Abstract: Connectionist temporal classification (CTC) models rest on an output-independence assumption, depend heavily on a language model, and take a long time to train. To address these problems, a speech recognition method based on the CTC model is proposed. First, within the framework of a traditional acoustic model, prior knowledge is used to train an attention-based spectrogram feature extraction network, which improves the discriminability and robustness of the speech features. Second, the spectrogram feature extraction network is attached to the front end of the CTC model, the number of recurrent neural network layers in the model is reduced, and the model is retrained. Test results show that the improved model shortens training time and improves speech recognition accuracy.
Authors: JIANG Nan (姜囡); PANG Yongheng (庞永恒); GAO Shuang (高爽) (School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854, China; College of Information Science and Engineering, Northeastern University, Shenyang 110819, China)
Source: Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》), indexed in CAS and the Peking University Core Journals list, 2024, No. 2, pp. 320-330 (11 pages)
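Funding: Key Research Project of the Ministry of Education (Grant No. E-AQGABQ20202710); Natural Science Foundation of Liaoning Province (Grant No. 2019-ZD-0168); Joint Open Fund of the Liaoning Provincial Department of Science and Technology and Open Fund of the State Key Laboratory of Robotics (Grant No. 2020-KF-12-11); Major Program Cultivation Project of Criminal Investigation Police University of China (Grant No. 3242019010); Innovation Program for Basic Theoretical Research in Public Security Disciplines (Grant No. 2022XKGJ0110); Open Fund of the Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education (Grant No. 2021KFKT09).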
Keywords: speech recognition; CTC model; recurrent neural network; attention mechanism
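
The abstract refers to the output-independence assumption of CTC: each frame's label is predicted independently given the input, and the resulting frame-level path is then collapsed into a transcript. The sketch below illustrates only that standard collapse step in greedy decoding (merge repeated labels, then drop blanks). It is not the authors' implementation; the function name `ctc_greedy_decode`, the blank index `0`, and the posterior values are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Each frame is decided on its own, which mirrors the
    # output-independence assumption mentioned in the abstract.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # groupby collapses runs of identical labels; blanks are then removed.
    return [label for label, _ in groupby(best_path) if label != blank]


# Hypothetical posteriors for 6 frames over the labels {0: blank, 1: 'a', 2: 'b'}.
probs = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.20, 0.70, 0.10],  # 'a' (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two 'a' labels)
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.20, 0.70],  # 'b'
    [0.60, 0.20, 0.20],  # blank
]
decoded = ctc_greedy_decode(probs)
print(decoded)
```

Because every frame is chosen without reference to its neighbours, nothing in this step models dependencies between output labels, which is one reason CTC systems often lean on an external language model.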
