Abstract
To address the problems that the connectionist temporal classification (CTC) model relies on an output-independence assumption, depends heavily on a language model, and requires a long training period, a speech recognition method based on the CTC model is proposed. First, within the framework of a traditional acoustic model, a spectrogram feature extraction network based on the attention mechanism is trained using prior knowledge, which improves the discriminability and robustness of the speech features. Second, the spectrogram feature extraction network is attached to the front end of the CTC model, the number of recurrent neural network layers in the model is reduced, and the model is retrained. Test results show that the improved model shortens training time and improves speech recognition accuracy.
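The output-independence assumption mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: it is a generic, brute-force illustration of how CTC scores a label sequence when each frame's output is treated as independent of the others. The blank index, the toy posteriors, and all function names are assumptions made for this example.

```python
from itertools import groupby, product
from math import prod

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def collapse(path):
    """Map a frame-level path to labels: merge repeated symbols, then drop blanks."""
    return [s for s, _ in groupby(path) if s != BLANK]


def path_probability(probs, path):
    """Under output independence, a path's probability is the product
    of the per-frame posteriors of its symbols."""
    return prod(frame[s] for frame, s in zip(probs, path))


def label_probability(probs, labels):
    """Brute-force CTC probability: sum over every path that collapses to `labels`.
    Exponential in the number of frames, so for illustration only."""
    n_symbols = len(probs[0])
    return sum(
        path_probability(probs, path)
        for path in product(range(n_symbols), repeat=len(probs))
        if collapse(path) == list(labels)
    )


def greedy_decode(probs):
    """Best-path decoding: pick the most likely symbol per frame, then collapse."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return collapse(best)


# Made-up posteriors for 3 frames over the symbols {blank, 1, 2}.
probs = [
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
```

With these toy posteriors, `greedy_decode(probs)` collapses the per-frame arg-max path `[1, 0, 2]` to `[1, 2]`, and `label_probability(probs, [1, 2])` sums the five paths that collapse to that sequence. Practical systems replace the brute-force sum with the forward-backward recursion.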
Authors
JIANG Nan (姜囡)
PANG Yongheng (庞永恒)
GAO Shuang (高爽)
Affiliations: School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854, China; College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Source
Journal of Jilin University: Science Edition (《吉林大学学报(理学版)》)
Indexed in: CAS; Peking University Core Journals (北大核心)
2024, No. 2, pp. 320-330 (11 pages)
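Funding
Key Research Project of the Ministry of Education (Grant No. E-AQGABQ20202710)
Natural Science Foundation of Liaoning Province (Grant No. 2019-ZD-0168)
Joint Open Fund of the Department of Science and Technology of Liaoning Province and Open Fund of the State Key Laboratory of Robotics (Grant No. 2020-KF-12-11)
Major Program Cultivation Project of Criminal Investigation Police University of China (Grant No. 3242019010)
Innovation Program for Basic Theoretical Research in Public Security Disciplines (Grant No. 2022XKGJ0110)
Open Fund of the Key Laboratory of Evidence Science, Ministry of Education (China University of Political Science and Law) (Grant No. 2021KFKT09)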
Keywords
speech recognition
CTC model
recurrent neural network
attention mechanism