Speech Recognition via CTC-CNN Model

Abstract In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper describes the construction and training of the acoustic model in detail and studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework. It establishes an acoustic model that combines a convolutional neural network (CNN) with CTC to improve the accuracy of speech recognition. The study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals, improving communication and reducing noise and hardware interference. The baseline acoustic model faces challenges such as long training time, a high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of its relevant parameters, and the best-performing model is finally selected according to the evaluation metrics; it reduces the error rate to about 18%, thus improving accuracy. Finally, comparative verification was carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker’s speech rate, which further verified the excellent performance of the CTCCNN_5+BN+Residual model structure. In the experiments, the CTC-CNN baseline acoustic model is trained and verified with the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the word error rate on the training set is 31%, and the word error rate on the test set is stable at about 43%. The experiments also consider surrounding environmental noise. At a noise level of 80–90 dB, the accuracy is 88.18%, the worst among all levels. In contrast, at 40–60 dB, the accuracy is as high as 97.33% because there is less noise pollution.

Authors Wen-Tsai Sung, Hao-Wei Kang, Sung-Jung Hsiao

Affiliations Department of Electrical Engineering; Department of Information Technology

Source Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3833-3858 (26 pages)

Funding Supported by the Department of Electrical Engineering at National Chin-Yi University of Technology. The authors thank National Chin-Yi University of Technology and Takming University of Science and Technology, Taiwan, for supporting this research.

Keywords Artificial intelligence; speech recognition; speech to text; convolutional neural network; automatic speech recognition

Classification number TP391.41 [Automation and Computer Technology: Computer Application Technology]
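
The abstract refers to CTC output and to error rates without showing how either is computed. The sketch below illustrates the two ideas in plain Python: greedy (best-path) CTC decoding, which takes the most likely label per frame, merges consecutive repeats, and drops the blank symbol, and an edit-distance-based token error rate. It is an illustration only, not the authors' implementation: the abstract does not say which decoder or scoring tool the paper uses, and the function names, the blank index of 0, and the toy frame scores are all assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not stated in the abstract)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank; a blank between two equal labels keeps both.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, deletions, insertions)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diagonal, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            substitution = diagonal + (r != h)
            diagonal = row[j]  # value from the previous row, kept for j + 1
            row[j] = min(substitution, row[j] + 1, row[j - 1] + 1)
    return row[len(hyp)]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Token error rate: edit distance divided by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Toy per-frame scores over a 3-symbol vocabulary (index 0 is the blank).
    frames = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
        [0.8, 0.1, 0.1],
    ]
    hypothesis = ctc_greedy_decode(frames)
    print(hypothesis)
    print(error_rate([1, 1, 2], hypothesis))
```

What counts as a token in `error_rate` depends on the modeling unit chosen for the acoustic model, which the abstract lists as one of the compared factors, so the same function yields a word, syllable, or character error rate depending on how the sequences are tokenized.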
