Abstract
To address the limited ability of convolutional neural networks (CNN) to model speech signals, this paper proposes a Chinese children's speech recognition model based on a deep convolutional neural network and connectionist temporal classification (DCNN-CTC). The model uses CTC as its objective loss function. Residual skip connections are introduced between the layers of the convolutional network so that the output of an earlier layer is passed directly to a later layer; the resulting residual convolutional layers allow the acoustic model to use more convolutional layers. Mish and Maxout activation functions are then applied inside and outside the residual structure, respectively, to reduce network collapse and overfitting and thereby improve the efficiency of speech recognition. The results show that, compared with the conventional speech recognition models CNN, DCNN and CTC, the DCNN-CTC model achieves the lowest phoneme error rate (PER) and word error rate (WER) on Chinese children's speech recognition.
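The activation placement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chunk size `k`, and the stand-in `transform` (used here in place of a real convolutional layer) are all assumptions made for illustration. Only the standard definitions of Mish, x * tanh(softplus(x)), and Maxout, the maximum over a group of candidate activations, are taken as given.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # Numerically stable softplus: log(1 + exp(x)) without overflow.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def maxout(values, k):
    """Maxout: split values into consecutive groups of k and keep each group's maximum."""
    return [max(values[i:i + k]) for i in range(0, len(values), k)]


def residual_unit(xs, transform):
    """Skip connection: y = x + Mish(transform(x)), element-wise.

    `transform` is a hypothetical stand-in for the convolutional layers
    inside the residual structure; it must return a list as long as xs.
    """
    fx = transform(xs)
    return [x + mish(f) for x, f in zip(xs, fx)]


if __name__ == "__main__":
    features = [1.0, -2.0, 0.5, 3.0]
    # Assumed toy transform: scale every feature by 0.5.
    inside = residual_unit(features, lambda v: [0.5 * a for a in v])
    # Maxout applied outside the residual structure, with groups of 2.
    outside = maxout(inside, 2)
    print(inside)
    print(outside)
```

With a transform that returns all zeros, `residual_unit` returns its input unchanged, which is the property that lets skip connections pass an earlier layer's output directly to a later layer.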
Authors
董胡
夏明霞
李垣陵
DONG Hu; XIA Mingxia; LI Yuanling (School of Information Science and Engineering, Changsha Normal University, Changsha, Hunan 410100, China)
Source
《自动化应用》
2024, No. 23, pp. 28-30 (3 pages)
Automation Application
Funding
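Supported by the Youth Fund Project of Humanities and Social Sciences Research of the Ministry of Education, "Research on Acoustic Models for Chinese Children's Speech Recognition Based on Deep Learning and Evaluation of Their Speech Ability" (22YJCZH025)
Research outcome of the Philosophy and Social Science Planning Project of the Changsha Federation of Social Science Circles (2024CSSKKT153).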