Word-Level Chinese Lip Reading Based on Deep Learning
Abstract: Lip reading is crucial in silent or heavily noise-contaminated environments, and for people with hearing impairments. To address word-level Chinese lip reading, the SinoLipReadingNet model is proposed. Its front end uses a Conv3D + ResNet34 structure to extract spatiotemporal features. Two back ends, one with a Conv1D structure and one with a Bi-LSTM structure, are used for classification and prediction, and the Bi-LSTM back end is further improved with self-attention and CTCLoss. Experiments on the XWBank lip reading dataset show that SinoLipReadingNet clearly outperforms the D3D model from the Chinese Academy of Sciences in recognition accuracy. With multi-model fusion, the prediction accuracy reaches 77.64% and the average character error rate (CER) is 21.68%.
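The abstract reports an average character error rate (CER) but does not say how it is computed. The sketch below assumes the common definition: the Levenshtein edit distance between the reference and predicted character sequences, divided by the reference length, then averaged over samples. The function names `edit_distance`, `cer`, and `average_cer` are illustrative and do not come from the paper.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one prediction (assumed definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def average_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean per-sample CER over (reference, hypothesis) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(cer(r, h) for r, h in pairs) / len(pairs)
```

For example, `cer("今天天气", "今天天汽")` gives 0.25, since one of four characters is wrong. Note that averaging per-sample CER differs from dividing total edits by total reference characters; which variant the authors used is not stated in the abstract.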
Authors: Chen Hongshun (陈红顺); Chen Guanming (陈观明). Affiliations: School of Information Technology, Beijing Normal University (Zhuhai), Zhuhai 519087, China; Zhuhai Orbita Aerospace Science & Technology Co., Ltd., Zhuhai 519080, China
Source: Application of Electronic Technique (《电子技术应用》), 2022, No. 12, pp. 54-58 (5 pages)
Keywords: lip reading; ResNet; Bi-LSTM; CTCLoss; self-attention
