End-to-End Speech Recognition Based on Multimode
Abstract: To eliminate complex audio segmentation and forced alignment, and to exploit the visual information carried by the speaker's articulatory organs in noisy environments, this paper proposes an end-to-end multimodal speech recognition algorithm that incorporates lip features. The speaker's video is first processed into a corresponding image set. A regression-tree-based face alignment algorithm then extracts features from the main visual articulators in those images, and these features are aligned and fused with the speaker's acoustic features to form new features. Finally, an end-to-end bidirectional long short-term memory network model that supports variable-length input (DeepBiLstmCtc) processes the fused features and outputs the corresponding phoneme sequence. Experimental results show that the algorithm effectively recognizes phoneme sequences from audio-visual information and yields some improvement in recognition rate under noisy conditions.
Affiliation: Donghua University
Source: Computer Science and Application (《计算机科学与应用》), 2021, No. 5, pp. 1315-1324 (10 pages)
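
The abstract mentions two steps that a short sketch may help make concrete: aligning and fusing lip features with acoustic features, and turning the network's frame-wise output into a phoneme sequence. The code below is only an illustration under stated assumptions, not the authors' implementation. It assumes that visual frames arrive at a lower rate than acoustic frames, that alignment is done by repeating the nearest visual frame, that fusion is plain concatenation, and that the CTC output is decoded greedily. The function names `fuse_features` and `ctc_greedy_decode` are hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Align visual frames to the audio frame rate and concatenate per frame.

    Assumption: both streams cover the same time span, so audio frame t maps
    to visual frame floor(t * n_visual / n_audio).
    """
    if not audio or not visual:
        raise ValueError("both feature streams must be non-empty")
    n_audio, n_visual = len(audio), len(visual)
    fused = []
    for t, audio_frame in enumerate(audio):
        # Nearest-frame repetition: pick the visual frame covering time t.
        idx = min(t * n_visual // n_audio, n_visual - 1)
        fused.append(list(audio_frame) + list(visual[idx]))
    return fused


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse consecutive repeats, then drop blanks (standard CTC rule).

    `frame_labels` is the most likely label index for each frame.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

For example, four acoustic frames paired with two lip-feature frames yield four fused frames, with each visual frame repeated twice. A frame-wise label sequence such as `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`, because the blank between the two runs of `1` keeps them as separate phonemes. Since CTC training sums over all valid alignments, no forced alignment between frames and phonemes is needed.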