Abstract
To eliminate the need for complex audio segmentation and forced alignment, and to exploit the visual information carried by the speaker's articulators in noisy environments, this paper proposes an end-to-end multimodal speech recognition algorithm that fuses lip features. The speaker's video is first decomposed into an image sequence, from which a regression-tree-based face alignment algorithm extracts features of the visually salient articulation region (the lips); these visual features are then temporally aligned and fused with the speaker's acoustic features to form a new feature representation. The fused features are processed by an end-to-end deep bidirectional long short-term memory network with connectionist temporal classification (DeepBiLstmCtc), which supports variable-length input and outputs the corresponding phoneme sequence. Experimental results show that the algorithm effectively recognizes phoneme sequences from audiovisual input and also yields a measurable improvement in recognition rate under noisy conditions.
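Two steps of the described pipeline lend themselves to a compact illustration: aligning lower-frame-rate visual features with acoustic features before fusion, and the CTC decoding step that turns per-frame network outputs into a phoneme sequence. The sketch below is illustrative only: the frame rates, feature dimensions, and function names are assumptions, not details taken from the paper, and the DeepBiLstmCtc network itself is omitted.

```python
from itertools import groupby

def fuse_features(acoustic, visual, ratio=4):
    """Align visual frames (e.g. 25 fps video) to acoustic frames
    (e.g. 100 fps, hence an assumed ratio of 4) by repeating each
    visual frame, then concatenate the per-frame feature vectors.
    Rates and dimensions here are hypothetical."""
    fused = []
    for t, a in enumerate(acoustic):
        # Pick the visual frame covering acoustic frame t; clamp at the end.
        v = visual[min(t // ratio, len(visual) - 1)]
        fused.append(a + v)  # list concatenation = feature concatenation
    return fused

def ctc_greedy_decode(frame_labels, blank="-"):
    """Greedy CTC collapse: merge consecutive repeated labels,
    then drop the blank symbol, yielding the phoneme sequence."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [p for p in collapsed if p != blank]
```

For example, a per-frame best path `["-", "b", "b", "-", "a", "a", "-", "-", "t"]` collapses to the phoneme sequence `["b", "a", "t"]`, which is how a variable-length frame sequence maps to a shorter phoneme sequence without any forced alignment.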
Source
Computer Science and Application (《计算机科学与应用》), 2021, No. 5, pp. 1315–1324 (10 pages)