Journal articles
1 article found
Audiovisual speech recognition based on a deep convolutional neural network
Authors: Shashidhar Rudregowda, Sudarshan Patilkulkarni, Vinayakumar Ravi, gururaj h.l., Moez Krichen. Data Science and Management, 2024, No. 1, pp. 25-34 (10 pages).
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying from visual information, primarily lip movements. In this study, we created a custom dataset for Indian English and organized the work into three main parts: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted as mel-frequency cepstral coefficients (MFCCs), and classification was performed using a one-dimensional convolutional neural network (1D CNN). Visual features were extracted using Dlib, and visual speech was classified with a long short-term memory (LSTM) recurrent neural network. Finally, the audio and visual streams were integrated using a deep convolutional neural network (DCNN). Audio speech in Indian English was recognized with training and testing accuracies of 93.67% and 91.53%, respectively, over 200 epochs. For visual speech recognition on the Indian English dataset, the training accuracy was 77.48% and the test accuracy was 76.19% over 60 epochs. After integration, audiovisual speech recognition on the Indian English dataset achieved training and testing accuracies of 94.67% and 91.75%, respectively.
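The abstract does not specify an implementation framework. The sketch below, using librosa for MFCC extraction and TensorFlow/Keras for the model, is one illustrative way to wire the described pipeline: a 1D CNN over the audio MFCC sequence, an LSTM over per-frame Dlib lip landmarks, and a dense fusion head standing in for the paper's DCNN fusion stage. All input shapes, layer sizes, and the vocabulary size are assumptions, not values taken from the paper.

```python
# Minimal sketch of the described audiovisual pipeline.
# Assumptions (not from the paper): TensorFlow/Keras, 13 MFCCs, 25 video frames
# with 20 (x, y) lip landmarks from Dlib flattened to 40 values, 10 word classes.
import librosa
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 10                      # hypothetical vocabulary size
AUDIO_FRAMES, N_MFCC = 100, 13        # hypothetical MFCC sequence shape
VIDEO_FRAMES, LANDMARK_DIM = 25, 40   # hypothetical lip-landmark sequence shape

def extract_mfcc(wav_path, sr=16000):
    """Audio-branch input: MFCC matrix of shape (frames, N_MFCC)."""
    signal, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T

# Audio branch: 1D CNN over the MFCC sequence.
audio_in = layers.Input(shape=(AUDIO_FRAMES, N_MFCC))
a = layers.Conv1D(64, 3, activation="relu")(audio_in)
a = layers.MaxPooling1D(2)(a)
a = layers.Conv1D(128, 3, activation="relu")(a)
a = layers.GlobalAveragePooling1D()(a)

# Visual branch: LSTM over per-frame lip-landmark vectors
# (e.g. the mouth points of Dlib's 68-point shape predictor).
video_in = layers.Input(shape=(VIDEO_FRAMES, LANDMARK_DIM))
v = layers.LSTM(128)(video_in)

# Fusion: concatenate both streams and classify with dense layers
# (a simple stand-in for the paper's deep convolutional fusion network).
fused = layers.Concatenate()([a, v])
f = layers.Dense(256, activation="relu")(fused)
out = layers.Dense(NUM_CLASSES, activation="softmax")(f)

model = Model([audio_in, video_in], out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then call `model.fit` with paired MFCC and lip-landmark arrays plus integer word labels; the branch depths and fusion design here are placeholders for the architecture the paper reports.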
Keywords: Audiovisual speech recognition; Custom dataset; 1D convolutional neural network (CNN); Deep CNN (DCNN); Long short-term memory (LSTM); Lipreading; Dlib; Mel-frequency cepstral coefficient (MFCC)