A Research on Minnan Dialect Lip-Language Recognition Based on Transformer-LSTM
Abstract: To address end-to-end, sentence-level lip reading for the Minnan dialect, an encoder-decoder model based on Transformer and long short-term memory (LSTM) is proposed. The encoder uses a spatiotemporal convolutional neural network followed by a Transformer encoder to extract spatiotemporal features from lip-reading sequences. The decoder uses an LSTM network combined with a cross-attention mechanism to predict the text sequence. Experiments were conducted on a self-built Minnan dialect lip-reading dataset, and the results show that the model effectively improves lip-reading recognition accuracy.
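The abstract does not say which form of cross-attention the decoder uses. As a rough illustration only, the sketch below assumes scaled dot-product attention: a single decoder hidden state acts as the query, and the encoder's per-frame outputs serve as both keys and values. The function names, the toy vectors, and the omission of learned projection matrices are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, memory: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend from one decoder state (query) over encoder outputs (memory).

    Assumption: plain scaled dot-product attention with no learned
    projections; each memory frame is used as both key and value.
    Returns the context vector and one attention weight per frame.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, frame)) / scale for frame in memory]
    weights = softmax(scores)
    context = [
        sum(w * frame[i] for w, frame in zip(weights, memory))
        for i in range(len(query))
    ]
    return context, weights


# Hypothetical toy values: three encoder frames with two features each.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder hidden state at one decoding step.
decoder_state = [2.0, 0.0]

context, weights = cross_attention(decoder_state, encoder_outputs)
print("attention weights:", weights)
print("context vector:", context)
```

In a decoder of the kind the abstract describes, a context vector like this would typically be combined with the LSTM state at each step before predicting the next text token. How the paper performs that combination is not stated here.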
Authors: ZENG Wei; LUO Xianxian; WANG Hongwei (School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000, China; Fujian Provincial Key Laboratory of Data Intensive Computing, Quanzhou, Fujian 362000, China; Key Laboratory of Intelligent Computing and Information Processing, Quanzhou, Fujian 362000, China)
Source: Journal of Quanzhou Normal University, 2024, No. 2, pp. 10-17 (8 pages)
Funding: Fujian Provincial Department of Education Research Project for Young and Middle-aged Teachers (JAT200542).
Keywords: lip reading; Minnan dialect; Transformer; long short-term memory (LSTM); spatiotemporal convolutional neural network; attention mechanism; end-to-end model
