Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation (Cited by: 1)

Abstract: 1 Introduction. Lip reading converts an image sequence into the corresponding text sequence. It currently has significant applications in many fields, such as assisting speech recognition and helping the speech impaired. Lip reading is a fine-grained video analysis task that requires both the local information and the overall spatial information of the sequence. Most existing approaches capture local spatial information with a CNN and temporal information with an RNN. In view of these general methods, we propose a fine-grained method based on self-attention and self-distillation. The whole model mainly comprises a CNN front-end, pixel-wise learning, temporal learning, and a decoder. Specifically, we apply the CNN front-end to capture shallow spatial features within the image sequence, and employ the Resformer module, which includes self-attention, to learn the global spatial correlation between pixels, namely pixel-wise learning.
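The pixel-wise learning step can be illustrated with a minimal sketch of generic scaled dot-product self-attention applied across the pixels of a single feature map. This is not the paper's Resformer module, whose internals the abstract does not specify. The function name `pixel_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the toy shapes are all assumptions made for illustration.

```python
import numpy as np


def pixel_self_attention(feature_map, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention over the pixels of one frame.

    feature_map: (H, W, C) array, standing in for a CNN front-end output.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical parameters).
    Returns attended features of shape (H, W, D) and the
    (H*W, H*W) matrix of attention weights between pixels.
    """
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)  # treat every pixel as one token

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Similarity of every pixel with every other pixel, scaled by sqrt(D).
    scores = q @ k.T / np.sqrt(q.shape[-1])

    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    attended = weights @ v  # each pixel becomes a weighted mix of all pixels
    return attended.reshape(h, w, -1), weights


# Toy example: a 4x4 feature map with 8 channels (illustrative values only).
rng = np.random.default_rng(0)
frame_features = rng.standard_normal((4, 4, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
attended, weights = pixel_self_attention(frame_features, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and covers all 16 pixels, so every output pixel can draw on the whole frame. A convolution with a small kernel, by contrast, only sees a local neighborhood.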
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2023, Issue 6, pp. 151-153 (3 pages). Chinese title: 中国计算机科学前沿(英文版)