Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation (Cited by: 1)

Abstract: 1 Introduction. Lip reading involves converting an image sequence into the corresponding text sequence. It currently has significant applications in many fields, such as assisting speech recognition and helping the speech impaired. Lip reading is a fine-grained video analysis task that requires both local information and the overall spatial information of the sequence. Most existing approaches capture local spatial information with a CNN and temporal information with an RNN. With these general methods in mind, we propose a fine-grained method based on self-attention and self-distillation. The model consists mainly of a CNN front-end, pixel-wise learning, temporal learning, and a decoder. Specifically, we apply the CNN front-end to capture shallow spatial features within the image sequence, and employ the Resformer module, which includes self-attention, to learn the global spatial correlation between pixels (pixel-wise learning).
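The pixel-wise learning step described in the abstract relies on self-attention between pixels. The following is a minimal sketch of plain scaled dot-product self-attention over the flattened positions of a small feature map, not the authors' Resformer module. The function names `softmax` and `pixel_self_attention`, the toy 2x2 feature map, and the use of identity query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_self_attention(features):
    """Scaled dot-product self-attention over pixel feature vectors.

    features: list of N pixel vectors, each of length d (a flattened
    H x W feature map with d channels). Queries, keys, and values are
    the features themselves (identity projections, an assumption; a
    trained model would use learned projections).
    Returns (outputs, weights), where weights[i][j] is how strongly
    pixel i attends to pixel j.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        # Similarity of this pixel to every pixel, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    # Each output pixel is a weighted average of all pixel vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical 2x2 feature map with 2 channels, flattened to 4 pixels.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = pixel_self_attention(feature_map)
```

Because every pixel attends to every other pixel, each output vector mixes information from the whole feature map, which is the sense in which self-attention captures global spatial correlation rather than only a local neighborhood.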

Authors: Junxiao XUE, Shibo HUANG, Huawei SONG, Lei SHI

Affiliations: Research Institute of Artificial Intelligence; College of Life Sciences; School of Cyber Science and Engineering

Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2023, Issue 6, pp. 151-153, 3 pages (Chinese journal title: 中国计算机科学前沿（英文版）)

Keywords: DISTILLATION, IMPAIRED, apply

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]

Citing literature (1)

1. Hu Yu, Yin Jibin. Partition-Time Masking: a data augmentation method for lip reading [J]. Computer Science (计算机科学), 2024, 51(S02): 473-478.