
Context Correlation Distillation for Lip Reading
Abstract A cross-modal knowledge distillation method, C2KD (context correlation knowledge distillation), is proposed to address the problem that the performance of lip reading models is limited by the size of available datasets. C2KD distills multi-scale context correlation knowledge from a speech recognition model into a lip reading model. First, the self-attention module of the Transformer is used to obtain context correlation knowledge. Second, a layer mapping strategy decides which layers of the speech recognition model to extract knowledge from. Finally, an adaptive training strategy dynamically transfers the speech recognition model's knowledge according to the lip reading model's performance. C2KD achieves excellent results on the LRS2 and LRS3 datasets, with word error rates 2.0% and 2.7% lower than the baseline method, respectively.
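The abstract outlines three mechanisms: attention-based context correlation, teacher-to-student layer mapping, and performance-adaptive transfer. A minimal PyTorch sketch of how such a distillation step could be wired together is given below. It is an illustration only: the KL-based attention loss, the uniform layer mapping, and the WER-gap weighting are hypothetical stand-ins rather than the paper's exact formulation, and the teacher's audio attention maps are assumed to have been resampled to the video frame rate so their shapes match the student's.

```python
# Minimal sketch of cross-modal attention ("context correlation") distillation.
# Hypothetical throughout: loss form, layer mapping, and adaptive weight are
# illustrative stand-ins, not the authors' exact method.
import torch
import torch.nn.functional as F

def context_correlation_loss(student_attn: torch.Tensor,
                             teacher_attn: torch.Tensor) -> torch.Tensor:
    """KL divergence between attention distributions.

    Both tensors: (batch, heads, seq_len, seq_len), rows softmax-normalized.
    Assumes the teacher's (audio) attention was resampled to the video frame
    rate so the shapes match.
    """
    return F.kl_div(student_attn.clamp_min(1e-8).log(), teacher_attn,
                    reduction="batchmean")

def layer_map(s: int, num_student: int, num_teacher: int) -> int:
    # Uniform mapping: pair each student layer with a proportionally placed
    # teacher layer (one common mapping heuristic).
    return round(s * (num_teacher - 1) / max(num_student - 1, 1))

def distillation_loss(student_attns, teacher_attns,
                      student_wer: float, teacher_wer: float) -> torch.Tensor:
    # Adaptive weighting: transfer more knowledge while the student's word
    # error rate still lags the teacher's (an illustrative schedule).
    alpha = min(max(student_wer - teacher_wer, 0.0), 1.0)
    num_s, num_t = len(student_attns), len(teacher_attns)
    loss = sum(context_correlation_loss(student_attns[s],
                                        teacher_attns[layer_map(s, num_s, num_t)])
               for s in range(num_s))
    return alpha * loss / num_s
```

In training, a term like this would be added to the lip reading model's standard sequence loss, so the student learns from both the ground-truth transcripts and the teacher's attention structure.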
Authors Zhao Ya; Feng Zunlei; Wang Huiqiong; Song Mingli (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027; School of Software Technology, Zhejiang University, Hangzhou 310027; Ningbo Research Institute, Zhejiang University, Ningbo 315100; Zhejiang Lab, Hangzhou 311121)
Source Journal of Computer-Aided Design & Computer Graphics (EI, CSCD, Peking University Core), 2022, No. 10, pp. 1559-1566 (8 pages)
Funding National Natural Science Foundation of China (61976186); Key R&D Program of Zhejiang Province (2020C01023); Key Project of Zhejiang Lab (2019KD0AC01).
Keywords lip reading; knowledge distillation; cross-modal