Abstract
With the development of deep learning, lip reading for English has made great progress. For Chinese, however, a gap remains in both dataset richness and recognition accuracy. Based on an analysis of the visual characteristics of Chinese pronunciation, this paper proposes "visual pinyin" to avoid the ambiguity of Chinese in its visual expression. To verify the effectiveness of visual pinyin, a Chinese sentence-level lip reading model, CHSLR-VP, is built. The model is an end-to-end structure that uses visual pinyin as an intermediate representation to convert video frame sequences into Chinese character sentences. Experiments show that CHSLR-VP outperforms other lip reading methods, which demonstrates that visual pinyin can significantly improve the accuracy of Chinese lip reading and provides a benchmark for future related work.
Authors
何珊
袁家斌
陆要要
HE Shan; YUAN Jiabin; LU Yaoyao (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Information Department, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2022, Issue 4, pp. 157-162 (6 pages)
Computer Engineering and Applications
Funding
Nanjing Industry-University-Research Cooperation Post-Subsidy Project Program (201722025).