Abstract
Sign language is an important means of communication between hearing-impaired people and others, but many hearing people cannot understand it, which creates a communication barrier between the two groups. With the application of deep learning networks to continuous sign language recognition, large algorithmic models provide a technical basis for translating complex, hard-to-follow continuous sign language actions into easily understood text sentences. However, continuous sign language recognition still faces many problems, such as excessive redundant frames, an imbalance between the spatial and temporal feature extraction networks, and a mismatch between sign language word order and text word order. Research on continuous sign language recognition algorithms that are accurate, fast, and general across scenes has therefore become one of the active topics in computer vision. This paper first analyzes unimodal and multimodal continuous sign language recognition frameworks, focusing on the roles that three modules (keyframe extraction, feature extraction, and sequence learning) play in continuous sign language recognition and on the strengths and weaknesses of the networks used in each module. It then summarizes continuous sign language datasets and the evaluation metrics for recognition results. Finally, it discusses the difficulties facing continuous sign language recognition algorithms and outlines future research directions.
Authors
孟巾凯
彭健钧
肖智东
郭立
金凯
郑彤
MENG Jinkai; PENG Jianjun; XIAO Zhidong; GUO Li; JIN Kai; ZHENG Tong (School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China; National Centre for Computer Animation, Bournemouth University, Bournemouth BH12 5BB, United Kingdom; Liaoning Provincial Internet Public Opinion Monitoring Center, Shenyang 110000, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2024, No. 10, pp. 2428-2441 (14 pages)
Journal of Chinese Computer Systems
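Funding
Supported by the Scientific Research Fund of the Education Department of Liaoning Province (General Program) (LJKZ0529)
Supported by the China Scholarship Council program (202008210334).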
Keywords
keyframe extraction
feature extraction
sequence learning
continuous sign language recognition
review