

Continuous Sign Language Recognition Based on CM-Transformer
Abstract: To capture both the global and local features of sign language motion while preserving the original image structure and contextual relationships, an improved convolutional multilayer-perceptron Transformer (CM-Transformer) is proposed for continuous sign language recognition. CM-Transformer combines the structural-consistency advantage of convolutional layers with the global modeling capability of the self-attention encoder to capture long-term sequence dependencies. The feedforward layer of the self-attention model is replaced with a multilayer perceptron to exploit its translation invariance and locality. Random frame dropping and random gradient stopping are used to reduce the training computation in time and space and to prevent overfitting, yielding a computationally efficient, lightweight network. Finally, a connectionist temporal classification (CTC) decoder aligns the input and output sequences to produce the final recognition result. Experimental results on two large benchmark datasets demonstrate the effectiveness of the proposed method.
Authors: YE Kang; ZHANG Shujun; GUO Qi; LI Hui; CUI Xuehong (School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》; indexed in EI, CAS, CSCD, Peking University Core), 2022, No. 5, pp. 49-53, 78 (6 pages)
Keywords: continuous sign language recognition; convolutional neural network; self-attention model; multilayer perceptron
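The CTC alignment step described in the abstract maps a sequence of per-frame predictions onto a shorter gloss sequence. As a minimal illustration (not the authors' implementation), the following sketch shows greedy best-path CTC decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. The blank index and toy vocabulary are assumptions for the example.

```python
from itertools import groupby

BLANK = 0  # conventional index of the CTC blank symbol (assumption)

def ctc_greedy_decode(frame_probs):
    """frame_probs: per-frame probability lists over the vocabulary,
    with the blank at index BLANK. Returns the collapsed label sequence."""
    # 1. Pick the most likely label for each frame (best path).
    path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse consecutive repeats, then 3. remove blanks.
    return [k for k, _ in groupby(path) if k != BLANK]

# Toy example: 6 frames, vocabulary {blank=0, A=1, B=2}
probs = [
    [0.1, 0.8, 0.1],  # A
    [0.1, 0.8, 0.1],  # A (repeat, collapsed with the previous frame)
    [0.8, 0.1, 0.1],  # blank (separates two distinct emissions of A)
    [0.1, 0.8, 0.1],  # A (new emission after the blank)
    [0.1, 0.1, 0.8],  # B
    [0.8, 0.1, 0.1],  # blank
]
print(ctc_greedy_decode(probs))  # [1, 1, 2] -> "A A B"
```

The blank symbol is what lets CTC distinguish a long-held sign (repeated frames, collapsed to one label) from the same sign performed twice in a row (separated by a blank). Beam-search decoders refine this greedy scheme but follow the same collapse-and-drop rule.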