Abstract
To address the difficulty of improving recognition accuracy and the low model robustness caused by the limited feature information provided by single-modal data, a dynamic gesture recognition strategy based on multi-modal data fusion was proposed for human-computer interaction in machining operations. First, a C3D network model was used to extract features from two modalities, depth images and color images, along both the spatial and temporal dimensions of the video. Second, the recognition results of the two modalities were fused at the decision level according to the maximum rule; meanwhile, the ReLU activation function used in the original model was replaced with the Mish activation function to improve its gradient characteristics. Finally, three groups of comparative experiments showed that the average recognition accuracy of six dynamic gestures reached 96.8%. The results show that the proposed method achieves high accuracy and high robustness for dynamic gesture recognition in machining operations, and helps promote the application of human-computer interaction technology in real production scenarios.
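As a rough illustration of the decision-level fusion step described in the abstract, the sketch below fuses the class-score vectors predicted from the depth and color streams by taking their element-wise maximum and then selecting the best class. The variable names (`depth_scores`, `rgb_scores`) and the example values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def max_rule_fusion(depth_scores: np.ndarray, rgb_scores: np.ndarray) -> int:
    """Decision-level fusion of two modality score vectors by the maximum rule.

    depth_scores, rgb_scores: per-class probabilities (e.g. softmax outputs)
    of shape (num_classes,) from the depth and color C3D streams.
    Returns the index of the predicted gesture class.
    """
    # Take the element-wise maximum of the two score vectors ...
    fused = np.maximum(depth_scores, rgb_scores)
    # ... and pick the class with the highest fused score.
    return int(np.argmax(fused))

# Illustrative usage with made-up scores for six gesture classes.
depth_scores = np.array([0.05, 0.10, 0.60, 0.10, 0.10, 0.05])
rgb_scores   = np.array([0.02, 0.08, 0.55, 0.25, 0.05, 0.05])
print(max_rule_fusion(depth_scores, rgb_scores))  # -> 2
```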
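The Mish activation mentioned in the abstract is defined as Mish(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + e^x)). The following PyTorch sketch shows, for a minimal 3D-convolution block loosely in the style of C3D, how a ReLU activation might be swapped for Mish; the layer sizes and block structure are placeholder assumptions and do not reproduce the paper's network.

```python
import torch
import torch.nn as nn

class C3DBlock(nn.Module):
    """Minimal 3D-convolution block in the spirit of C3D.

    The activation is configurable so that the original ReLU can be
    replaced by Mish, i.e. Mish(x) = x * tanh(softplus(x)).
    """

    def __init__(self, in_channels: int, out_channels: int, use_mish: bool = True):
        super().__init__()
        # 3x3x3 convolution over (time, height, width), as in C3D.
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)
        # Swap ReLU for Mish to obtain a smoother gradient around zero.
        self.act = nn.Mish() if use_mish else nn.ReLU()
        self.pool = nn.MaxPool3d(kernel_size=(1, 2, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x)))

# Illustrative input: batch of 2 clips, 3 channels, 16 frames, 112x112 pixels.
clip = torch.randn(2, 3, 16, 112, 112)
features = C3DBlock(in_channels=3, out_channels=64)(clip)
print(features.shape)  # torch.Size([2, 64, 16, 56, 56])
```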
Authors
ZHANG Fuqiang (张富强), ZENG Xia (曾夏), BAI Junyan (白筠妍), DING Kai (丁凯)
Key Laboratory of Road Construction Technology and Equipment of MOE, Chang'an University, Xi'an 710064, China; Institute of Smart Manufacturing Systems, Chang'an University, Xi'an 710064, China
Source
Journal of Zhengzhou University (Engineering Science) (《郑州大学学报(工学版)》)
CAS
Peking University Core Journal (北大核心)
2024, No. 5, pp. 30-36 (7 pages)
Funding
National Key Research and Development Program of China (2021YFB3301702)
Science and Technology Major Project of Shaanxi Province (2018zdzx01-01-01).
Keywords
multi-modal data fusion
machining operation
dynamic gesture recognition
C3D
Mish activation function
human-computer interaction