Abstract
In human-computer interaction systems, computer vision techniques and deep learning algorithms are commonly used for gesture segmentation, feature extraction, and classification. The system consists of a local terminal and a cloud server. The local terminal collects images of common gestures and builds the training dataset from multimodal features such as brightness information and Histograms of Oriented Gradients (HOG); the cloud server, based on the Nvidia Jetson Nano B01 AI platform, trains a Convolutional Neural Network (CNN) via transfer learning; and edge computing is used to complete gesture image preprocessing and feature extraction locally, reducing dependence on server computing power. Test results show that the system's average processing delay is about 2 s, which meets ordinary real-time interaction requirements; the CNN model achieves an overall prediction precision of 0.99 over 45 gesture classes; the recognition results are converted locally from image to text to speech, improving the convenience and efficiency of interaction; and users' private data are stored locally, which both ensures security and broadens the system's application scenarios.
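The HOG features mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: it computes a single global, magnitude-weighted histogram of gradient orientations with NumPy, whereas full HOG additionally divides the image into cells and block-normalizes the per-cell histograms. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def orientation_histogram(gray, bins=9):
    """Global histogram of gradient orientations (simplified HOG idea).

    Real HOG divides the image into cells and block-normalizes the
    per-cell histograms; this sketch keeps only the core computation.
    """
    gray = gray.astype(np.float64)
    # Central-difference gradients along x (columns) and y (rows).
    gx = np.gradient(gray, axis=1)
    gy = np.gradient(gray, axis=0)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in standard HOG.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Magnitude-weighted histogram over the orientation bins.
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0),
                           weights=magnitude)
    # L2-normalize so the descriptor is robust to contrast changes.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Example: a vertical step edge yields gradients concentrated at 0 degrees.
img = np.zeros((16, 16))
img[:, 8:] = 255.0
features = orientation_histogram(img)
print(features.shape)  # (9,)
```

In practice, library implementations such as skimage.feature.hog or OpenCV's HOGDescriptor would be used for the cell/block structure and normalization schemes.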
Authors
LI Xiaofeng (李晓峰), ZHANG Yinhui (张银慧), LI Ziyang (李子阳), ZHANG Wenquan (张文泉)
School of Information and Intelligence Engineering, Tianjin Renai College, Tianjin 301636
Source
Journal of Machine Design (《机械设计》)
Indexed in: CSCD; Peking University Core Journals
2024, Issue S02, pp. 200-204 (5 pages)
Funding
2023 Tianjin Higher Education Undergraduate Teaching Quality and Teaching Reform Research Program project (A231403802)
Keywords
human-computer interaction
gesture recognition
multimodal deep learning
computer vision
Convolutional Neural Network