Abstract
To improve both the accuracy and the speed of human pose estimation, a multi-person pose estimation algorithm based on a deep residual network is proposed. An existing state-of-the-art object detector first locates each person, and single-person pose estimation is then performed inside each human bounding box. The residual block of the existing model is modified to reduce the number of parameters; a multi-scale supervision module and a multi-scale regression module are added to assist training and improve learning efficiency; and a new coordinate extraction method further improves the model's generalization ability and inference speed. The model is trained and tested on the widely used MPII and COCO datasets. It achieves a PCKh@0.5 score of 92.1% on the MPII test set and 72.4 mAP on the COCO 2017 test-dev set, 2.4% higher than the Simple Baseline benchmark model. On a single GTX 1080 Ti GPU, with an average of five human instances per image, inference runs at 26 frames per second, which is highly competitive. These results show that the proposed algorithm effectively improves both the accuracy and the speed of human keypoint recognition.
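For readers unfamiliar with the PCKh@0.5 metric quoted for MPII, the following is a minimal sketch of how such a score is commonly computed: a predicted keypoint counts as correct when its distance to the ground truth is at most 0.5 times the person's head size. The function name `pckh`, its parameters, and the array layout are assumptions made for illustration; this is not the authors' evaluation code, and the official MPII toolkit derives the head size from the annotated head box, which is omitted here.

```python
import numpy as np


def pckh(pred, gt, head_sizes, visible=None, alpha=0.5):
    """Fraction of keypoints lying within alpha * head size of the ground truth.

    pred, gt   : arrays of shape (N, K, 2), N people with K keypoints each (x, y)
    head_sizes : array of shape (N,), per-person head size (assumed precomputed)
    visible    : optional boolean array of shape (N, K); hidden joints are skipped
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    head_sizes = np.asarray(head_sizes, dtype=float)

    # Euclidean distance per keypoint, normalized by that person's head size.
    dist = np.linalg.norm(pred - gt, axis=-1)
    norm_dist = dist / head_sizes[:, None]

    if visible is None:
        visible = np.ones(dist.shape, dtype=bool)
    visible = np.asarray(visible, dtype=bool)

    # A keypoint is correct if it is visible and close enough to the ground truth.
    correct = (norm_dist <= alpha) & visible
    return correct.sum() / visible.sum()


# One person, two keypoints, head size 10: errors of 3 and 6 pixels.
gt_points = [[[0.0, 0.0], [10.0, 0.0]]]
pred_points = [[[3.0, 0.0], [10.0, 6.0]]]
score = pckh(pred_points, gt_points, head_sizes=[10.0])
```

In this toy case the first keypoint has a normalized error of 0.3 and the second 0.6, so only one of the two falls under the 0.5 threshold.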
Authors
QIN Xiaofei (秦晓飞)
GUO Haiyang (郭海洋)
CHEN Haosheng (陈浩胜)
LI Xia (李夏)
HE Zhiyuan (何致远)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Optical Instruments (《光学仪器》)
2021, Issue 2, pp. 39-47 (9 pages)
Funding
Shanghai Artificial Intelligence Special Project (2019-RGZN-01077).
Keywords
multi-person pose estimation
encoding and decoding
multi-scale supervision
multi-scale regression