Abstract
Multi-person pose estimation in images and videos suffers from inaccurately localized human bounding boxes and limited detection accuracy on hard keypoints. To address these problems, this paper designs a real-time multi-person pose-estimation model based on a top-down framework. First, depthwise separable convolution is introduced into the object-detection algorithm to speed up the human detector. Then, building on a feature pyramid network combined with contextual semantic information, online hard example mining is used to raise the low detection accuracy on hard keypoints. Finally, a spatial transformer network is combined with pose-similarity computation to eliminate redundant poses and improve bounding-box localization. On the 2017 MS COCO test-dev dataset, the average precision of the proposed model is 14.84% higher than that of Mask R-CNN and 2.43% higher than that of RMPE, and the model runs at 22 frame/s.
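The abstract names three algorithmic ingredients without giving their details. What follows is a minimal Python sketch of each, under stated assumptions; it is not the authors' implementation, and all function names are hypothetical. `separable_conv_cost` compares the multiply-accumulate counts of a standard k x k convolution and a depthwise separable one (depthwise k x k followed by pointwise 1 x 1), assuming stride 1 and same padding. `ohem_keypoint_loss` illustrates online hard keypoint mining by averaging only the `top_m` largest per-keypoint losses, where `top_m` is an assumed hyperparameter. `pose_nms` greedily removes redundant poses using a simple Gaussian keypoint similarity. This similarity, together with the `sigma` and `threshold` values, is an assumption that stands in for the paper's own pose-distance formulation, which the abstract does not give.

```python
import math


def separable_conv_cost(h, w, c_in, c_out, k):
    """Multiply-accumulate counts of a standard k x k convolution and of a
    depthwise separable one (depthwise k x k, then pointwise 1 x 1), assuming
    stride 1 and 'same' padding on an h x w feature map."""
    standard = h * w * c_in * c_out * k * k
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return standard, depthwise + pointwise


def ohem_keypoint_loss(per_keypoint_losses, top_m):
    """Online hard keypoint mining (sketch): average only the top_m largest
    per-keypoint losses so easy keypoints do not dominate the gradient."""
    hardest = sorted(per_keypoint_losses, reverse=True)[:top_m]
    return sum(hardest) / len(hardest)


def pose_similarity(pose_a, pose_b, sigma):
    """Assumed, simplified measure: mean Gaussian similarity of corresponding
    (x, y) keypoints; 1.0 means the two poses coincide."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(pose_a, pose_b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2 * sigma ** 2))
    return total / len(pose_a)


def pose_nms(poses, scores, sigma=10.0, threshold=0.5):
    """Greedy pose NMS: visit poses by descending score and keep a pose only
    if it is not too similar to any pose already kept. Returns kept indices."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(pose_similarity(poses[i], poses[j], sigma) <= threshold
               for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    standard, separable = separable_conv_cost(56, 56, 128, 128, 3)
    print(standard, separable, separable / standard)
    print(ohem_keypoint_loss([0.25, 1.5, 0.5, 2.0], top_m=2))
    poses = [[(10, 10), (20, 20)],      # highest-scoring pose
             [(11, 10), (20, 21)],      # near-duplicate of pose 0
             [(100, 100), (120, 120)]]  # a different person
    print(pose_nms(poses, scores=[0.9, 0.8, 0.7]))
```

For a 56 x 56 feature map with 128 input and output channels and k = 3, the separable version needs about 11.9% of the standard cost, which matches the ratio 1/c_out + 1/k^2. In the pose example, the near-duplicate pose 1 is suppressed, so `pose_nms` keeps indices 0 and 2.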
Authors
Yan Fenting (闫芬婷)
Wang Peng (王鹏)
Lü Zhigang (吕志刚)
Ding Zhe (丁哲)
Qiao Mengyu (乔梦雨)
School of Electronics and Information Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021, China
Source
Laser & Optoelectronics Progress (《激光与光电子学进展》)
Indexed in: CSCD; Peking University Core Journals
2020, No. 2, pp. 89-96 (8 pages)
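Funding
National Natural Science Foundation of China (61671362)
Key Research and Development Program of the Shaanxi Provincial Department of Science and Technology (2019GY-022)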
Keywords
image processing
multi-person pose estimation
spatial transformer network
semantic information
pose distance