Abstract
Human pose estimation, and multi-person pose estimation in particular, is finding its way into fields such as education and sports, and accurate, lightweight multi-person pose estimation is a current research focus. Bottom-up multi-person pose estimation methods offer good real-time performance, but their accuracy is generally modest and their networks tend to be large. For keypoint association, the most difficult step in bottom-up methods, this paper proposes a lightweight and efficient pose estimation matching network. In the encoding stage, the network uses layer structures obtained by modifying the basic ResNet module; extracting features with these structures greatly reduces the number of model parameters. In the decoding stage, it uses a specially designed deconvolution structure, and residual connections are added throughout the network, which substantially improves accuracy. The overall algorithm matches all detected keypoint heat maps to the correct person and produces the final human keypoint estimates. The proposed model is a lightweight, efficient human keypoint matching network: its mAP on the ground truth of the COCO dataset reaches 89.7 with only 8.01 M parameters. Compared with the best current bottom-up multi-person pose estimation method, this is 0.5 higher in mAP with only about 1/10 of the parameters. The network was trained and validated on the ground truth of COCO 2017 and COCO 2014 separately and achieved high accuracy on both, which indicates that it is suitable for a variety of human keypoint heat map inputs and gives good results. In addition, ablation experiments were designed for the different layer structures of the network model; the lightest structure has only 1.28 M parameters and reaches an mAP of 81.8.
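The mAP figures above are COCO keypoint scores. Assuming the standard COCO keypoint evaluation, a predicted pose is compared with a ground-truth pose through object keypoint similarity (OKS), and average precision is then computed over OKS thresholds. The sketch below shows only the OKS step for a single pair of poses. The function name `oks` and its argument layout are illustrative choices, not taken from the paper; the per-keypoint constants are the ones published with the COCO evaluation code.

```python
import math

# Per-keypoint falloff constants from the COCO keypoint evaluation, in COCO
# order: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles.
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visible, area, sigmas=COCO_SIGMAS):
    """Object keypoint similarity between one predicted and one true pose.

    pred, gt : lists of (x, y) keypoint coordinates in pixels
    visible  : ground-truth visibility flags (0 means the keypoint is unlabeled)
    area     : ground-truth object area in pixels squared (the scale term)

    The name and signature are hypothetical; only the formula follows COCO.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute to the score
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0
```

A prediction that coincides with the ground truth scores 1.0, and the score falls toward 0 as keypoints drift, more quickly for small objects and for tightly localized keypoints such as the eyes.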
Authors
杨连平
孙玉波
张红良
李封
张祥德
YANG Lian-ping; SUN Yu-bo; ZHANG Hong-liang; LI Feng; ZHANG Xiang-de (College of Sciences, Northeastern University, Shenyang 110004, China; School of Computer Science and Engineering, Northeastern University, Shenyang 110004, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2020, Issue 6, pp. 114-120 (7 pages)
Computer Science
Funding
Fundamental Research Funds for the Central Universities (N160504007)
Joint Funds of the National Natural Science Foundation of China (U1811261).