Abstract
To improve the accuracy and speed of multi-person pose estimation in two-dimensional complex scenes, a top-down multi-person pose estimation method is proposed that combines a Mobile-YOLOv3 model with a multi-scale feature fusion fully convolutional network. The YOLOv3 network is improved with depthwise separable convolutions to serve as an efficient human body detector. To address the continuous loss of upper-layer high-resolution information during feature down-sampling, a multi-scale feature fusion module is embedded in the classic U-shaped network structure, so that the low-scale features in the network also carry high-resolution information; a channel attention mechanism is further introduced into the fusion module to highlight the key channel information of the multi-scale fused feature maps. Experimental results show that, compared with the Stacked Hourglass Network (SHN) and the Cascaded Pyramid Network (CPN), the proposed pose estimation algorithm improves the average pose estimation accuracy on the COCO dataset by 4.7 and 3.7 points, respectively.
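The efficiency gain behind Mobile-YOLOv3 comes from replacing standard convolutions with depthwise separable ones (a depthwise convolution followed by a 1×1 pointwise convolution). As a hypothetical illustration (not the authors' code), the parameter counts of the two layer types can be compared directly; the function names and the example layer sizes below are assumptions for demonstration only:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored):
    one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one k x k filter per input channel)
    followed by a 1 x 1 pointwise convolution mapping c_in -> c_out."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer with 256 input and 512 output channels.
std = conv_params(3, 256, 512)          # 1,179,648 weights
sep = dw_separable_params(3, 256, 512)  # 133,376 weights
print(std, sep, round(std / sep, 1))    # roughly an 8.8x reduction
```

For a k×k kernel the reduction factor is approximately 1/c_out + 1/k², which is why the substitution shrinks the detector substantially while keeping its receptive field.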
Authors
HUANG Chen; GAO Yan (School of Software Engineering, East China Normal University, Shanghai 200333, China; School of Computer Science, East China Normal University, Shanghai 200333, China)
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2021, Issue 1, pp. 142-146 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61972157, 61672237).
Keywords
multi-person pose estimation
depthwise separable convolution
U-shaped network (U-net)
multi-resolution features
channel-domain attention