Abstract
To address the poor accuracy of crowd counting in surveillance videos and images caused by scale and perspective variation, a dense crowd counting model based on a Multi-scale Multi-column Convolutional Neural Network (MsMCNN) is proposed. Before features are extracted with MsMCNN, the dataset is processed with a Gaussian filter to obtain the ground-truth density maps of the images, and data augmentation is applied. MsMCNN uses a multi-column convolutional neural network as its backbone. It first extracts feature maps from multiple columns at multiple scales; it then combines feature maps of the same resolution within the same column to generate the estimated density map of the image; finally, it obtains the crowd count by integrating the estimated density map. To verify the effectiveness of the proposed model, experiments were conducted on the Shanghaitech and UCFCC50 datasets. Compared with the classic models Crowdnet, Multi-column Convolutional Neural Network (MCNN), Cascaded Multi-Task Learning (CMTL) and Scale-adaptive Convolutional Neural Network (SaCNN), the proposed model reduces the Mean Absolute Error (MAE) by at least 10.6 on Part A of the Shanghaitech dataset and by at least 24.5 on the UCFCC50 dataset, and reduces the Mean Squared Error (MSE) by at least 1.8 and 29.3, respectively. It also achieves good results on Part B of the Shanghaitech dataset. MsMCNN places more emphasis on combining shallow features and on combining multi-scale features during feature extraction, which effectively mitigates the loss of accuracy caused by scale and perspective variation and improves crowd counting performance.
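The density-map steps described in the abstract (Gaussian filtering of head annotations, counting by integration, and evaluation with MAE and MSE) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the fixed `sigma`, the kernel radius of three standard deviations, and the border renormalisation are all assumptions made for illustration. The abstract also does not say whether MSE is reported with or without a square root (many crowd-counting papers report the root under the same name), so the sketch exposes a hypothetical `root` flag.

```python
import math


def density_map(height, width, heads, sigma=4.0):
    """Build a density map with one unit-mass Gaussian per annotated head.

    `heads` is a list of (row, col) positions assumed to lie inside the image.
    `sigma` is a fixed, assumed kernel width (not taken from the paper).
    """
    radius = int(3 * sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Collect the kernel weights that fall inside the image bounds.
        cells = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                cells.append((r, c, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(w for _, _, w in cells)
        # Renormalise so that each person contributes exactly 1 to the sum,
        # even when the kernel is clipped at the image border.
        for r, c, w in cells:
            dmap[r][c] += w / total
    return dmap


def count_from_density(dmap):
    """Integrate (sum) the density map to obtain the crowd count."""
    return sum(sum(row) for row in dmap)


def mae(pred, true):
    """Mean absolute error over per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true, root=False):
    """Mean squared error over per-image counts; root=True returns its square root."""
    value = sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)
    return math.sqrt(value) if root else value


if __name__ == "__main__":
    dm = density_map(32, 32, [(5, 5), (0, 31), (20, 10)], sigma=2.0)
    print(round(count_from_density(dm), 6))
    print(mae([10, 20], [12, 17]), mse([10, 20], [12, 17]))
```

Because every Gaussian is normalised to unit mass, summing the map recovers the number of annotated heads, which is why a network trained to regress such maps can be turned into a counter by integrating its output.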
Authors
陆金刚
张莉
LU Jingang; ZHANG Li (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006, China)
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2019, No. 12, pp. 3445-3449 (5 pages)
Journal of Computer Applications
Funding
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XYDXX-054)
Keywords
dense crowd counting
density map
Convolutional Neural Network (CNN)
multi-scale
scale and perspective variation