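Abstract: Accurate crowd counting and localization in dense scenes is important for public safety. Dense crowd counting and localization are easily degraded by uneven crowd distribution and background interference, which leads to inaccurate results. To address this problem, an adaptive crowd counting and localization method based on region-aware calibration is proposed. A pyramid structure extracts multi-scale features from crowd images and strengthens the correlation among features, and a deformable geometry-adaptive module learns the geometric features of crowds with different distributions, improving adaptability to uneven crowd distribution. On this basis, region-aware and region-calibration modules are proposed to extract global context features and regional features, overcoming the localization and counting errors caused by background interference. A dual-branch convolutional prediction path then outputs the predicted positions and confidence scores of the generated points to improve the localization and counting accuracy of the network. Finally, an improved Hopcroft-Karp algorithm for maximum bipartite matching is proposed to match and calibrate the ground-truth points against the predicted points, completing crowd localization and counting. Experimental results show that the proposed method outperforms the compared algorithms on the evaluation metrics of the public ShanghaiTech Part A and Part B, NWPU-Crowd, and UCF-QNRF datasets, and that its localization accuracy is 3.5%, 6.1%, 11.3%, and 8.1% higher than that of P2Pnet, respectively, effectively improving the accuracy of crowd localization and counting.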
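The matching step named in the abstract above can be illustrated with a minimal sketch. It implements the standard Hopcroft-Karp algorithm, not the improved variant the authors propose, whose details the abstract does not give. The rule that a ground-truth point may pair with a predicted point only when they lie within `radius` pixels of each other is an assumption made for illustration, as are the names `hopcroft_karp` and `match_points`.

```python
import math
from collections import deque

INF = float("inf")


def hopcroft_karp(adj, n_left, n_right):
    """Standard Hopcroft-Karp maximum bipartite matching.

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns (matching size, match_l), where match_l[u] is the right
    vertex matched to u, or -1 if u is unmatched.
    """
    match_l = [-1] * n_left
    match_r = [-1] * n_right
    dist = [0] * n_left

    def bfs():
        # Layer the left vertices, starting from the unmatched ones.
        queue = deque()
        for u in range(n_left):
            if match_l[u] == -1:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        found = False
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                w = match_r[v]
                if w == -1:
                    found = True  # an augmenting path exists
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return found

    def dfs(u):
        # Follow the BFS layers to find and apply one augmenting path.
        for v in adj[u]:
            w = match_r[v]
            if w == -1 or (dist[w] == dist[u] + 1 and dfs(w)):
                match_l[u] = v
                match_r[v] = u
                return True
        dist[u] = INF  # dead end for the rest of this phase
        return False

    size = 0
    while bfs():
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u):
                size += 1
    return size, match_l


def match_points(gt_points, pred_points, radius):
    """Match ground-truth points to predicted points (hypothetical helper).

    Assumption: an edge exists only when the Euclidean distance between
    a ground-truth point and a predicted point is at most `radius`.
    """
    adj = [
        [j for j, (px, py) in enumerate(pred_points)
         if math.hypot(gx - px, gy - py) <= radius]
        for gx, gy in gt_points
    ]
    return hopcroft_karp(adj, len(gt_points), len(pred_points))


if __name__ == "__main__":
    gt = [(10, 10), (14, 10)]
    preds = [(13, 10), (8, 10)]
    size, match_l = match_points(gt, preds, radius=4)
    print(size, match_l)
```

In this sketch the matching size plays the role of the true-positive count, so localization precision and recall could be derived as `size / len(preds)` and `size / len(gt)`. Whether the paper computes its metrics this way is not stated in the abstract.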
Funding: Supported by the National Natural Science Foundation of China (No. 62072394), the Natural Science Foundation of Hebei Province, China (No. F2021203019), and the Hebei Key Laboratory Project, China (No. 202250701010046).
Abstract: Most existing studies on crowd analysis are limited to counting and cannot provide the exact location of individuals. This paper proposes a self-attention guidance based crowd localization and counting network (SA-CLCN), which locates and counts crowds simultaneously. The task is framed as object detection, and the original point annotations of crowd datasets are used as supervision to train the network. The network predicts the center point coordinate of each head as well as the total number of people. Specifically, to cope with the spatial and positional variations of the crowd, the proposed method combines a transformer with a convolutional structure to construct a global-local feature extractor (GLFE). It establishes near-to-far dependencies between elements so that the global context and local detail features of the crowd image can be extracted simultaneously. A pyramid feature fusion module (PFFM) then fuses the global and local information from high level to low level to obtain a multiscale feature representation. For the downstream tasks, candidate point offsets and confidence scores are predicted by a simple regression head and a classification head. In addition, the Hungarian algorithm is used to match the predicted point set with the labelled point set so that the losses can be calculated. The proposed network avoids the errors or higher costs associated with traditional density maps or bounding box annotations. Extensive experiments on several crowd datasets show that the proposed method produces competitive results in both counting and localization.
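The Hungarian matching between the predicted and labelled point sets can be sketched as follows. This is an illustrative sketch, not the SA-CLCN implementation. The cost (Euclidean distance minus a weighted confidence score), the weight `conf_weight`, and the function names are assumptions, since the abstract does not define the matching cost. The sketch also assumes there are at least as many predicted points as labelled points.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list `assign` where assign[i] is the column given to row i.
    Uses the classic potential-based O(n^2 * m) formulation.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: row (1-based) currently assigned to column j
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assign[p[j] - 1] = j - 1
    return assign


def match_labels_to_predictions(labels, preds, scores, conf_weight=1.0):
    """Assign each labelled point one predicted point (hypothetical helper).

    Assumed cost: distance(label, prediction) - conf_weight * score, so
    nearby, confident predictions are preferred.
    """
    cost = [
        [math.hypot(lx - px, ly - py) - conf_weight * s
         for (px, py), s in zip(preds, scores)]
        for lx, ly in labels
    ]
    return hungarian(cost)


if __name__ == "__main__":
    labels = [(10, 10), (50, 50)]
    preds = [(12, 11), (48, 52), (100, 100)]
    scores = [0.9, 0.8, 0.7]
    print(match_labels_to_predictions(labels, preds, scores))
```

Under this setup, matched predictions would serve as positives for the regression and classification losses, and unmatched predictions as negatives for the classification loss. That split is a common design for point-based detectors and is assumed here rather than taken from the abstract.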