Funding: Humanities and Social Science Fund of the Ministry of Education of China (21YJAZH077).
Abstract: In crowd density estimation datasets, annotating crowd locations is extremely laborious, yet the locations are not used in the evaluation metrics. In this paper, we aim to reduce the annotation cost of crowd datasets and propose a weakly-supervised crowd density estimation method that, in the absence of position supervision, directly estimates the crowd count using only the number of pedestrians in the image as the supervision signal. To this end, we design a new training method that exploits the correlation between global and local image features through incremental learning. Specifically, we design a parent-child network (PC-Net) whose parent and child networks focus on the global and local image, respectively, and propose a linear feature calibration structure to train the two simultaneously: the child network learns feature transfer factors and feature bias weights, and uses them to linearly calibrate the features extracted by the parent network, which improves convergence by exploiting local features hidden in the crowd images. In addition, we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels, and design a global-local feature loss function (L2). We combine it with a crowd counting loss (LC) to enhance the sensitivity of the network to crowd features during training, which improves the accuracy of crowd density estimation. Experimental results show that the PC-Net significantly reduces the gap between fully-supervised and weakly-supervised crowd density estimation and outperforms the comparison methods on five datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF, and JHU-CROWD++.
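The linear feature calibration and the combined loss described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: calibration is taken to be element-wise `transfer * parent + bias`, the counting loss LC is taken to be an absolute count error, the global-local feature loss L2 is taken to be a mean squared difference, and the `weight` parameter and all function names are hypothetical.

```python
from typing import List


def calibrate(parent_feat: List[float], transfer: List[float],
              bias: List[float]) -> List[float]:
    """Linearly calibrate parent features: transfer * parent + bias (assumed form)."""
    if not (len(parent_feat) == len(transfer) == len(bias)):
        raise ValueError("all inputs must have the same length")
    return [t * p + b for p, t, b in zip(parent_feat, transfer, bias)]


def counting_loss(pred_count: float, true_count: float) -> float:
    """Assumed form of LC: absolute error between predicted and labelled counts."""
    return abs(pred_count - true_count)


def feature_loss(global_feat: List[float], local_feat: List[float]) -> float:
    """Assumed form of L2: mean squared difference of global and local features."""
    if len(global_feat) != len(local_feat):
        raise ValueError("feature vectors must have the same length")
    return sum((g - l) ** 2 for g, l in zip(global_feat, local_feat)) / len(global_feat)


def total_loss(pred_count: float, true_count: float,
               global_feat: List[float], local_feat: List[float],
               weight: float = 1.0) -> float:
    """Combine LC and L2; the weighting factor is a hypothetical parameter."""
    return counting_loss(pred_count, true_count) + weight * feature_loss(global_feat, local_feat)
```

In this sketch only the image-level count enters `counting_loss`, which mirrors the weakly-supervised setting: no per-person location labels are needed.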
Funding: The authors would like to thank the reviewers for their detailed reviews and constructive comments. We are also grateful for Sophie Song's help with improving the English. This work was supported in part by the 'Twelfth Five-Year' National Science and Technology Support Program of the Ministry of Science and Technology of China (No. 2012BAH35B02) and the National Natural Science Foundation of China (NSFC) (No. 41401107, No. 41201402, and No. 41201417).
Abstract: Crowd density is an important factor in crowd stability. Previous crowd density estimation methods depend heavily on the specific video scene. This paper presents a video-scene-invariant crowd density estimation method that uses Geographic Information Systems (GIS) to monitor crowd size over large areas. The proposed method maps crowd images to GIS, so that crowd density can be estimated for each camera in GIS using an estimation model obtained from a single camera. Test results show that a model obtained from one camera in GIS can be adaptively applied to other cameras in outdoor video scenes. A real-time system for monitoring crowd size in large areas, based on the scene-invariant model, was successfully used at the 'Jiangsu Qinhuai Lantern Festival, 2012'. It can provide early warning information and a scientific basis for safety and security decision making.
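The abstract does not say how crowd images are mapped to GIS. One common way to relate image pixels to ground coordinates is a planar homography, and the sketch below assumes that approach purely for illustration; the matrix `h`, the function names, and the density definition (people per unit ground area) are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project_to_ground(h: Sequence[Sequence[float]], x: float, y: float) -> Point:
    """Apply a 3x3 homography (assumed known from calibration) to pixel (x, y)."""
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return xw / w, yw / w


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ground_density(h: Sequence[Sequence[float]], image_region: List[Point],
                   person_count: float) -> float:
    """People per unit ground area for an image region projected to the ground plane."""
    ground_region = [project_to_ground(h, x, y) for x, y in image_region]
    return person_count / polygon_area(ground_region)
```

Expressing density in ground-plane units rather than pixels is what would allow a single model to be compared across cameras with different viewpoints, under the stated assumptions.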
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61175007, the National Key Technologies R&D Program under Grant No. 2012BAH07B01, and the National Key Basic Research Program of China (973 Program) under Grant No. 2012CB316302.
Abstract: Crowd density estimation in wide areas is a challenging problem for visual surveillance. Because of the high risk of degeneration, the safety of public events involving large crowds has always been a major concern. In this paper, we propose a video-based crowd density analysis and prediction system for wide-area surveillance applications. In monocular image sequences, the Accumulated Mosaic Image Difference (AMID) method is applied to extract crowd areas having irregular motion. The number of persons in a crowd and its velocity can be estimated by our system from the density of the crowded areas. Using a multi-camera network, we can obtain predictions of a crowd's density several minutes in advance. The system has been used in real applications, and numerous experiments conducted in real scenes (station, park, plaza) demonstrate the effectiveness and robustness of the proposed method.
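The abstract names the Accumulated Mosaic Image Difference (AMID) method but does not define it. The following is a rough sketch of one plausible reading, not the paper's algorithm: each grayscale frame is reduced to a coarse "mosaic" of block averages, consecutive mosaics are differenced, and the number of above-threshold changes is accumulated per block. The block size, threshold, and function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def mosaic(frame: Frame, block: int) -> Frame:
    """Reduce a grayscale frame to per-block mean intensities (partial blocks dropped)."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(0, rows - block + 1, block):
        out_row = []
        for c in range(0, cols - block + 1, block):
            total = sum(frame[r + i][c + j] for i in range(block) for j in range(block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


def accumulated_mosaic_difference(frames: List[Frame], block: int = 2,
                                  threshold: float = 10.0) -> List[List[int]]:
    """Count, per block, how often consecutive mosaics differ by more than threshold."""
    mosaics = [mosaic(f, block) for f in frames]
    height, width = len(mosaics[0]), len(mosaics[0][0])
    acc = [[0] * width for _ in range(height)]
    for prev, curr in zip(mosaics, mosaics[1:]):
        for r in range(height):
            for c in range(width):
                if abs(curr[r][c] - prev[r][c]) > threshold:
                    acc[r][c] += 1
    return acc
```

Under this reading, blocks with high accumulated counts mark regions of persistent motion, which could then serve as candidate crowd areas.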