Crowd counting is an active research topic in computer vision involving crowd intelligence analysis, and it has achieved tremendous success recently with the development of deep learning. However, many challenges remain, including multi-scale variation in crowds and high network complexity. To tackle these issues, a lightweight Resconnection multi-branch network (LRMBNet) for highly accurate crowd counting and localization is proposed. Specifically, using an improved ShuffleNet V2 as the backbone, a lightweight shallow extractor is designed that employs a channel compression mechanism to greatly reduce the number of network parameters. A light multi-branch structure with convolutions of different expansion rates is presented to extract multi-scale features and enlarge receptive fields, where the transmission and fusion of features at diverse scales is enhanced via residual concatenation. In addition, a compound loss function is introduced for training the method to improve the correlation of global context information. The proposed method is evaluated on the SHHA, SHHB, UCF-QNRF and UCF_CC_50 public datasets. Its accuracy is better than that of many advanced approaches, while its number of parameters is smaller. The experimental results show that the proposed method achieves a good tradeoff between complexity and accuracy, indicating a lightweight and high-precision method for crowd counting.
In this paper, a deep learning-based method is proposed for crowd counting problems. Specifically, by utilizing the convolution kernel density map, the ground truth is generated dynamically to enhance the feature extracting ability of the generator model. Meanwhile, the "cross stage partial" module is integrated into the congested scene recognition network (CSRNet) to obtain a lightweight network model. In addition, to compensate for the accuracy drop owing to the lightweight model, we take advantage of "structured knowledge transfer" to train the model in an end-to-end manner, which aims to accelerate the fitting speed and enhance the learning ability of the student model. A crowd-counting system solution for edge computing is also proposed and implemented on an embedded device equipped with a neural processing unit. Simulations demonstrate the performance improvement of the proposed solution in terms of model size, processing speed and accuracy. The performance on the Venice dataset shows that the mean absolute error (MAE) and the root mean squared error (RMSE) of our model drop by 32.63% and 39.18%, respectively, compared with CSRNet. Meanwhile, the performance on the ShanghaiTech Part B dataset reveals that the MAE and the RMSE of our model are close to those of CSRNet. Therefore, we provide a novel embedded platform system scheme for public safety pre-warning applications.
The analysis of overcrowded areas is essential for flow monitoring, assembly control, and security. The primary goal of crowd counting is to estimate the number of people in a given region, which requires real-time analysis of congested scenes so that prompt action can be taken. Crowds are unpredictable, and the available benchmark datasets vary widely, which limits the performance of trained models on unseen test data. In this paper, we propose an end-to-end deep neural network that takes an input image and generates a density map of the crowd scene. The proposed model consists of encoder and decoder networks built with batch-free normalization layers known as evolving normalization (EvoNorm). This allows our network to generalize to unseen data, because EvoNorm does not use statistics from the training samples. The decoder network uses dilated 2D convolutional layers to provide large receptive fields with fewer parameters, which enables real-time processing and, owing to the large receptive field, solves the density drift problem. Five benchmark datasets are used in this study to assess the proposed model, leading to the conclusion that it outperforms conventional models.
With the emergence of the COVID-19 virus in late 2019 and its declaration as a worldwide pandemic, health organizations and governments began to implement strict health precautions to reduce the spread of the virus and preserve human lives. The enforcement of social distancing in work environments and public areas is one of these obligatory precautions, and crowd management is one of the effective measures for social distancing. By reducing social contact between individuals, the spread of the disease can be greatly reduced. In this paper, a model for crowd counting in public places of high and low densities is proposed. The model works under various scene conditions and with no prior knowledge. A deep CNN model (DCNN) is built on a convolutional neural network (CNN) structure with small kernel size and two fronts. To increase the efficiency of the model, a CNN is chosen as the front-end and a multi-column layer with dilated convolution as the back-end. The proposed method also accepts images of arbitrary sizes and scales as inputs from different cameras. To evaluate the proposed model, a dataset was created from images of Saudi people in traditional and non-traditional Saudi outfits. The model was also trained and tested on some existing datasets. Compared with current counting methods, the results show that the proposed model significantly improves efficiency and reduces the error rate. It achieves the lowest MAE, lower by 67%, 32%, and 15.63%, and the lowest MSE, lower by around 47%, 15%, and 8.1%, than M-CNN, Cascaded-MTL, and CSRNet, respectively.
Estimating the crowd count and density of the highly dense scenes witnessed in Muslim gatherings at religious sites in Makkah and Madinah is critical for developing control strategies and organizing such large gatherings. Moreover, since the crowd images in this case can range from low density to high density, detection-based approaches are hard to apply for crowd counting. Recently, deep learning-based regression has become the prominent approach to crowd counting problems: a density map is estimated, and its integral is then computed to obtain the final count. In this paper, we put forward a novel multi-scale network (named 2U-Net) for crowd counting in sparse and dense scenarios. The proposed framework, which employs the U-Net architecture, is straightforward to implement, computationally efficient, and trained in a single step. Unpooling layers are used to retrieve the information erased by the pooling layers and to learn a hierarchical pixelwise spatial representation. This helps in obtaining feature values, retaining spatial locations, and maximizing data integrity to avoid data loss. In addition, a modified attention unit is introduced and integrated into the proposed 2U-Net model to focus on specific crowd areas. The proposed model concentrates on balancing the number of model parameters, model size, computational cost, and counting accuracy, whereas other works may achieve one criterion at the expense of the others. Experiments on five challenging datasets for density estimation and crowd counting show that the proposed model is very effective and outperforms comparable mainstream models. Moreover, it counts very well in both sparse and congested crowd scenes. The 2U-Net model has the lowest MAE on ShanghaiTech Part A, ShanghaiTech Part B, UCSD, and Mall, with 63.3, 7.4, 1.5, and 1.6, respectively. Furthermore, it obtains the lowest MSE on ShanghaiTech Part B, UCSD, and Mall, with 12.0, 1.9, and 2.1, respectively.
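The density-map regression idea mentioned in the abstract above (estimate a density map, then integrate it to obtain the count) can be illustrated with a minimal sketch. It builds only the ground-truth side of that pipeline: one unit-mass Gaussian is placed at each annotated head position, so summing the map recovers the number of annotations. The function names, kernel size, `sigma`, image size, and head coordinates are all illustrative assumptions and are not taken from the 2U-Net paper.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalised so its entries sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, raw))
    return [[value / total for value in row] for row in raw]


def density_map(points, height, width, size=5, sigma=1.0):
    """Place one unit-mass Gaussian at each annotated head position (row, col)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in points:
        for dy in range(size):
            for dx in range(size):
                y, x = row + dy - half, col + dx - half
                # Mass falling outside the image is dropped, so heads near
                # the border contribute slightly less than 1 to the count.
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy][dx]
    return dmap


def count_from_density(dmap):
    """The discrete integral of the density map is the estimated count."""
    return sum(map(sum, dmap))


# Hypothetical annotations: three heads, all far enough from the border.
heads = [(5, 5), (10, 12), (14, 7)]
dmap = density_map(heads, height=20, width=20)
estimated_count = count_from_density(dmap)
```

In a regression-based counter, a network would be trained to predict a map like `dmap` from the image, and `count_from_density` would be applied to the prediction.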
Since the outbreak of the worldwide novel coronavirus pandemic, crowd counting in public areas, such as shopping centers and commercial streets, has gained popularity among public health administrations as a means of preventing crowds from gathering. In this paper, we propose a novel adaptive method for crowd counting based on Wi-Fi channel state information (CSI) using common commercial wireless routers. Compared with previous research on device-free crowd counting, our proposed method is more adaptive to changes in the environment and can achieve high accuracy in crowd count estimation. Because the distance between the access point (AP) and the monitor point (MP) is typically not fixed in real-world applications, the strength of the received signal varies, which causes traditional amplitude-related models to perform poorly in different environments. To make the crowd count estimation model adaptive, we use a convolutional neural network (ConvNet) to extract features from the correlation coefficient matrix of the subcarriers, which is insensitive to changes in received signal strength. We conducted experiments in university classroom settings, and our model achieved an overall accuracy of 97.79% in estimating a variable number of participants.
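The claim above that the correlation coefficient matrix of the subcarriers is insensitive to received signal strength can be checked with a small sketch. It computes Pearson correlations between per-subcarrier amplitude series and shows that multiplying every amplitude by a common gain leaves the matrix unchanged. The amplitude values and the gain are made up for illustration, a uniform gain is only a simplified stand-in for a change in AP-to-MP distance, and the paper's actual preprocessing and ConvNet are not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(series):
    """Pairwise correlation between subcarrier amplitude series."""
    return [[pearson(a, b) for b in series] for a in series]


# Hypothetical CSI amplitudes: 3 subcarriers x 5 time samples.
csi = [
    [1.0, 1.2, 0.9, 1.4, 1.1],
    [2.0, 2.3, 1.7, 2.9, 2.2],
    [0.5, 0.4, 0.7, 0.3, 0.6],
]

# Simulate a weaker received signal by applying one common gain.
gain = 0.25
attenuated = [[gain * value for value in row] for row in csi]

near = correlation_matrix(csi)
far = correlation_matrix(attenuated)

# Largest element-wise difference between the two matrices.
max_difference = max(
    abs(near[i][j] - far[i][j]) for i in range(3) for j in range(3)
)
```

Because both the covariance and the standard deviations scale with the gain, the ratio is unchanged, which is why `max_difference` stays at the level of floating-point rounding.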
With the rapid progress of deep convolutional neural networks, several applications of crowd counting have been proposed and explored in the literature, and a variety of crowd density estimation approaches have been developed for congested scene monitoring. Understanding highly congested scenes for crowd counting during the Muslim gatherings of Hajj and Umrah is a challenging task: large numbers of individuals stand close together, and because the crowd can vary from low density to high density, it is hard for detection techniques to recognize them. To deal with such highly congested scenes, we propose the Congested Scene Crowd Counting Network (CSCC-Net), which uses the first ten layers of VGG-16 as its core network because of its strong and robust transfer learning capability. A dilated (hole) convolutional neural network is used at the back end to widen the receptive field and extract a large range of information from the image without losing its original resolution. Dilated convolution is chosen mainly because it expands the kernel's coverage without changing other parameters. Moreover, several loss functions are applied to strengthen the evaluation accuracy of the model. Finally, all experiments are evaluated on prominent datasets, namely ShanghaiTech Parts A and B, UCF_CC_50, and UCF_QNRF. Our model achieves remarkable results: MAE of 68.0 and 9.0 on ShanghaiTech Parts A and B, 199.1 on UCF_CC_50, and 99.8 on UCF_QNRF, respectively.
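To make the point about dilated convolution concrete, the following sketch computes the receptive field of a stack of stride-1 convolution layers with and without dilation, using the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The three-layer stacks, the dilation rate of 2, and the channel count are illustrative assumptions; the abstract does not state the configuration used in CSCC-Net.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 layers given (kernel_size, dilation) pairs."""
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


def weights_per_filter(kernel_size, in_channels):
    """Number of weights in one 2D filter; dilation does not change this."""
    return kernel_size * kernel_size * in_channels


# Hypothetical back ends: three 3x3 layers, without and with dilation 2.
plain_stack = [(3, 1)] * 3
dilated_stack = [(3, 2)] * 3

plain_field = receptive_field(plain_stack)      # 7
dilated_field = receptive_field(dilated_stack)  # 13

# Both stacks use 3x3 filters, so the weight count per filter is identical.
weights = weights_per_filter(3, in_channels=64)
```

With these assumed settings the dilated stack sees a 13 x 13 input window instead of 7 x 7, while each filter still holds the same number of weights and no pooling is needed, so the feature map resolution is preserved.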
With the popularity and development of indoor WiFi equipment, these devices have gained more sensing capability and can be used for human monitoring. Channel state information (CSI) can be collected from a WiFi device, and the human state can be inferred from the measurements. Such studies have attracted wide attention and become a hot research topic. This paper concentrates on crowd counting based on CSI and transfer learning. We utilize the CSI signal fluctuations caused by human motion within WiFi coverage to identify the person count, because different person counts lead to distinct signal propagation characteristics. First, this paper presents recent studies of crowd counting based on CSI. Then, we introduce the basic concept of CSI and describe the fundamental principle of CSI-based crowd counting. We also present the system framework, the experiment scenario, and the neural network structure transferred from ResNet. Next, we present the experiment results and compare the accuracy of different neural network models. The system achieves a recognition accuracy of 100 percent for seven participants using the transfer learning technique. Finally, we conclude the paper by discussing current problems and future work.
Crowd counting is a challenging task in computer vision, as realistic scenes are always filled with unfavourable factors such as severe occlusions, perspective distortions and diverse distributions. Recent state-of-the-art methods based on convolutional neural networks (CNNs) weaken these factors via multi-scale feature fusion or optimal feature selection through a front switch-net. L2 regression is used to regress the density map of the crowd, which is known to lead to an averaged and blurry result and affects the accuracy of the crowd count and position distribution. To tackle these problems, we take full advantage of the application of generative adversarial networks (GANs) in image generation and propose a novel crowd counting model based on conditional GANs to predict high-quality density maps from crowd images. Furthermore, we put forward a new regularizer to help boost the accuracy of processing extremely crowded scenes. Extensive experiments on four major crowd counting datasets are conducted to demonstrate the better performance of the proposed approach compared with recent state-of-the-art methods.
Estimation of crowd count is becoming crucial, as it can help in security surveillance, crowd monitoring, and management for different events. It is challenging to determine the approximate crowd size from an image of a dense crowd. Therefore, in this research study, we propose a multi-headed convolutional neural network architecture-based model for crowd counting, which is divided into two main components: (i) the convolutional neural network, which extracts features across the whole input image, and (ii) the multi-headed layers, which make it easier to evaluate density maps and estimate the number of people in the input image. We employ the publicly available benchmark crowd-counting datasets UCF_CC_50 and ShanghaiTech Parts A and B for model training and testing to validate the model's performance. To analyze the results, we use two metrics, Mean Absolute Error (MAE) and Mean Square Error (MSE), and compare the results of the proposed system with state-of-the-art crowd counting models. The results show the superiority of the proposed system.
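Several of the abstracts above report MAE and MSE over per-image counts. The sketch below shows how these metrics are computed from predicted and ground-truth counts. Note that many crowd counting papers report the square root of the mean squared error under the name "MSE", so the sketch returns both forms; which one a given paper uses should be checked in that paper. The counts are invented for illustration and do not come from any of the listed works.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and true counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    """Square root of the mean squared error, in units of people."""
    return math.sqrt(mean_squared_error(predicted, actual))


# Hypothetical per-image counts for four test images.
predicted_counts = [102.0, 250.0, 48.0, 400.0]
true_counts = [100.0, 260.0, 50.0, 390.0]

mae = mean_absolute_error(predicted_counts, true_counts)       # 6.0
mse = mean_squared_error(predicted_counts, true_counts)        # 52.0
rmse = root_mean_squared_error(predicted_counts, true_counts)  # sqrt(52)
```

MAE reflects the typical counting error per image, while the squared forms penalize large misses more heavily and are therefore often read as an indicator of robustness.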
Crowd counting, which aims to count the number of people in different crowded scenes, has recently become a hot research topic. Existing methods are mainly based on a training-testing pattern and rely on large amounts of training data, and they fail to count crowds accurately in real-world scenes because of the limited generalization capability of the models. To alleviate this issue, a scene-adaptive crowd counting method based on meta-learning with a Dual-illumination Merging Network (DMNet) is proposed in this paper. Based on learning-to-learn and few-shot learning, the proposed method is able to adapt to different scenes that contain only a few labeled images. To generate high-quality density maps and count crowds in low-lighting scenes, the DMNet is proposed, which contains a Multi-scale Feature Extraction module and an Element-wise Fusion module. The Multi-scale Feature Extraction module extracts image features with multi-scale convolutions, which helps to improve network accuracy. The Element-wise Fusion module fuses the low-lighting features and the illumination-enhanced features, which compensates for the missing illumination in low-lighting environments. Experimental results on the WorldExpo'10, DISCO, UCSD, and Mall benchmarks show that the proposed method outperforms existing state-of-the-art methods in accuracy and achieves satisfactory results.
Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as annotations of images, but existing methods fail to achieve satisfactory performance because a global perspective field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components include a recursive Swin Transformer and a multi-level dilated convolution regression head. The recursive Swin Transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module can capture both low- and high-level features simultaneously to enhance the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
Funding (LRMBNet): Double First-Class Innovation Research Project for People's Public Security University of China (2023SYL08).
Funding (EvoNorm encoder-decoder network): This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2021R1I1A1A01055652).
Funding (DCNN for social distancing): the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia, under grant No. (DF-352-165-1441). The authors, therefore, gratefully acknowledge DSR for their technical and financial support.
Funding (2U-Net): This research work is supported by the Deputyship of Research & Innovation, Ministry of Education in Saudi Arabia (Grant Number 758).
Funding (Wi-Fi CSI adaptive counting): This work was supported by the National Natural Science Foundation of China (Grant No. 61802196, url: http://www.nsfc.gov.cn/), the Jiangsu Provincial Government Scholarship for Studying Abroad, the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the NUIST Students' Platform for Innovation and Entrepreneurship Training Program (Grant No. 202010300080Y, url: http://sjjx.nuist.edu.cn:81/CXCY/NUIST/).
Funding (CSCC-Net): This research is supported by the Ministry of Education, Saudi Arabia, under Project Number QURDO001.
Funding: This work was supported by ZTE Industry-University-Institute Cooperation Funds.
Abstract: Crowd counting is a challenging task in computer vision, as realistic scenes are always filled with unfavourable factors such as severe occlusions, perspective distortions and diverse distributions. Recent state-of-the-art methods based on convolutional neural networks (CNNs) weaken these factors via multi-scale feature fusion or optimal feature selection through a front switch-net. L2 regression is used to regress the density map of the crowd, which is known to produce averaged, blurry results and degrades the accuracy of both the crowd count and the position distribution. To tackle these problems, we take full advantage of generative adversarial networks (GANs) in image generation and propose a novel crowd counting model based on conditional GANs to predict high-quality density maps from crowd images. Furthermore, we put forward a new regularizer to help boost accuracy on extremely crowded scenes. Extensive experiments on four major crowd counting datasets demonstrate the better performance of the proposed approach compared with recent state-of-the-art methods.
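One way to see why pixel-wise L2 regression tends toward averaged, blurry density maps is a toy case in which two density maps are equally plausible for the same image. The minimizer of the expected squared error is their pixel-wise mean, which halves each peak. The 1-D maps and function names below are made up for illustration and are not from the paper.

```python
def mean_squared_error(pred, target):
    """Pixel-wise L2 loss between two density maps of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def expected_l2(pred, targets):
    """Average L2 loss of one prediction over equally likely targets."""
    return sum(mean_squared_error(pred, t) for t in targets) / len(targets)


# Toy assumption: one person who is equally likely to be at cell 1 or cell 3.
target_a = [0.0, 1.0, 0.0, 0.0, 0.0]
target_b = [0.0, 0.0, 0.0, 1.0, 0.0]
targets = [target_a, target_b]

# The pixel-wise mean of the targets minimizes the expected squared error.
blurry = [(a + b) / 2 for a, b in zip(target_a, target_b)]

print(blurry)  # [0.0, 0.5, 0.0, 0.5, 0.0]
print(expected_l2(blurry, targets) < expected_l2(target_a, targets))  # True
print(sum(blurry))  # 1.0
```

The averaged map still sums to one person, but it no longer commits to a sharp location, which is the kind of position error the abstract attributes to L2 regression.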
Funding: Funded by Naif Arab University for Security Sciences under grant No. NAUSS-23-R10.
Abstract: Estimating crowd counts is becoming crucial, as it can help in security surveillance, crowd monitoring, and management for different events. It is challenging to determine the approximate crowd size from an image of a dense crowd. Therefore, in this research study, we propose a multi-headed convolutional neural network architecture for crowd counting. The proposed model has two main components: (i) the convolutional neural network, which extracts features across the whole input image, and (ii) the multi-headed layers, which make it easier to evaluate density maps and estimate the number of people in the input image. We used the publicly available benchmark crowd-counting datasets UCF_CC_50 and ShanghaiTech Parts A and B for model training and testing to validate the model’s performance. To analyze the results, we used two metrics, Mean Absolute Error (MAE) and Mean Square Error (MSE), and compared the results of the proposed system with state-of-the-art crowd counting models. The results show that the proposed system outperforms these models.
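As a minimal sketch of how these two metrics are typically computed, the snippet below sums a density map into a count and then scores predicted counts against ground truth. All numbers and function names are invented for illustration. Note that many crowd counting papers report the square root of the mean squared error under the name "MSE", so both values are shown.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum over all cells of a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_square_error(predicted, actual):
    """Average squared difference between predicted and true counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Made-up density map whose cells sum to one person.
print(count_from_density([[0.25, 0.25], [0.5, 0.0]]))  # 1.0

# Made-up per-image counts for three test images.
predicted_counts = [102.0, 48.0, 310.0]
actual_counts = [100.0, 50.0, 300.0]

mae = mean_absolute_error(predicted_counts, actual_counts)
mse = mean_square_error(predicted_counts, actual_counts)

print(round(mae, 2))   # 4.67
print(mse)             # 36.0
print(math.sqrt(mse))  # 6.0
```

Because the errors are squared, the single image that is off by 10 dominates the MSE, which is why the two metrics are usually reported together.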
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62076117 and 61762061), the Natural Science Foundation of Jiangxi Province, China (20161ACB20004), and the Jiangxi Key Laboratory of Smart City (20192BCD40002).
Abstract: Crowd counting, which aims to count the number of people in different crowded scenes, has recently become a hot research topic. Existing methods mainly follow a train-then-test pattern and rely on large amounts of training data, so they fail to count crowds accurately in real-world scenes because of the models' limited generalization capability. To alleviate this issue, this paper proposes a scene-adaptive crowd counting method based on meta-learning with a Dual-illumination Merging Network (DMNet). Built on learning-to-learn and few-shot learning, the proposed method can adapt to different scenes that contain only a few labeled images. To generate high-quality density maps and count crowds in low-light scenes, DMNet is proposed, which contains a Multi-scale Feature Extraction module and an Element-wise Fusion module. The Multi-scale Feature Extraction module extracts image features with multi-scale convolutions, which helps to improve network accuracy. The Element-wise Fusion module fuses the low-light features with the illumination-enhanced features, which supplements the illumination missing in low-light environments. Experimental results on the WorldExpo’10, DISCO, UCSD, and Mall benchmarks show that the proposed method outperforms existing state-of-the-art methods in accuracy and achieves satisfactory results.
Funding: This research project was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072015, U19B2039, U1811463) and the National Key R&D Program of China (Grant No. 2018YFB1600903).
Abstract: Crowd counting provides an important foundation for public security and urban management. Because crowd images contain small targets and large density variations, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons as well as counts. Weakly-supervised methods avoid detailed labeling and require only counts as image annotations, but existing methods fail to achieve satisfactory performance because a global perspective field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components are a recursive Swin transformer and a multi-level dilated convolution regression head. The recursive Swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module captures both low- and high-level features simultaneously to enhance the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
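To make count-only supervision and mean fusion concrete, here is a small sketch. It is not the DTCC implementation: the function names, the three head outputs, and the choice of an absolute-error loss are all assumptions for illustration, and the abstract's dynamic fusion variant is not specified in enough detail to sketch.

```python
def mean_fusion(head_counts):
    """Fuse the count predictions of several regression heads by averaging."""
    return sum(head_counts) / len(head_counts)


def count_l1_loss(predicted_count, annotated_count):
    """Absolute error against the image-level count.

    In a weakly-supervised setup this count is the only label; no per-person
    point annotations or density maps are needed. L1 is an assumed choice.
    """
    return abs(predicted_count - annotated_count)


# Hypothetical outputs of three regression heads for one image.
head_counts = [118.0, 126.0, 122.0]
annotated_count = 120.0  # made-up image-level label

fused = mean_fusion(head_counts)
print(fused)                                  # 122.0
print(count_l1_loss(fused, annotated_count))  # 2.0
```

The point of the sketch is that the training signal is a single scalar per image, which is what lets weakly-supervised methods skip per-person annotation.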