To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared with the Attenuated Spatial Pyramid module, can enlarge the receptive field and enhance the information. Finally, a context fusion prediction branch is added that fuses high-semantic and low-semantic prediction results, so that the model effectively improves segmentation accuracy on small data sets. Experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIOU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIOU by about 12.2%.
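As an illustration of the mIOU metric reported above, the following is a minimal sketch of how mean intersection-over-union can be computed from a confusion matrix. The function name `mean_iou` and the toy counts are assumptions made for this example; they are not taken from the paper or from CamVid.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes that never occur (union of zero) are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp     # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Toy 3-class example with made-up counts (not CamVid data).
toy = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 0, 20],
]
print(round(mean_iou(toy), 3))  # prints 0.727
```

Because every class contributes equally to the mean, a rare class with poor overlap lowers mIOU as much as a frequent one, which is why the metric is informative on imbalanced data sets.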
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while cross-modal feature extraction and fusion continue to improve, maintaining the model’s detection speed remains a challenge. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The proposed network model was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
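The parameter saving from depthwise separable convolution mentioned above can be checked with simple arithmetic. The sketch below is illustrative only: the layer sizes (3 x 3 kernel, 256 input and output channels) are assumed values, not figures from the paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise (channel-wise) plus pointwise (1 x 1) pair."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer sizes for illustration.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))  # prints 589824 67840 0.115
```

The ratio equals 1/c_out + 1/k^2, so for a 3 x 3 kernel the separable form needs roughly one ninth of the weights once the channel count is large.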
Top-down attention mechanisms require the selection of specific objects or locations; however, the brain mechanism involved when attention is allocated across different modalities is not well understood. The aim of this study was to use functional magnetic resonance imaging to define the neural mechanisms underlying divided and selective spatial attention. A concurrent audiovisual stimulus was used, and subjects were prompted to focus on a visual, auditory or audiovisual stimulus in a Posner paradigm. Our behavioral results confirmed the better performance of selective attention compared to divided attention. We found differences in the activation level of the frontoparietal network, visual/auditory cortex, the putamen and the salience network under different attention conditions. We further used Granger causality (GC) to explore effective connectivity differences between tasks. Differences in GC connectivity between visual and auditory selective tasks reflected the visual dominance effect under spatial attention. In addition, our results supported the role of the putamen in redistributing attention and the functional separation of the salience network. In summary, we explored the audiovisual top-down allocation of attention and observed differences in neural mechanisms under endogenous attention modes, which revealed differences in cross-modal expression in visual and auditory attention under attentional modulation.
The separation of individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to stick together, and occlusion by clutter such as pigpens can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive and prone to labeling errors. Therefore, there is an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily migrated to a group-raised environment. To solve these problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks were proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNext50 as the encoder and UNet as the decoder as the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IOU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics are improved by 0.13% and 0.22%, respectively. The experiments that introduce channel and spatial attention separately show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IOU metrics by 0.16% and 0.30%, respectively. Furthermore, heatmaps of the features of different decoder layers after adding different attention information show that the boundary of the pig image segmentation becomes clearer as the number of layers increases. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is verified in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to the field of group-raised pigs and can provide a reference for its pre-segmentation.
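For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of F1 and IOU for a single binary segmentation mask. The function name `f1_and_iou` and the six-pixel example masks are assumptions for illustration; the paper's own evaluation code and averaging scheme are not given in the abstract.

```python
def f1_and_iou(pred, truth):
    """F1 (Dice) and IOU for two flattened binary masks of equal length.

    Assumes at least one pixel is positive in pred or truth.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


# Made-up masks: 1 marks a pig pixel, 0 marks background.
pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
f1, iou = f1_and_iou(pred, truth)
print(f1, iou)  # prints 0.75 0.6
```

On a single mask the two metrics are tied by F1 = 2 * IOU / (1 + IOU), so F1 is always at least as large as IOU. Scores averaged over many images, as is usual in a paper, need not satisfy this identity exactly.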
With the metaverse as the development direction of the next-generation Internet, the popularity of intelligent devices, and the maturity of various emerging technologies, more and more intelligent devices are connecting to the Internet, which poses a major threat to the management and security protection of network equipment. At present, the mainstream method of network equipment identification in the metaverse is to obtain the network traffic data generated during device communication, extract device features through analysis and processing, and identify the device with a variety of learning algorithms. Such methods often require manual participation and struggle to capture the small differences between similar devices, leading to identification errors. Therefore, we propose a deep learning device recognition method based on a spatial attention mechanism. First, we extract the required feature fields from the acquired network traffic data. Then, we normalize the data and convert it into grayscale images. After that, we add a spatial attention mechanism to a CNN and an MLP, respectively, to increase the difference between similar network devices and further improve recognition accuracy. Finally, we identify devices with the deep learning model. Extensive experiments were carried out on 31 types of network devices, such as web cameras, wireless routers, and smartwatches. The results show that, compared with recognition based only on the deep learning model, the proposed method based on the spatial attention mechanism increases accuracy by 0.8% and 2.0% under the CNN and MLP models, respectively. The proposed method thus outperforms the existing device-type recognition method based only on a deep learning model.
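The normalization and grayscale conversion step described above could look roughly like the sketch below. The abstract does not say which normalization is used, so min-max scaling to 0..255, zero padding of the last row, the image width, and the function name `to_grayscale_image` are all assumptions made for this example.

```python
def to_grayscale_image(features, width):
    """Min-max normalize numeric traffic features to 0..255 and reshape into rows.

    The last row is padded with zeros (black pixels) if needed.
    """
    lo, hi = min(features), max(features)
    span = hi - lo
    if span == 0:
        pixels = [0] * len(features)
    else:
        pixels = [round(255 * (v - lo) / span) for v in features]
    pad = (-len(pixels)) % width          # pixels missing from the final row
    pixels += [0] * pad
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Hypothetical feature fields from one flow (values invented for illustration).
features = [0, 64, 1500, 443, 6, 128, 52]
image = to_grayscale_image(features, width=3)
for row in image:
    print(row)
```

Each row of the result is one row of the grayscale image, which can then be fed to a CNN or flattened again for an MLP.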
Object detection has made a significant leap forward in recent years. However, the detection of small objects remains very difficult for various reasons: such objects are very small and are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects by overcoming the information loss that occurs during downsampling. The multi-convolutional attention block (MCAB), together with the channel attention and spatial attention module (SAM) that make up the MCAB, has been crafted to accomplish small object detection with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant gains in performance, such as 97.75% for KITTI and 88.97% for PASCAL VOC. These findings indicate that MCBAN is considerably more effective in small object detection than other existing approaches.
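To make the idea of a spatial attention module concrete, here is a simplified sketch in the style of the CBAM spatial attention branch: pool across channels at every position, turn the pooled values into a score, squash it with a sigmoid, and reweight the feature map. It is not the MCBAN implementation. In particular, the two scalar weights `w_avg` and `w_max` stand in for the learned convolution that CBAM applies to the pooled maps, and all names are assumptions.

```python
import math


def spatial_attention(feature, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight a feature map given as C channels of H x W nested lists.

    Returns (weighted_feature, attention_map).
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    attention = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [feature[c][i][j] for c in range(channels)]
            avg_pool = sum(values) / channels
            max_pool = max(values)
            # Stand-in for the learned convolution over the two pooled maps.
            score = w_avg * avg_pool + w_max * max_pool + bias
            attention[i][j] = 1.0 / (1.0 + math.exp(-score))
    weighted = [
        [[feature[c][i][j] * attention[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]
    return weighted, attention


# Two channels, one row, two columns (toy values).
feature = [[[0.0, 2.0]], [[0.0, 4.0]]]
weighted, attention = spatial_attention(feature)
print(attention)
```

Positions with strong activations receive weights close to 1, while positions with no activation are scaled by 0.5 in this toy setting; a trained module would learn where to suppress more aggressively.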
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving rise to the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration techniques are employed, culminating in the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model exhibits a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frames/s.
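A minimal sketch of DIoU-based non-maximum suppression, as named above, is shown below. DIoU subtracts from the ordinary IoU the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. The box format `(x1, y1, x2, y2)`, the threshold of 0.5, and the function names are assumptions for this example and do not reproduce the YOLOv5 code.

```python
def diou(a, b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2) with positive area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes whose DIoU with the kept box exceeds the threshold.
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # prints [0, 2]
```

Because the center-distance penalty lowers the score of boxes whose centers are far apart, two overlapping boxes that belong to different nearby objects are less likely to suppress each other than under plain IoU.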
Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
Network intrusion detection systems (NIDS) based on deep learning have continued to make significant advances. However, the following challenges remain. On the one hand, simply applying Temporal Convolutional Networks (TCNs) alone can lead to models that ignore the impact of network traffic features at different scales on detection performance. On the other hand, some intrusion detection methods consider multi-scale information of traffic data, but considering only forward network traffic information can lead to deficiencies in capturing multi-scale temporal features. To address both issues, we propose a hybrid Convolutional Neural Network that supports a multi-output strategy (BONUS) for industrial internet intrusion detection. First, we create a multi-scale Temporal Convolutional Network by stacking TCNs of different scales to capture the multi-scale information of network traffic. Meanwhile, we propose a bi-directional structure and dynamically set the weights to fuse the forward and backward contextual information of network traffic at each scale, enhancing the model’s ability to capture multi-scale temporal features of network traffic. In addition, we introduce a gated network for each of the two branches in the proposed method to assist the model in learning the feature representation of each branch. Extensive experiments demonstrate the effectiveness of the proposed approach on two publicly available traffic intrusion detection datasets, UNSW-NB15 and NSL-KDD, with F1 scores of 85.03% and 99.31%, respectively, which also confirms that enhancing the model’s ability to capture multi-scale temporal features of traffic data benefits detection performance.
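The building block of a TCN is a causal dilated convolution, and a bi-directional variant can be obtained by running the same operation on the reversed sequence. The sketch below illustrates that idea under simplifying assumptions: a single channel, a hand-picked kernel, and a fixed fusion weight `alpha`, whereas the abstract states that the weights are set dynamically. The function names are hypothetical and this is not the BONUS implementation.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as zero before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def bidirectional_features(x, kernel, dilation, alpha=0.5):
    """Fuse forward (past) and backward (future) context with a fixed weight."""
    forward = causal_dilated_conv(x, kernel, dilation)
    # Reversing the input makes the same causal filter look at future samples.
    backward = causal_dilated_conv(x[::-1], kernel, dilation)[::-1]
    return [alpha * f + (1 - alpha) * b for f, b in zip(forward, backward)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# Different dilations give the "multi-scale" view of the same traffic sequence.
multi_scale = {d: bidirectional_features(x, [1.0, 1.0], d) for d in (1, 2)}
print(multi_scale[2])  # prints [2.5, 4.0, 6.0, 5.0, 6.5]
```

A larger dilation lets the same two-tap kernel relate samples that are further apart in time, which is how stacked TCN layers cover several temporal scales without enlarging the kernel.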
Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, game-based generators and discriminators may reduce the robustness of the resulting GANs for image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information and improve the generation effect. In this paper, we propose an enhanced GAN that improves the generator for image generation (EIGGAN). EIGGAN applies spatial attention to the generator to extract salient information and enhance the truthfulness of the generated images. Taking the relation of the context into account, parallel residual operations are fused into the generation network to extract more structural information from different layers. Finally, a mixed loss function is exploited in the GAN to make a tradeoff between speed and accuracy and generate more realistic images. Experimental results show that the proposed method is superior to popular methods, e.g., Wasserstein GAN with gradient penalty (WGAN-GP), in terms of many indexes, namely Frechet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins and Inception Score, as well as on visual results for image generation.
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts bring a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle’s capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model incorporates a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and increasing throughput efficiency. Extensive experiments were conducted on the collaborative simulation platform of CARLA and SUMO, and the experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence capacity of the training process, the smoothness of driving and the traffic efficiency under diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). In general, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Deepfake-generated fake faces, commonly utilized in identity-related activities such as political propaganda, celebrity impersonations, evidence forgery, and familiar fraud, pose new societal threats. Although current deepfake generators strive for high realism in visual effects, they do not replicate biometric signals indicative of cardiac activity. Addressing this gap, many researchers have developed detection methods focusing on biometric characteristics. These methods utilize classification networks to analyze both temporal and spectral domain features of the remote photoplethysmography (rPPG) signal, resulting in high detection accuracy. However, in the spectral analysis, existing approaches often consider only the power spectral density and neglect the amplitude spectrum, although both are crucial for assessing cardiac activity. We introduce a novel method that extracts rPPG signals from multiple regions of interest through remote photoplethysmography and processes them using the Fast Fourier Transform (FFT). The resultant time-frequency domain signal samples are organized into matrices to create Matrix Visualization Heatmaps (MVHM), which are then utilized to train an image classification network. Additionally, we explored various combinations of time-frequency domain representations of rPPG signals and the impact of attention mechanisms. Our experimental results show that our algorithm achieves a detection accuracy of 99.22% in identifying fake videos, significantly outperforming mainstream algorithms and demonstrating the effectiveness of the Fourier Transform and attention mechanisms in detecting fake faces.
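The distinction drawn above between the amplitude spectrum and the power spectral density can be illustrated with a small discrete Fourier transform. The sketch below uses a synthetic sine wave at 1.2 Hz (72 beats per minute) sampled at an assumed 30 frames per second in place of a real rPPG trace, and a plain O(n^2) DFT instead of an FFT library; the periodogram scaling shown is one common convention, with the one-sided doubling omitted. All names are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Plain discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectra(signal, fs):
    """Return (frequencies, amplitude spectrum, periodogram PSD) up to Nyquist."""
    n = len(signal)
    coeffs = dft(signal)
    half = n // 2 + 1
    freqs = [k * fs / n for k in range(half)]
    amplitude = [abs(c) / n for c in coeffs[:half]]
    power = [abs(c) ** 2 / (fs * n) for c in coeffs[:half]]
    return freqs, amplitude, power


fs = 30            # assumed camera frame rate in Hz
n_samples = 150    # 5 seconds of signal
signal = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(n_samples)]
freqs, amplitude, power = spectra(signal, fs)
peak = max(range(len(freqs)), key=lambda k: amplitude[k])
print(round(freqs[peak], 2), "Hz =", round(freqs[peak] * 60), "bpm")  # prints 1.2 Hz = 72 bpm
```

Both spectra peak at the same frequency, but the power values grow with the square of the amplitude, so weak secondary components that are visible in the amplitude spectrum can be nearly invisible in the power spectral density.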
Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which contains a convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate resulting from suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototype of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset shows that the proposed method is superior to state-of-the-art few-shot learning approaches.
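The prototype comparison described above can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The two-dimensional embeddings, the class labels, and the use of squared Euclidean distance are assumptions for this example; in the paper the embeddings would come from the attention-based feature extraction network.

```python
def prototype(embeddings):
    """Class prototype: element-wise mean of the support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Assign the query embedding to the class with the nearest prototype."""
    protos = {label: prototype(embs) for label, embs in support.items()}
    return min(protos, key=lambda label: squared_distance(query, protos[label]))


# Made-up 2-D embeddings for a 2-way, 2-shot episode.
support = {
    "smoke": [[1.0, 0.9], [0.8, 1.1]],
    "no_smoke": [[0.0, 0.1], [0.2, -0.1]],
}
print(classify([0.7, 0.8], support))  # prints smoke
```

Since only the prototypes depend on the support set, new classes can be handled at test time by averaging a handful of labeled embeddings, without retraining the feature extractor.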
Image inpainting based on deep learning has improved greatly. The original purpose of image inpainting was to repair damaged photos, for example by inpainting artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, the detection and localization of image inpainting operations are essential. Recent research shows that the high-pass filtering fully convolutional network (HPFCN) can be applied to image inpainting detection and achieves good results. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention fully convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or only spatial information and cannot make full use of both to improve SISR performance further. The present work addresses this problem by proposing a mixed attention densely residual network architecture that can make full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to enable the algorithm to fuse channel attention with Laplacian spatial attention effectively and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method has performance comparable to other state-of-the-art methods.
Image fusion aims to integrate complementary information in source images to synthesize a fused image that comprehensively characterizes the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the fusion results when input images have slight shifts or deformations. In addition, the fusion results typically have only a good visual effect but neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network to estimate bidirectional deformation fields to rectify geometric distortions of input images under the supervision of both photometric and end-point constraints. The registration and fusion are combined in a symmetric scheme, in which mutual promotion is achieved by optimizing the naive fusion loss and is further enhanced by the mono-modal consistency constraint on symmetric fusion outputs. In addition, the image fusion network is equipped with a global spatial attention mechanism to achieve adaptive feature integration. Moreover, a semantic constraint based on the pre-trained segmentation model and the Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion, and semantic segmentation tasks demonstrate the superiority of SuperFusion over state-of-the-art alternatives. The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/SuperFusion.
With the improvement of the national economy, the number of vehicles continues to increase year by year. According to the National Bureau of Statistics, the number had reached approximately 327 million in China by the end of 2018, which keeps raising urban traffic pressure and worsens its negative impact on urban traffic order. Illegal parking, a common problem in the field of transportation security, urgently needs to be solved. Traditional methods to address it are mainly based on ground loops and manual supervision, which may miss detections and cost much manpower. With deep learning developing rapidly in recent years, object detection methods relying on background segmentation can no longer meet the speed and precision requirements of complex and varied scenes. Thus, an improved Single Shot MultiBox Detector (SSD) based on deep learning is proposed in our study. We introduce an attention mechanism through a spatial transformer module, which gives neural networks the ability to actively spatially transform feature maps, and add contextual information transmission in a specified layer. Through repeated experiments, we identified the best connection layer in the detection model, especially for small objects, and increased the precision by 1.5% over the baseline SSD without extra training cost. Meanwhile, we designed an illegal parking vehicle detection method based on the improved SSD, which reaches a precision of up to 97.3% at a speed of 40 FPS, is superior to most vehicle detection methods, and will contribute to relieving the negative impact of illegal parking.
Background: It has been suggested that older adults show a reduced attentional field compared to younger adults. This may be attributed to poorer utilization of peripheral vision (i.e., peripheral attentional allocation) and a higher reliance on central vision compared to younger adults. To test this, we examined the importance of central, peri-foveal and near-periphery information in younger and older adults by comparing their visual search performance while their central vision was blocked, in the presence of different sized artificial central scotomas. We tested participants in two versions of visual search, pop-out and serial search, because they require a different use of central and peripheral attention. Pop-out search relies on processing of the entire visual scene (i.e., global processing), whereas serial search requires processing of each feature serially (i.e., local processing). Methods: Thirteen healthy younger (M=21.8, SD=1.5) and 15 older adults (M=69.1 years, SD=7.3) performed a pop-out and a serial version of a visual search task in the presence of different sized gaze-contingent artificial central scotomas (no scotoma, 3° diameter, 5° and 7°). Participants were asked to indicate as quickly as possible whether a target was present or not among distractors whose number varied (16, 32 or 64 objects). Results: We found evidence for a greater decline in peripheral processing in older adults compared to younger adults in pop-out but not in serial search. For the pop-out condition with no scotoma, we found that the further the target was in the periphery, the longer the search time, and that this increase was proportionally greater for older adults compared to younger adults. Further, increases in scotoma size were associated with a greater increase in reaction times for older adults compared to younger participants. For the serial condition, both groups showed similar increases in reaction times with target distance from center and scotoma size. We surmise that this may be due to task difficulty in serial search; central vision is necessary for both groups. Conclusions: These findings suggest that, in global processing, older adults distribute more resources towards central vision compared to younger adults.
Background: Research suggests that a healthy brain analyzes facial expressions approximately 170 ms after their presentation, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic, early emotional processing around 90 ms after stimulation. This processing would occur subconsciously, even before the stimulus is perceived, and could be approximated by presenting the stimuli rapidly at the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: Presenting emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical processing of the stimulus. Here, using different specific stimulation frequencies allowed us to label the facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different repeated presentation times (1/0.170 s=~5.8 Hz, 1/0.090 s=~10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared across the hemisphere to which attention was directed, the emotion, and the stimulation frequency. Results: The signal-to-noise ratio of the cerebral oscillations related to the processing of fearful expressions was stronger in the regions specific to emotional processing when the expressions were presented in the subjects' peripheral vision, unbeknownst to them. In addition, the peripheral emotional processing of fear at 10.8 Hz was associated with greater activation within the Gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the Alpha frequency bands for the temporal regions. This modulation of the spectral power is independent of the attentional request. Conclusions: These results suggest that fearful stimuli presented in peripheral vision and outside the attentional framework elicit an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is processed by the fast pathway of the magnocellular layers.
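The signal-to-noise ratio at a tagged frequency can be illustrated with a minimal sketch. It assumes one common definition of SNR (amplitude at the tagged bin divided by the mean amplitude of neighbouring bins), which the abstract does not spell out. The function names, the synthetic signal, the sampling rate and the 6 Hz tag (chosen so that it falls exactly on a DFT bin) are all hypothetical.

```python
import cmath
import math
import random


def dft_amplitudes(signal):
    """Return the magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(signal)
    amps = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        amps.append(abs(acc))
    return amps


def tagged_snr(signal, fs, freq, n_neighbors=8, skip=1):
    """Amplitude at `freq` divided by the mean amplitude of nearby bins.

    The `skip` bins directly adjacent to the target are excluded so that
    leakage from the tagged response does not inflate the noise estimate.
    """
    amps = dft_amplitudes(signal)
    k = round(freq * len(signal) / fs)
    neighbors = [
        amps[j]
        for j in range(k - skip - n_neighbors, k + skip + n_neighbors + 1)
        if abs(j - k) > skip and 0 < j < len(amps)
    ]
    return amps[k] / (sum(neighbors) / len(neighbors))


# Synthetic stand-in for one EEG channel: a 6 Hz tag buried in Gaussian noise.
rng = random.Random(0)
fs, n = 64, 256
eeg = [math.sin(2 * math.pi * 6 * i / fs) + rng.gauss(0, 0.5) for i in range(n)]
snr_tagged = tagged_snr(eeg, fs, 6.0)    # frequency that carries the tag
snr_untagged = tagged_snr(eeg, fs, 9.0)  # frequency with noise only
```

In this sketch the tagged frequency stands out clearly from its neighbours, while an untagged frequency yields a ratio near one.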
Visual object tracking is an important issue that has received long-term attention in computer vision. The ability to handle occlusion effectively, especially severe occlusion, is an important criterion for evaluating object tracking algorithms in long-term tracking and is of great significance for improving their robustness. However, most object tracking algorithms lack a processing mechanism designed specifically for occlusion. Under occlusion, target information is missing, so the target position must be predicted from the motion trajectory. Kalman filtering and particle filtering can effectively predict the target motion state from historical motion information. Probabilistic discriminative model prediction (PrDiMP) is a single object tracking method based on the spatial attention mechanism for complex scenes and occlusions. To improve the performance of PrDiMP, Kalman filtering, particle filtering and linear filtering are introduced. First, for occlusion, Kalman filtering and particle filtering are each introduced to predict the object position, replacing the detection result of the original tracking algorithm and stopping the recursive update of the target model. Second, for the detection-jump problem caused by similar objects in complex scenes, a linear filtering window is added. Evaluation results on three datasets (GOT-10k, UAV123 and LaSOT) and visualization results on several videos show that our algorithms improve tracking performance under occlusion and effectively suppress detection jumps.
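The idea of predicting the target position during occlusion can be sketched with a one-dimensional constant-velocity Kalman filter. This is an illustrative sketch under stated assumptions, not the configuration used with PrDiMP: the class name, the noise parameters `q` and `r`, and the motion model are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position, velocity=0.0, q=1e-3, r=0.25):
        self.p, self.v = position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process and measurement noise (assumed values)

    def predict(self, dt=1.0):
        """Propagate the state one step: P <- F P F^T + Q with F = [[1, dt], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain
        y = z - self.p              # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


# Target moves 2 px per frame; it is visible for 10 frames, then occluded for 3.
kf = ConstantVelocityKalman1D(position=0.0)
for t in range(1, 11):
    kf.predict()
    kf.update(2.0 * t)  # tracker detection is trusted while the target is visible
occluded = [kf.predict() for _ in range(3)]  # prediction only, no update
```

While the target is occluded, only `predict` is called, so the estimate follows the learned velocity instead of an unreliable detection.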
Funding: Ministry of Science and Technology Basic Resources Survey Special Project, Grant/Award Number: 2019FY100900; High-level Hospital Construction Project, Grant/Award Number: DFJH2019015; National Natural Science Foundation of China, Grant/Award Number: 61871021; Guangdong Natural Science Foundation, Grant/Award Number: 2019A1515011676; Beijing Key Laboratory of Robotics Bionic and Functional Research.
Abstract: To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction. In contrast to the Attenuated Spatial Pyramid module, it proposes an Attentive Pyramid Spatial Attention (APSA) module, which increases the receptive field and enhances the information. Finally, it adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results. The model effectively improves segmentation accuracy on small data sets. Experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIOU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIOU by about 12.2%.
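For readers unfamiliar with the mIOU metric reported above, the following minimal sketch computes it from a confusion matrix. The function name and the toy two-class matrix are hypothetical, and skipping classes that never occur is an assumption; evaluation protocols for CamVid or VOC 2012 may differ in detail.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes absent from both prediction and ground truth are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, true otherwise
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with two classes (hypothetical pixel counts).
toy_confusion = [[50, 10], [5, 35]]
toy_miou = mean_iou(toy_confusion)
```

Per-class IoU is the ratio of correctly labelled pixels to the union of predicted and true pixels for that class; mIOU averages this ratio over classes, so rare classes weigh as much as frequent ones.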
Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055 and 222102210214; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao[2021] No.489-29; and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model’s detection speed while continuously improving cross-modal feature extraction and fusion is also challenging. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The proposed network model was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
Funding: The study was supported by the National Natural Science Foundation of China (Grant Nos. 62171300, 61727807).
Abstract: Top-down attention mechanisms require the selection of specific objects or locations; however, the brain mechanism involved when attention is allocated across different modalities is not well understood. The aim of this study was to use functional magnetic resonance imaging to define the neural mechanisms underlying divided and selective spatial attention. A concurrent audiovisual stimulus was used, and subjects were prompted to focus on a visual, auditory or audiovisual stimulus in a Posner paradigm. Our behavioral results confirmed the better performance of selective attention compared to divided attention. We found differences in the activation level of the frontoparietal network, visual/auditory cortex, the putamen and the salience network under different attention conditions. We further used Granger causality (GC) to explore effective connectivity differences between tasks. Differences in GC connectivity between visual and auditory selective tasks reflected the visual dominance effect under spatial attention. In addition, our results supported the role of the putamen in redistributing attention and the functional separation of the salience network. In summary, we explored the audiovisual top-down allocation of attention and observed differences in neural mechanisms under endogenous attention modes, which revealed differences in cross-modal expression in visual and auditory attention under attentional modulation.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31671571); the Shanxi Province Basic Research Program Project (Free Exploration) (No. 20210302124523, 20210302123408, 202103021224149, and 202103021223141); and the Youth Agricultural Science and Technology Innovation Fund of Shanxi Agricultural University (Grant No. 2019027).
Abstract: Separating individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to adhere to one another, and occlusion by clutter such as pigpens can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive and prone to labeling errors. Therefore, there is an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily migrated to a group-raised environment. To solve these problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks were proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNext50 as the encoder and UNet as the decoder in the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IOU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics are improved by 0.13% and 0.22%, respectively. Experiments that introduce channel or spatial attention alone show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IOU metrics by 0.16% and 0.30%, respectively. Furthermore, heatmaps of the features of different decoder layers after adding different attention information show that the boundary of the pig image segmentation becomes clearer as the layers increase. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is verified in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to group-raised pigs and can provide a reference for their pre-segmentation.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB3102900); the National Natural Science Foundation of China (No. U1804263, 62172435 and 62002386); and the Zhongyuan Science and Technology Innovation Leading Talent Project, China (No. 214200510019).
Abstract: With the metaverse as the development direction of the next-generation Internet, the popularity of intelligent devices, and the maturity of various emerging technologies, more and more intelligent devices are connecting to the Internet, which poses a major threat to the management and security protection of network equipment. At present, the mainstream method of network equipment identification in the metaverse is to obtain the network traffic data generated during device communication, extract the device features through analysis and processing, and identify the device based on a variety of learning algorithms. Such methods often require manual participation and struggle to capture the small differences between similar devices, leading to identification errors. Therefore, we propose a deep learning device recognition method based on a spatial attention mechanism. First, we extract the required feature fields from the acquired network traffic data. Then, we normalize the data and convert it into grayscale images. After that, we add a spatial attention mechanism to a CNN and an MLP, respectively, to increase the difference between similar network devices and further improve the recognition accuracy. Finally, we identify devices based on the deep learning model. Extensive experiments were carried out on 31 types of network devices, such as web cameras, wireless routers, and smartwatches. The results show that, under the CNN and MLP models, the accuracy of the proposed recognition method based on the spatial attention mechanism is 0.8% and 2.0% higher, respectively, than that of the recognition method based only on the deep learning model. The proposed method is significantly superior to the existing method of device-type recognition based only on a deep learning model.
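The normalization and grayscale-conversion step can be sketched as follows. The abstract does not specify the normalization scheme or image layout, so min-max scaling to 0-255 and zero-padding to a square image are assumptions made here for illustration; the function name and feature values are hypothetical.

```python
import math


def features_to_grayscale(features):
    """Min-max scale a feature vector to 0-255 and pack it into a square image.

    Returns a list of rows; unused trailing pixels are zero-padded.
    """
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0  # avoid division by zero for constant vectors
    pixels = [round(255 * (x - lo) / span) for x in features]
    side = math.ceil(math.sqrt(len(pixels)))
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Five hypothetical traffic features become a 3x3 grayscale image.
image = features_to_grayscale([0.0, 2.0, 10.0, 4.0, 6.0])
```

Arranging the features as a two-dimensional image is what lets a spatial attention mechanism weight individual positions, each of which corresponds to one feature field.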
Funding: Funded by Yayasan UTP FRG (YUTP-FRG), grant number 015LC0-280, and the Computer and Information Science Department of Universiti Teknologi PETRONAS.
Abstract: Object detection has made a significant leap forward in recent years. However, the detection of small objects remains difficult for various reasons: such objects are very small and are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects by overcoming the information loss that occurs during downsampling. The multi-convolutional attention block (MCAB), together with the channel attention and spatial attention module (SAM) that make up MCAB, has been designed to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant gains in performance, with 97.75% for KITTI and 88.97% for PASCAL VOC. The findings indicate that MCBAN is considerably more effective at small object detection than other existing approaches.
Funding: Supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd (No. SHTL-22-28); the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund (No. L231002); and the Major Project of China State Railway Group Co., Ltd. (No. K2023T003).
Abstract: The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving rise to the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration techniques are employed, culminating in the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model exhibits a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frames/s.
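To make the DIoU_NMS step concrete, here is a minimal sketch using the standard Distance-IoU definition (IoU minus the squared centre distance divided by the squared diagonal of the smallest enclosing box). The function names, the 0.5 threshold and the example boxes are hypothetical and are not taken from the CBAM-YOLOv5 implementation.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centres.
    d2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return iou - d2 / (cw * cw + ch * ch)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep a box unless its DIoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)
```

Because the centre-distance penalty lowers the score of boxes whose centres are far apart, two overlapping detections of distinct nearby objects are less likely to suppress each other than under plain IoU.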
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. An attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may result in incomplete coverage of features. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause the loss of original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
Funding: Sponsored by the Autonomous Region Key R&D Task Special (2022B01008) and the National Key R&D Program of China (SQ2022AAA010308-5).
Abstract: Network intrusion detection systems (NIDS) based on deep learning have continued to make significant advances. However, the following challenges remain. On the one hand, applying only Temporal Convolutional Networks (TCNs) can lead to models that ignore the impact of network traffic features at different scales on detection performance. On the other hand, some intrusion detection methods consider multi-scale information of traffic data, but considering only forward network traffic information can lead to deficiencies in capturing multi-scale temporal features. To address both issues, we propose a hybrid Convolutional Neural Network that supports a multi-output strategy (BONUS) for industrial internet intrusion detection. First, we create a multi-scale Temporal Convolutional Network by stacking TCNs of different scales to capture the multi-scale information of network traffic. Meanwhile, we propose a bi-directional structure and dynamically set the weights to fuse the forward and backward contextual information of network traffic at each scale, enhancing the model’s ability to capture the multi-scale temporal features of network traffic. In addition, we introduce a gated network for each of the two branches in the proposed method to help the model learn the feature representation of each branch. Extensive experiments demonstrate the effectiveness of the proposed approach on two publicly available traffic intrusion detection datasets, UNSW-NB15 and NSL-KDD, with F1 scores of 85.03% and 99.31%, respectively. This also confirms that enhancing the model’s ability to capture multi-scale temporal features of traffic data improves detection performance.
Funding: Supported in part by the Science and Technology Development Fund, Macao S.A.R (FDCT) 0028/2023/RIA1; in part by Leading Talents in Gusu Innovation and Entrepreneurship Grant ZXL2023170; in part by the TCL Science and Technology Innovation Fund under Grant D5140240118; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079.
Abstract: Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, gamistic generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information and improve the generation quality. In this paper, we propose an enhanced GAN via improving a generator for image generation (EIGGAN). EIGGAN applies spatial attention to the generator to extract salient information and enhance the truthfulness of the generated images. Taking the context relation into account, parallel residual operations are fused into the generation network to extract more structural information from the different layers. Finally, a mixed loss function is exploited in the GAN to trade off speed and accuracy and generate more realistic images. Experimental results show that the proposed method is superior to popular methods, e.g., Wasserstein GAN with gradient penalty (WGAN-GP), in terms of many indexes, namely Frechet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins and Inception Score, as well as on some visual images.
Funding: Supported in part by the projects of the National Natural Science Foundation of China (62376059, 41971340); Fujian Provincial Department of Science and Technology (2023XQ008, 2023I0024, 2021Y4019); Fujian Provincial Department of Finance (GY-Z230007, GYZ23012); and Fujian Key Laboratory of Automotive Electronics and Electric Drive (KF-19-22001).
Abstract: Autonomous driving has advanced rapidly; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts are challenging for autonomous driving because of the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, which makes them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for Autonomous Driving in the Roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle’s capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and increasing throughput efficiency. Extensive experiments were conducted on the collaborative simulation platform of CARLA and SUMO. The experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of training convergence, driving smoothness and traffic efficiency under diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). In general, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Funding: Supported by the National Natural Science Foundation of China (Grant Number: 61962010).
Abstract: Deepfake-generated fake faces, commonly utilized in identity-related activities such as political propaganda, celebrity impersonations, evidence forgery, and familiar fraud, pose new societal threats. Although current deepfake generators strive for high realism in visual effects, they do not replicate biometric signals indicative of cardiac activity. Addressing this gap, many researchers have developed detection methods focusing on biometric characteristics. These methods utilize classification networks to analyze both temporal and spectral domain features of the remote photoplethysmography (rPPG) signal, resulting in high detection accuracy. However, in the spectral analysis, existing approaches often consider only the power spectral density and neglect the amplitude spectrum, although both are crucial for assessing cardiac activity. We introduce a novel method that extracts rPPG signals from multiple regions of interest through remote photoplethysmography and processes them using the Fast Fourier Transform (FFT). The resultant time-frequency domain signal samples are organized into matrices to create Matrix Visualization Heatmaps (MVHM), which are then utilized to train an image classification network. Additionally, we explored various combinations of time-frequency domain representations of rPPG signals and the impact of attention mechanisms. Our experimental results show that our algorithm achieves a detection accuracy of 99.22% in identifying fake videos, significantly outperforming mainstream algorithms and demonstrating the effectiveness of the Fourier Transform and attention mechanisms in detecting fake faces.
Funding: The work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, a feature extraction network built on the convolutional block attention module extracts high-level, discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
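The prototype-distance comparison can be sketched in a few lines. This sketch assumes Euclidean distance and mean-pooled prototypes, as in a standard prototypical network; the abstract does not state the distance actually used. The two-dimensional embeddings stand in for features that would come from the attention-based extractor, and all names and values are hypothetical.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype vector."""
    protos = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        protos[label] = [
            sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)
        ]
    return protos


def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical 2-D embeddings for a two-way, two-shot episode.
support = {
    "smoke": [[1.0, 1.0], [3.0, 1.0]],
    "background": [[-2.0, -2.0], [-2.0, 0.0]],
}
protos = class_prototypes(support)
predicted = classify([1.5, 0.5], protos)
```

Because only the prototypes depend on the support set, new classes can be added at test time by embedding a few labelled images, which is what makes the approach suitable for limited smoke data.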
Funding: Supported by the National Natural Science Foundation of China under Grants 62172059, 61972057 and 62072055; the Hunan Provincial Natural Science Foundation of China under Grant 2020JJ4626; the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; and the Postgraduate Scientific Research Innovation Project of Hunan Province under Grant CX20210811.
Abstract: Image inpainting based on deep learning has improved greatly. The original purpose of image inpainting was to repair damaged photos, for example by removing artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, detection and localization of image inpainting operations are essential. Recent research shows that the high-pass filtering full convolutional network (HPFCN) can be applied to image inpainting detection with good results. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention full convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Funding: This work was supported in part by the Natural Science Foundation of China under Grants 62063004 and 61762033; in part by the Hainan Provincial Natural Science Foundation of China under Grants 2019RC018 and 619QN246; and by the Postdoctoral Science Foundation under Grant 2020TQ0293.
Abstract: Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or spatial information and cannot make full use of both to improve SISR performance further. The present work addresses this problem by proposing a mixed attention densely residual network architecture that makes full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to enable the algorithm to fuse channel attention with Laplacian spatial attention effectively and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method performs comparably to other state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China (62276192, 62075169, 62061160370) and the Key Research and Development Program of Hubei Province (2020BAB113).
Abstract: Image fusion aims to integrate complementary information in source images to synthesize a fused image that comprehensively characterizes the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the fusion results when input images have slight shifts or deformations. In addition, the fusion results typically have only a good visual effect and neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network to estimate bidirectional deformation fields to rectify geometric distortions of input images under the supervision of both photometric and end-point constraints. The registration and fusion are combined in a symmetric scheme, in which mutual promotion is achieved by optimizing the naive fusion loss and is further enhanced by the mono-modal consistent constraint on symmetric fusion outputs. In addition, the image fusion network is equipped with the global spatial attention mechanism to achieve adaptive feature integration. Moreover, the semantic constraint based on the pre-trained segmentation model and Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion, and semantic segmentation tasks demonstrate the superiority of SuperFusion compared to state-of-the-art alternatives. The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/SuperFusion.
Funding: This research has been supported by NSFC (61672495); the Scientific Research Fund of Hunan Provincial Education Department (16A208); the Project of Hunan Provincial Science and Technology Department (2017SK2405); and in part by the construct program of the key discipline in Hunan Province and the CERNET Innovation Project (NGII20170715).
Abstract: With the improvement of the national economic level, the number of vehicles continues to increase year by year. According to the National Bureau of Statistics, the number of vehicles in China had reached approximately 327 million by the end of 2018. As a result, urban traffic pressure continues to rise, and its negative impact on urban traffic order is growing. Illegal parking, a common problem in transportation security, urgently needs to be solved. Traditional methods for addressing it are mainly based on ground loops and manual supervision, which may miss detections and require considerable manpower. Although deep learning has developed rapidly in recent years, object detection methods that rely on background segmentation cannot meet the speed and precision requirements of complex and varied scenes. Thus, an improved Single Shot MultiBox Detector (SSD) based on deep learning is proposed in our study. We introduce an attention mechanism through a spatial transformer module, which gives neural networks the ability to actively and spatially transform feature maps, and add contextual information transmission in a specified layer. Through repeated experiments, we identified the best connection layer in the detection model, especially for small objects, and increased the precision by 1.5% over the baseline SSD without extra training cost. We also designed an illegal parking vehicle detection method based on the improved SSD, which reaches a precision of up to 97.3% at a speed of 40 FPS. This is superior to most vehicle detection methods and will help relieve the negative impact of illegal parking.
Abstract: Background: It has been suggested that older adults show a reduced attentional field compared to younger adults. This may be attributed to poorer use of peripheral vision (i.e., peripheral attentional allocation) and a higher reliance on central vision. To test this, we examined the importance of central, peri-foveal, and near-periphery information in younger and older adults by comparing their visual search performance while their central vision was blocked by artificial central scotomas of different sizes. We tested participants in two versions of visual search, pop-out and serial search, because they require different uses of central and peripheral attention. Pop-out search relies on processing of the entire visual scene (i.e., global processing), whereas serial search requires processing each feature serially (i.e., local processing). Methods: Thirteen healthy younger adults (M=21.8 years, SD=1.5) and 15 older adults (M=69.1 years, SD=7.3) performed a pop-out and a serial version of a visual search task in the presence of gaze-contingent artificial central scotomas of different sizes (no scotoma, or 3°, 5°, or 7° in diameter). Participants were asked to indicate as quickly as possible whether a target was present among distractors whose number varied (16, 32, or 64 objects). Results: We found evidence for a greater decline in peripheral processing in older adults than in younger adults in pop-out but not in serial search. For the pop-out condition with no scotoma, the farther the target was in the periphery, the longer the search time, and this increase was proportionally greater for older adults. Further, increases in scotoma size were associated with a greater increase in reaction times for older adults than for younger participants. For the serial condition, both groups showed similar increases in reaction times with target distance from center and with scotoma size. We surmise that this may be due to task difficulty in serial search; central vision is necessary for both groups. Conclusions: These findings suggest that, in global processing, older adults distribute more resources towards central vision than younger adults do.
Abstract: Background: Research suggests that a healthy brain analyzes facial expressions approximately 170 ms after the presentation of a facial expression, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic and early emotional processing around 90 ms after stimulation. This processing would occur subconsciously, even before the stimulus is perceived, and could be approximated by presenting the stimuli quickly in the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: Presenting emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical processing of the stimulus. Here, the use of different specific stimulation frequencies allowed us to label the different facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different presentation rates (1/0.170 s = ~5.8 Hz, 1/0.090 s = ~10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared according to the hemisphere in which attention was solicited, the emotion, and the stimulation frequency. Results: The signal-to-noise ratio of the cerebral oscillations related to the processing of fearful expressions was stronger in the regions specific to emotional processing when the expressions were presented in the subjects' peripheral vision, unbeknownst to them. In addition, the peripheral emotional processing of fear at 10.8 Hz was associated with greater activation within the Gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the Alpha frequency bands in the temporal regions. This modulation of spectral power was independent of the attentional demand. Conclusions: These results suggest that fearful emotional stimuli presented in peripheral vision and outside the attentional focus elicit an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is processed by the fast pathway of the magnocellular layers.
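The frequency-tagging idea in this abstract can be illustrated with a small sketch. It assumes, since the abstract does not say how the signal-to-noise ratio was computed, a common ssVEP convention: the spectral amplitude at the tagged frequency divided by the mean amplitude of nearby frequency bins. The function names (`amplitude_at`, `tag_snr`, `synth`), the sampling rate, and the noise model are all hypothetical and chosen only for illustration.

```python
import cmath
import math
import random


def amplitude_at(signal, fs, freq):
    """Amplitude of `signal` at a single frequency (one-bin discrete Fourier transform)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def tag_snr(signal, fs, tag_freq, n_neighbors=4, skip=1):
    """Amplitude at the tag frequency over the mean amplitude of neighboring bins.

    Assumed convention (not taken from the abstract): `n_neighbors` bins on
    each side are averaged, after skipping the `skip` bins closest to the tag.
    """
    resolution = fs / len(signal)  # spacing between frequency bins, in Hz
    target = amplitude_at(signal, fs, tag_freq)
    neighbors = [
        amplitude_at(signal, fs, tag_freq + sign * k * resolution)
        for k in range(skip + 1, skip + 1 + n_neighbors)
        for sign in (-1, 1)
    ]
    return target / (sum(neighbors) / len(neighbors))


def synth(fs=250.0, seconds=10.0, tags=(5.8, 10.8), noise_sd=1.0, seed=0):
    """Synthetic recording: one unit-amplitude sinusoid per tag plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(fs * seconds)
    return [
        sum(math.sin(2 * math.pi * f * i / fs) for f in tags)
        + rng.gauss(0.0, noise_sd)
        for i in range(n)
    ]


if __name__ == "__main__":
    recording = synth()
    for f in (5.8, 8.0, 10.8):
        print(f, round(tag_snr(recording, 250.0, f), 1))
```

In this toy setup the two tagged frequencies (5.8 Hz and 10.8 Hz, the values quoted in the abstract) stand out clearly from the noise floor, while an untagged frequency such as 8.0 Hz does not. Real EEG analysis would add windowing, artifact rejection, and per-electrode statistics, none of which is shown here.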
Funding: the National Natural Science Foundation of China (No. 61673269).
Abstract: Visual object tracking is an important problem that has received long-term attention in computer vision. The ability to handle occlusion effectively, especially severe occlusion, is an important criterion for evaluating object tracking algorithms in long-term tracking and is of great significance for improving their robustness. However, most object tracking algorithms lack a mechanism designed specifically for occlusion. Under occlusion, target information is missing, so the target position must be predicted from the motion trajectory. Kalman filtering and particle filtering can effectively predict the target's motion state from its historical motion information. Probabilistic discriminative model prediction (PrDiMP) is a single-object tracking method based on the spatial attention mechanism for complex scenes and occlusions. To improve the performance of PrDiMP, Kalman filtering, particle filtering, and linear filtering are introduced. First, for occlusion, Kalman filtering and particle filtering are each introduced to predict the object position, replacing the detection result of the original tracking algorithm and stopping the recursive update of the target model. Second, for the detection-jump problem caused by similar objects in complex scenes, a linear filtering window is added. Evaluation results on three datasets, GOT-10k, UAV123, and LaSOT, and visualization results on several videos show that our algorithms improve tracking performance under occlusion and effectively suppress detection jumps.
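The occlusion-handling step described in this abstract (predict the position with a Kalman filter and use the prediction in place of the tracker's detection) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a constant-velocity motion model, treats each image axis independently, and uses a hypothetical confidence threshold to decide when the target counts as occluded. The names `AxisKalman` and `track`, the noise values `q` and `r`, and the threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    p00: float = 1.0  # covariance entries of the (position, velocity) state
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 1.0
    q: float = 1e-2   # process noise added to the diagonal (assumed value)
    r: float = 1.0    # measurement noise variance (assumed value)

    def predict(self, dt=1.0):
        """Advance the state by one step: pos += vel * dt, P = F P F^T + Q."""
        self.pos += self.vel * dt
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        self.p00 = a + dt * (b + c) + dt * dt * d + self.q
        self.p01 = b + dt * d
        self.p10 = c + dt * d
        self.p11 = d + self.q
        return self.pos

    def update(self, z):
        """Correct the state with a measured position z."""
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        p00, p01, p10, p11 = self.p00, self.p01, self.p10, self.p11
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 = p10 - k1 * p00
        self.p11 = p11 - k1 * p01


def track(detections, confidences, conf_threshold=0.5):
    """Return one (x, y) per frame, using the prediction when confidence is low.

    `detections` holds the tracker's (x, y) outputs and `confidences` its
    per-frame scores; a score below `conf_threshold` is treated as occlusion
    (an assumed criterion), so the detection is ignored for that frame.
    """
    kx = AxisKalman(detections[0][0])
    ky = AxisKalman(detections[0][1])
    output = [detections[0]]
    for (x, y), conf in zip(detections[1:], confidences[1:]):
        px, py = kx.predict(), ky.predict()
        if conf >= conf_threshold:
            kx.update(x)
            ky.update(y)
            output.append((kx.pos, ky.pos))
        else:
            output.append((px, py))  # occluded: keep the motion prediction
    return output
```

In a full tracker, the low-confidence branch would also be the place to pause the online update of the target model, as the abstract describes; that part depends on the tracker's internals and is not shown.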