Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055 and 222102210214; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29; and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection uses infrared images to supplement visible-light images with reliable information, which gives it significant advantages in low-light conditions and background occlusion scenarios. However, maintaining detection speed while continuing to improve cross-modal feature extraction and fusion remains challenging. We propose a ResNet50-based deep learning network for cross-modal pedestrian detection that focuses on more reliable features and improves detection efficiency. The model uses a spatial attention mechanism to reweight the input visible-light and infrared image data, which strengthens its focus on different spatial positions, and shares the weighted features across modalities, which reduces interference between multi-modal features. Lightweight modules based on depthwise separable convolution are then incorporated to reduce the parameter count and computational load through channel-wise and point-wise convolutions. The proposed network was validated experimentally on the publicly available KAIST dataset and compared with existing methods. The results show that the approach performs favorably in a variety of complex environments, supporting the effectiveness of the proposed multispectral pedestrian detection method.
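The parameter saving from depthwise separable convolution mentioned above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' code; the layer shape (3 x 3 kernel, 256 input and output channels) is an illustrative assumption, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not taken from the paper).
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, sep / std)
```

The ratio of the two counts equals 1/c_out + 1/k^2, so for 3 x 3 kernels and wide layers the separable form needs roughly one ninth of the weights.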
Abstract: The semantic segmentation of very high spatial resolution remote sensing images is difficult because the interactions between objects in the scene are complex to interpret. Effective segmentation requires considering both local spatial context and long-range dependencies. To address this problem, the proposed approach is inspired by the MAC-UNet network, a densely connected extension of U-Net combined with channel attention. The advantages of this solution are as follows: 1) The model introduces a new attention, called propagate attention, to build an attention-based encoder. 2) Multi-scale information is fused by a weighted linear combination of the attentions, whose coefficients are learned during training. 3) The decoder introduces the Spatial-Channel-Global-Local block, an attention layer that combines channel attention and spatial attention both locally and globally. The model is evaluated on two datasets, WHDLD and DLRSD. Compared with the most efficient attention-based algorithms such as MAU-Net and transformer-based algorithms such as TMNet, it improves mean intersection over union (mIoU) by between 1.54% and 10.47% on DLRSD and between 1.04% and 4.37% on WHDLD.
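The weighted linear combination in point 2 can be sketched as follows. This is an illustration only: it assumes the attention maps have already been brought to a common size, and the function name `fuse_attention` and the coefficient values are hypothetical. In a real network the coefficients would be trainable parameters rather than fixed numbers.

```python
from typing import List

Map = List[List[float]]


def fuse_attention(maps: List[Map], coeffs: List[float]) -> Map:
    """Return sum_i coeffs[i] * maps[i] for same-sized 2-D attention maps."""
    if len(maps) != len(coeffs):
        raise ValueError("need exactly one coefficient per attention map")
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for attention, weight in zip(maps, coeffs):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += weight * attention[i][j]
    return fused


# Two toy attention maps and hypothetical learned coefficients.
fine = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[0.0, 2.0], [2.0, 0.0]]
print(fuse_attention([fine, coarse], [0.5, 0.25]))
```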
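Abstract: Because low-light images suffer from low contrast, severe loss of detail, and heavy noise, existing object detection algorithms perform poorly on them. This paper therefore proposes a low-light object detection method that combines a spatial-aware attention mechanism with multi-scale feature fusion (Spatial-aware Attention Mechanism and Multi-Scale Feature Fusion, SAM-MSFF). The method first fuses multi-scale features through a multi-scale interactive memory pyramid, which enhances the useful information in low-light image features, and uses memory vectors to store sample features and capture latent correlations between samples. It then introduces a spatial-aware attention mechanism to obtain both long-range contextual information and local information in the spatial domain, which strengthens target features in low-light images and suppresses interference from background and noise. Finally, a multi-receptive-field enhancement module expands the receptive fields of the features and applies grouped reweighting to features with different receptive fields, so that the detection network adaptively adjusts its receptive field size according to the multi-scale input. In experiments on the ExDark dataset, the method reaches a mean Average Precision (mAP) of 77.04%, which is 2.6% to 14.34% higher than existing mainstream object detection methods.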
Funding: Ministry of Science and Technology Basic Resources Survey Special Project, Grant/Award Number: 2019FY100900; High-level Hospital Construction Project, Grant/Award Number: DFJH2019015; National Natural Science Foundation of China, Grant/Award Number: 61871021; Guangdong Natural Science Foundation, Grant/Award Number: 2019A1515011676; Beijing Key Laboratory of Robotics Bionic and Functional Research.
Abstract: To address the poor segmentation performance of existing models on imbalanced datasets with small-scale samples, a bilateral U-Net model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared to the Attenuated Spatial Pyramid module, enlarges the receptive field and enhances the information. It then adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results. The model effectively improves segmentation accuracy on small datasets. Experimental results on the CamVid dataset show that, compared with several existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIoU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 dataset, where the APSA module improved mIoU by about 12.2%.
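For readers unfamiliar with the mIoU metric reported above, the sketch below computes it from a class confusion matrix. It is a generic illustration with made-up counts, not the evaluation code used in the paper; classes that never appear in either the ground truth or the predictions are skipped, which is one common convention and an assumption here.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean intersection over union from a confusion matrix.

    confusion[t][p] counts pixels with true class t and predicted class p.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp    # wrongly predicted as c
        union = tp + fp + fn
        if union > 0:                                       # skip absent classes
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example with made-up pixel counts.
toy_confusion = [[8, 2],
                 [1, 9]]
print(mean_iou(toy_confusion))
```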
Funding: The Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Accurately identifying small objects in high-resolution aerial images is a complex and crucial task in small object detection on unmanned aerial vehicles (UAVs). The task is challenging because of variations in UAV flight altitude, differences in object scale, and factors such as flight speed and motion blur. To improve the detection of small targets in drone aerial imagery, we propose an enhanced You Only Look Once version 7 (YOLOv7) algorithm based on multi-scale spatial context. We build the MSC-YOLO model, which incorporates an additional prediction head, denoted P2, to improve adaptability to small objects. We replace conventional downsampling with a Spatial-to-Depth Convolutional Combination (CSPDC) module to mitigate the loss of fine feature detail for small objects. Furthermore, we propose a Spatial Context Pyramid with Multi-Scale Attention (SCPMA) module, which captures spatial and channel-dependent features of small targets across multiple scales. This module enhances the perception of spatial contextual features and the use of multi-scale feature information. On the Visdrone2023 and UAVDT datasets, MSC-YOLO outperforms the baseline YOLOv7 by 3.0% in mean average precision (mAP). The proposed MSC-YOLO algorithm performs satisfactorily in detecting small targets in UAV aerial photography and provides strong support for practical applications.
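The idea behind replacing downsampling with a space-to-depth step can be illustrated in a few lines. The sketch below shows only the generic space-to-depth rearrangement, which halves the spatial resolution while keeping every input value by moving it into the channel dimension; it is not the CSPDC module itself, whose internal structure the abstract does not describe. The height x width x channels nested-list layout is an assumption made for readability.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # height x width x channels


def space_to_depth(x: Tensor3D, block: int = 2) -> Tensor3D:
    """Move each block x block spatial patch into the channel dimension."""
    height, width = len(x), len(x[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            cell = []
            for di in range(block):
                for dj in range(block):
                    cell.extend(x[i + di][j + dj])  # keep every value
            row.append(cell)
        out.append(row)
    return out


# 4 x 4 single-channel input holding the values 0..15.
image = [[[4 * i + j] for j in range(4)] for i in range(4)]
print(space_to_depth(image))
```

Unlike strided convolution or pooling, no input value is discarded, which is why this kind of rearrangement is attractive when objects cover only a few pixels.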
Abstract: Recent studies describe a number of difficulties associated with attention deficit in children with reading disabilities. Information about visual-spatial attention comes mainly from studies using event-related potentials (ERPs) during Posner's spatial cueing paradigm. This study aims to use neurofeedback with a special protocol to treat children with reading disabilities and, moreover, to evaluate visual-spatial attention ability by means of the Posner paradigm task and ERPs. The study was conducted in a single-subject design in 20 sessions. Participants were 2 male children, aged 10 to 12 years, who completed twelve 30-min neurofeedback sessions. Repeated measurements were performed during the baseline, treatment, and post-treatment phases. Results showed some improvement in Posner paradigm parameters (correct responses, valid and invalid reaction times). Furthermore, grand average ERPs for both participants in each of the four conditions (Valid-right, Invalid-right, Valid-left and Invalid-left) were analyzed. The analysis of the P3 component showed a reduction in latency, indicating an improvement in the timing of cognitive processes. In addition, the graphs showed a decrease in amplitude, which suggests easier processing than before.
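The valid and invalid reaction times from a Posner cueing task are commonly summarized as a validity effect, the mean invalid-cue reaction time minus the mean valid-cue reaction time. The sketch below shows that calculation; the reaction times are invented for illustration and are not data from the study.

```python
from statistics import mean
from typing import Sequence


def validity_effect(valid_rts: Sequence[float], invalid_rts: Sequence[float]) -> float:
    """Mean invalid-cue reaction time minus mean valid-cue reaction time (ms)."""
    return mean(invalid_rts) - mean(valid_rts)


# Hypothetical reaction times in milliseconds (not data from the study).
valid_trials = [300, 320, 310]
invalid_trials = [350, 370, 360]
print(validity_effect(valid_trials, invalid_trials))
```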
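Abstract: Lane detection is key to the safety and stability of autonomous driving. To improve lane detection accuracy, this paper builds on the UFLD (Ultra Fast Structure-aware Deep Lane Detection) algorithm and combines it with the DenseNet-121 network and a spatial attention mechanism to design a DSA-UFLD model for lane detection. For image enhancement, an adaptive brightness enhancement algorithm improves the clarity of underexposed images. For network optimization, the transfer-learning model DenseNet-121 replaces ResNet18 for image feature extraction, dense connections strengthen feature reuse, and a spatial attention mechanism extracts the key information in the image; in addition, transposed convolution replaces bilinear interpolation in upsampling, so that decoding benefits from learned parameters. For the loss function, the structural loss is modified to constrain lane lines to quadratic curves, which improves detection in curved-road scenes. Experimental results show that the DSA-UFLD algorithm improves lane recognition accuracy while maintaining detection speed and has some practical application value.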
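One way to express a "lane lines are quadratic curves" constraint is to penalize third-order differences of the lane's horizontal position across evenly spaced rows, because any quadratic sampled at equal intervals has zero third-order difference. The sketch below is a hypothetical illustration of that idea; the abstract does not give the modified structural loss, so the function name `third_order_penalty` and this formulation are assumptions.

```python
from typing import Sequence


def third_order_penalty(xs: Sequence[float]) -> float:
    """Sum of absolute third-order differences of lane x-positions.

    xs holds the lane position on evenly spaced image rows. The penalty is
    zero for straight lines and quadratic curves and grows for other shapes.
    """
    return sum(
        abs(xs[i + 3] - 3 * xs[i + 2] + 3 * xs[i + 1] - xs[i])
        for i in range(len(xs) - 3)
    )


rows = range(8)
quadratic_lane = [2 * r * r - 3 * r + 5 for r in rows]  # curved lane
cubic_lane = [r ** 3 for r in rows]                     # not a quadratic
print(third_order_penalty(quadratic_lane), third_order_penalty(cubic_lane))
```

By comparison, a penalty on second-order differences is zero only for straight lines, so it would also penalize the gently curved lanes that appear on bends.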
Funding: Supported in part by the National Natural Science Foundation of China under Grant U1911401.
Abstract: Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions in order to answer questions about images accurately. However, existing models tend to ignore the implicit knowledge in images and focus only on their visual information, which limits how deeply the image content is understood. Images contain more than visual objects: some contain textual information about the scene, and more complex images contain relationships between individual visual objects. First, this paper proposes a model that uses image descriptions for feature enhancement. The model encodes images and their descriptions separately, based on a question-guided co-attention mechanism. This mechanism enriches the model's feature representation and strengthens its reasoning ability. In addition, this paper improves the bottom-up attention model by obtaining two sets of image region features. After the two visual features and the spatial position information corresponding to each feature are obtained, the two features are concatenated as the final image feature, which represents the image better. Finally, the spatial position information is processed so that the model can perceive the size and relative position of each object in the image. Our best single model achieves 74.16% overall accuracy on the VQA 2.0 dataset and even outperforms some multi-modal pre-training models while using fewer images and less time.
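A common way to let a model perceive object size and position is to turn each region's bounding box into a small vector of normalized coordinates. The sketch below shows one such encoding; the abstract does not specify how the spatial position information is processed, so the seven-value layout and the function name `box_geometry` are assumptions.

```python
from typing import List, Tuple


def box_geometry(box: Tuple[float, float, float, float],
                 img_w: float, img_h: float) -> List[float]:
    """Encode a box (x1, y1, x2, y2) in pixels as normalized geometry.

    Returns [x1, y1, x2, y2, width, height, area], each as a fraction of the
    image size, so the values are comparable across images.
    """
    x1, y1, x2, y2 = box
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
            width, height, width * height]


# Hypothetical region in a 400 x 500 pixel image.
features = box_geometry((100, 50, 300, 250), img_w=400, img_h=500)
print(features)
```

Such a vector would typically be concatenated with, or projected and added to, the region's visual feature before attention is applied.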
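Abstract: Improving the segmentation accuracy of ultraviolet (UV) images of electrical equipment is important for accurately assessing the degree of equipment discharge. Because of noise interference and the irregular shape and size of UV light spots, segmented target regions suffer from over-segmentation and under-segmentation. A multi-module segmentation network, VSA-UNet (VGG16Net, Improved SENet, and ASPP based U-Net), is therefore proposed. To strengthen feature extraction and reduce over-segmentation, the convolutional layers of VGG16Net replace the encoder of U-Net. The final convolutional layer of the encoder is replaced with an Atrous Spatial Pyramid Pooling (ASPP) module, which captures multi-scale information from the UV image and addresses under-segmentation of large regions. An improved SENet module is added to the skip connections to strengthen the extraction of useful information, compensate for lost detail, and improve overall network performance. Experiments on a self-built UV image dataset show that the improved network reaches a Mean Intersection over Union (MIoU) of 81.78% and a mean precision of 95.97% when segmenting UV images. Compared with U-Net, the proposed VSA-UNet model clearly improves the accuracy of UV image segmentation.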
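ASPP gathers multi-scale information by running atrous (dilated) convolutions with several dilation rates in parallel. The span covered by one dilated kernel follows the formula k + (k - 1)(d - 1), which the sketch below evaluates. The dilation rates 1, 6, 12 and 18 are a common choice in the literature and an assumption here; the abstract does not state which rates VSA-UNet uses.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Span in pixels covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Assumed dilation rates for illustration only.
assumed_rates = (1, 6, 12, 18)
spans = [effective_kernel_size(3, d) for d in assumed_rates]
print(spans)
```

Each branch keeps the same nine weights per filter, so the larger spans come at no extra parameter cost.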
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted growing attention in computer vision and natural language processing. Researchers study how to better integrate image features and text features to achieve better results on VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is an effective way to address this problem. However, using a single attention mechanism may cover the features incompletely. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can lose some of the original features, a small portion of the image features is added back as compensation. For text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation improve the accuracy of a multimodal low-rank bilinear pooling network by 6.1%.
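The combination of channel attention, spatial attention and feature compensation can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the paper's network: the attention weights are computed from simple averages passed through a sigmoid, with no learned layers, and the compensation is modeled as adding back a fraction `alpha` of the input, where `alpha` is a hypothetical parameter.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # channels x height x width


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def hybrid_attention(x: Tensor3D, alpha: float = 0.1) -> Tensor3D:
    """Reweight x by channel and spatial attention, then add alpha * x back."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Channel attention: one weight per channel from its global average.
    channel = [sigmoid(sum(sum(row) for row in x[k]) / (h * w)) for k in range(c)]
    # Spatial attention: one weight per position from the cross-channel average.
    spatial = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
               for i in range(h)]
    return [[[x[k][i][j] * channel[k] * spatial[i][j] + alpha * x[k][i][j]
              for j in range(w)]
             for i in range(h)]
            for k in range(c)]


# Toy feature map: 2 channels, 2 x 2 positions, all ones.
toy = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
out = hybrid_attention(toy)
print(out[0][0][0])
```

The added `alpha * x` term ensures that a position is never suppressed entirely, even when both attention weights are small.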