Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055 and 222102210214; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29; and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection uses infrared images to supplement visible-light images with reliable information, which gives it significant advantages in low-light and background-occlusion scenarios. However, maintaining detection speed while continuing to improve cross-modal feature extraction and fusion remains a challenge. We design a ResNet50-based deep learning network for cross-modal pedestrian detection that focuses on more reliable features and improves detection efficiency. The model uses a spatial attention mechanism to reweight the input visible-light and infrared data, strengthening its focus on different spatial positions, and shares the weighted features across modalities to reduce interference between multimodal features. Lightweight modules built on depthwise separable convolution are then added; by splitting each convolution into channel-wise (depthwise) and point-wise steps, they reduce the model's parameter count and computational load. The proposed network was validated on the public KAIST dataset and compared with existing methods. The experimental results show that our approach performs favorably in a variety of complex environments, confirming the effectiveness of the proposed multispectral pedestrian detection method.
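The parameter savings of depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard k×k convolution and for its depthwise + pointwise factorization. The 3×3 kernel and 256-channel sizes are illustrative assumptions, not the configuration used in this paper.

```python
# Minimal sketch: weight counts (biases ignored) for a standard k x k
# convolution versus a depthwise separable convolution of the same shape.


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k x c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mapping c_in -> c_out channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable / standard weight count; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256  # hypothetical layer size
    print(standard_conv_params(k, c_in, c_out))        # 589824
    print(depthwise_separable_params(k, c_in, c_out))  # 67840
    print(round(reduction_ratio(k, c_in, c_out), 4))   # 0.115
```

Because both layers are applied at every output position, multiply-add counts scale by the same H×W factor, so the ratio 1/C_out + 1/k² also approximates the reduction in computation.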
Funding: Ministry of Science and Technology Basic Resources Survey Special Project, Grant/Award Number: 2019FY100900; High-level Hospital Construction Project, Grant/Award Number: DFJH2019015; National Natural Science Foundation of China, Grant/Award Number: 61871021; Guangdong Natural Science Foundation, Grant/Award Number: 2019A1515011676; Beijing Key Laboratory of Robotics Bionic and Functional Research.
Abstract: To address the poor segmentation performance of existing models on imbalanced datasets with small sample sizes, we design a bilateral U-Net model with a spatial attention mechanism. The model uses the lightweight MobileNetV2 as its backbone for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared with the Attenuated Spatial Pyramid module, enlarges the receptive field and enhances feature information. Finally, a context fusion prediction branch is added to fuse high-level and low-level semantic predictions. Together, these components effectively improve segmentation accuracy on small datasets. Experimental results on the CamVid dataset show that, compared with several existing semantic segmentation networks, the model achieves better segmentation quality and accuracy, reaching an mIoU of 75.85%. In addition, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 dataset, where the APSA module improved mIoU by about 12.2%.
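The mIoU figures reported above are conventionally computed from a pixel-level confusion matrix. The sketch below shows that standard computation; skipping classes that appear in neither the ground truth nor the prediction is one common convention and is assumed here, since the abstract does not state the exact evaluation protocol. The confusion matrix in the example is made up.

```python
import math
from typing import List


def per_class_iou(confusion: List[List[int]]) -> List[float]:
    """Per-class IoU = TP / (TP + FP + FN).

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j. A class that never occurs in either the ground
    truth or the prediction gets NaN.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU over the classes that actually occur (NaN entries skipped)."""
    valid = [v for v in per_class_iou(confusion) if not math.isnan(v)]
    return sum(valid) / len(valid)


# Hypothetical 3-class confusion matrix (rows: ground truth, columns: prediction).
cm = [[50, 2, 3],
      [5, 30, 5],
      [0, 4, 20]]
print(round(mean_iou(cm), 4))  # 0.7035
```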
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted increasing attention in both computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to improve results on VQA tasks. Analyzing all features can cause information redundancy and a heavy computational burden, and attention mechanisms are an effective way to address this. However, a single attention mechanism may attend to features incompletely. This paper improves on existing attention methods by proposing a hybrid attention mechanism that combines spatial attention and channel attention. Because attention can cause loss of the original features, a small portion of the original image features is added back as compensation. For text features, a self-attention mechanism is introduced to strengthen the internal structural features of sentences and improve the overall model. The results show that the attention mechanism and feature compensation together increase the accuracy of the multimodal low-rank bilinear pooling network by 6.1%.
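A minimal sketch of the hybrid idea follows, assuming a CBAM-style order (channel attention first, then spatial attention) and a compensation term that adds back a fixed fraction `beta` of the original features. The pooling-plus-sigmoid gates are simplified stand-ins for the learned layers a real module would use, and `beta = 0.1` is a hypothetical value; the abstract specifies neither.

```python
import math
from typing import List

Feature = List[List[float]]  # [channel][spatial position], with H*W flattened


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x: Feature) -> Feature:
    # Squeeze each channel to its spatial mean, then gate the channel with a sigmoid.
    # (A learned module would pass the pooled vector through a small MLP first.)
    weights = [sigmoid(sum(ch) / len(ch)) for ch in x]
    return [[w * v for v in ch] for w, ch in zip(weights, x)]


def spatial_attention(x: Feature) -> Feature:
    # Average across channels at each position, then gate that position.
    # (CBAM-style modules also use max pooling and a convolution here.)
    n_pos = len(x[0])
    weights = [sigmoid(sum(ch[p] for ch in x) / len(x)) for p in range(n_pos)]
    return [[weights[p] * ch[p] for p in range(n_pos)] for ch in x]


def hybrid_attention_with_compensation(x: Feature, beta: float = 0.1) -> Feature:
    """Channel attention, then spatial attention, then add back beta * x so
    that part of the original features survives the reweighting."""
    attended = spatial_attention(channel_attention(x))
    return [[a + beta * v for a, v in zip(att_ch, ch)] for att_ch, ch in zip(attended, x)]


# Hypothetical 2-channel feature map with 3 spatial positions.
features = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
print(hybrid_attention_with_compensation(features))
```

The compensation term keeps the output from collapsing toward zero wherever both gates are small, which is the loss of original features that the abstract describes.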
Abstract: To reduce the risk of non-performing loans and losses and to improve loan approval efficiency, an intelligent loan risk and approval prediction system is needed. We propose a hybrid deep learning model that combines a 1DCNN-attention network with enhanced preprocessing techniques for loan approval prediction. The model consists of enhanced data preprocessing followed by a stack of multiple hybrid modules. First, the enhanced preprocessing combines standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV), and principal component analysis (PCA); this not only eliminates the effects of data jitter and class imbalance but also removes redundant features while improving feature representation. Next, a hybrid module combining a 1DCNN with an attention mechanism extracts local and global spatio-temporal features. Finally, comprehensive experiments confirm that the proposed model outperforms state-of-the-art baseline models on accuracy, precision, recall, F1 score, and AUC. The proposed model helps automate the loan approval process and provides scientific guidance for loan risk control in financial institutions.
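Of the preprocessing steps listed, information value (IV) is the least familiar outside credit scoring. The sketch below shows the usual IV computation for one feature that has already been binned; the bin counts in the example are made up, and the `eps` floor for empty bins is an assumption rather than part of the described pipeline.

```python
import math
from typing import Sequence


def information_value(goods: Sequence[int], bads: Sequence[int], eps: float = 1e-6) -> float:
    """IV of one binned feature.

    goods[i] / bads[i]: counts of good (repaid) and bad (defaulted) loans in bin i.
    With g_i and b_i the shares of all goods and all bads falling into bin i,
    WoE_i = ln(g_i / b_i) and IV = sum_i (g_i - b_i) * WoE_i.
    eps floors the shares so that empty bins do not produce log(0).
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        g_share = max(g / total_good, eps)
        b_share = max(b / total_bad, eps)
        iv += (g_share - b_share) * math.log(g_share / b_share)
    return iv


# Hypothetical feature with three bins: bad loans concentrate in the last bin.
print(round(information_value([400, 300, 300], [20, 30, 50]), 4))  # 0.2408
```

Each term is non-negative because (g − b) and ln(g/b) always share a sign, so IV is zero only when the feature's bins separate good and bad loans no better than chance; features with low IV are candidates for removal.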
Funding: The study was supported by the National Natural Science Foundation of China (Grant Nos. 62171300, 61727807).
Abstract: Top-down attention requires the selection of specific objects or locations; however, the brain mechanisms involved when attention is allocated across different modalities are not well understood. The aim of this study was to use functional magnetic resonance imaging to define the neural mechanisms underlying divided and selective spatial attention. A concurrent audiovisual stimulus was used, and subjects were prompted to attend to the visual, auditory, or audiovisual stimulus in a Posner paradigm. Our behavioral results confirmed better performance under selective attention than under divided attention. We found differences in the activation levels of the frontoparietal network, the visual/auditory cortex, the putamen, and the salience network across attention conditions. We further used Granger causality (GC) to explore differences in effective connectivity between tasks. Differences in GC connectivity between the visual and auditory selective tasks reflected the visual dominance effect under spatial attention. In addition, our results supported the role of the putamen in redistributing attention and the functional separation of the salience network. In summary, we explored the top-down allocation of audiovisual attention and observed differences in neural mechanisms across endogenous attention modes, revealing differences in the cross-modal expression of visual and auditory attention under attentional modulation.
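To illustrate the idea behind Granger causality, the sketch below computes a lag-1, bivariate GC value as the log ratio of the residual variances of a restricted model (the target's own past) and a full model (its own past plus the source's past). This is a simplified stand-in under stated assumptions: real fMRI analyses typically fit multivariate autoregressive models with a selected model order, and the study's exact pipeline is not described here. The series are assumed zero-mean, so no intercept is fitted, and the data are simulated.

```python
import math
import random
from typing import List, Sequence


def _residual_variance_one(y: List[float], z: List[float]) -> float:
    """Mean squared residual of the least-squares fit y ~ a * z (no intercept)."""
    a = sum(yi * zi for yi, zi in zip(y, z)) / sum(zi * zi for zi in z)
    return sum((yi - a * zi) ** 2 for yi, zi in zip(y, z)) / len(y)


def _residual_variance_two(y: List[float], z1: List[float], z2: List[float]) -> float:
    """Mean squared residual of y ~ a * z1 + b * z2, solved via the 2x2 normal equations."""
    s11 = sum(u * u for u in z1)
    s22 = sum(v * v for v in z2)
    s12 = sum(u * v for u, v in zip(z1, z2))
    t1 = sum(u * yi for u, yi in zip(z1, y))
    t2 = sum(v * yi for v, yi in zip(z2, y))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det  # Cramer's rule
    b = (t2 * s11 - t1 * s12) / det
    return sum((yi - a * u - b * v) ** 2 for yi, u, v in zip(y, z1, z2)) / len(y)


def granger_lag1(x: Sequence[float], y: Sequence[float]) -> float:
    """Lag-1 Granger causality from x to y: ln(var_restricted / var_full).

    Restricted model: y_t ~ y_{t-1}.  Full model: y_t ~ y_{t-1} + x_{t-1}.
    Values well above zero mean the past of x improves the prediction of y.
    """
    target = list(y[1:])
    y_past = list(y[:-1])
    x_past = list(x[:-1])
    restricted = _residual_variance_one(target, y_past)
    full = _residual_variance_two(target, y_past, x_past)
    return math.log(restricted / full)


# Simulated example: y is driven by the previous value of x, not the reverse.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + 0.1 * random.gauss(0.0, 1.0))

print("x -> y:", granger_lag1(x, y))
print("y -> x:", granger_lag1(y, x))
```

In this simulation the x→y value is large while the y→x value stays near zero. The measure cannot be negative (up to rounding) because the full model contains the restricted one, so adding a regressor can only lower the least-squares residual.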
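Abstract: To address the poor point cloud completeness in multi-view 3D reconstruction, a multi-view depth estimation network based on spatial propagation (SP-MVSNet) is proposed. The idea of spatial propagation is introduced for dense point cloud reconstruction under complex conditions, and a spatial-propagation-based hybrid depth hypothesis strategy and a spatial-aware refinement module are designed. The hybrid depth hypothesis strategy infers depth in a coarse-to-fine manner, treats depth estimation as a multi-label classification task, and applies a cross-entropy loss to the regularized probability volume to constrain the cost volume, thereby avoiding the overfitting and slow convergence of regression-based methods. The spatial-aware refinement module takes guidance from feature maps containing high-level semantic representations and, after a confidence check, applies a convolutional spatial propagation network that refines the final depth map by constructing an affinity matrix. In addition, to address the low reconstruction quality that most methods show in unreliable regions that do not satisfy multi-view consistency, an attention-based dynamic feature extraction network with sample-adaptive capability is designed to enhance the model's local perception. Experimental results show that on the DTU dataset, SP-MVSNet improves reconstruction completeness by 32.8% and overall quality by 11.4% compared with CVP-MVSNet. On the Tanks and Temples benchmark and the BlendedMVS dataset, SP-MVSNet also outperforms most existing methods and achieves good 3D reconstruction results.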
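The classification view of depth estimation can be sketched for a single pixel: the network produces one score per depth hypothesis, a softmax turns the scores into the probability volume's entry for that pixel, and cross-entropy is taken against the hypothesis nearest the ground-truth depth. The one-hot nearest-bin label, the winner-take-all prediction, and the evenly spaced coarse-to-fine resampling below are illustrative assumptions; the paper's actual label construction and hypothesis schedule may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def nearest_hypothesis(gt_depth: float, hypotheses: Sequence[float]) -> int:
    """Index of the depth hypothesis closest to the ground-truth depth."""
    return min(range(len(hypotheses)), key=lambda i: abs(hypotheses[i] - gt_depth))


def depth_cross_entropy(logits: Sequence[float], hypotheses: Sequence[float], gt_depth: float) -> float:
    """Cross-entropy between the per-pixel distribution over depth hypotheses
    and a one-hot label at the nearest hypothesis."""
    probs = softmax(logits)
    return -math.log(probs[nearest_hypothesis(gt_depth, hypotheses)])


def winner_take_all_depth(logits: Sequence[float], hypotheses: Sequence[float]) -> float:
    """Classification-style prediction: the most probable hypothesis."""
    probs = softmax(logits)
    return hypotheses[max(range(len(probs)), key=probs.__getitem__)]


def refine_hypotheses(center: float, interval: float, count: int) -> List[float]:
    """Coarse-to-fine step: resample `count` hypotheses around the previous estimate."""
    half = (count - 1) / 2
    return [center + (i - half) * interval for i in range(count)]


# Hypothetical coarse stage with four depth hypotheses for one pixel.
hyps = [500.0, 600.0, 700.0, 800.0]
scores = [0.1, 2.0, 0.5, -1.0]
coarse = winner_take_all_depth(scores, hyps)
print(coarse, depth_cross_entropy(scores, hyps, gt_depth=620.0))
print(refine_hypotheses(coarse, interval=25.0, count=5))
```

Unlike a soft-argmin regression, the loss here only rewards probability mass on the correct bin, which is the property the abstract credits with avoiding overfitting and slow convergence.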
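Abstract: Because low-light images suffer from low contrast, severe loss of detail, and heavy noise, existing object detection algorithms perform poorly on them. To address this, this paper proposes a low-light object detection method that combines a spatial-aware attention mechanism with multi-scale feature fusion (Spatial-aware Attention Mechanism and Multi-Scale Feature Fusion, SAM-MSFF). The method first fuses multi-scale features through a multi-scale interactive memory pyramid, enhancing the useful information in low-light image features, and uses memory vectors to store sample features and capture latent correlations between samples. Next, a spatial-aware attention mechanism captures long-range contextual information and local information in the spatial domain, enhancing target features in low-light images while suppressing interference from background information and noise. Finally, a multi-receptive-field enhancement module enlarges the receptive field of the features and performs grouped reweighting of features with different receptive fields, so that the detection network adaptively adjusts its receptive field size according to the multi-scale input. In experiments on the ExDark dataset, the proposed method achieves a mean average precision (mAP) of 77.04%, which is 2.6% to 14.34% higher than existing mainstream object detection methods.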
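The "adaptive receptive field" behaviour of the final module can be illustrated with a selective-kernel-style fusion, which is an assumption here: the abstract does not say the module follows SKNet. Branches with different receptive fields are summed and globally pooled into a shared descriptor, hypothetical per-branch linear maps (`selectors`, standing in for learned layers) score each channel, a softmax across branches turns the scores into weights, and the branches are reweighted and summed.

```python
import math
from typing import List, Sequence, Tuple

Maps = List[List[float]]  # [channel][spatial position]


def softmax(vals: Sequence[float]) -> List[float]:
    m = max(vals)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]


def select_receptive_fields(branches: Sequence[Maps],
                            selectors: Sequence[Sequence[Sequence[float]]]) -> Tuple[Maps, Maps]:
    """SKNet-style fusion (illustrative sketch, not the paper's module).

    branches[k]: feature maps of branch k, e.g. convolutions with different dilation.
    selectors[k]: hypothetical C x C matrix mapping the shared descriptor to
    branch-k logits, one per channel.
    Returns (fused maps, weights[k][c]); for each channel the weights sum to 1.
    """
    n_branches = len(branches)
    n_channels = len(branches[0])
    n_pos = len(branches[0][0])
    # 1) Element-wise sum of the branches, then global average pooling per channel.
    descriptor = [sum(sum(b[c]) for b in branches) / n_pos for c in range(n_channels)]
    # 2) Per-branch, per-channel logits from the shared descriptor.
    logits = [[sum(sel[c][j] * descriptor[j] for j in range(n_channels))
               for c in range(n_channels)] for sel in selectors]
    # 3) Softmax across branches, separately for each channel.
    weights = [[0.0] * n_channels for _ in range(n_branches)]
    for c in range(n_channels):
        w = softmax([logits[k][c] for k in range(n_branches)])
        for k in range(n_branches):
            weights[k][c] = w[k]
    # 4) Reweight each branch channel-wise and sum.
    fused = [[sum(weights[k][c] * branches[k][c][p] for k in range(n_branches))
              for p in range(n_pos)] for c in range(n_channels)]
    return fused, weights


# Hypothetical single-channel branches (small vs. large receptive field).
small_rf = [[1.0, 3.0]]
large_rf = [[3.0, 5.0]]
print(select_receptive_fields([small_rf, large_rf], [[[1.0]], [[-1.0]]]))
```

Because the weights come from a softmax over branches, raising one branch's weight for a channel necessarily lowers the others, which is what lets the effective receptive field shift with the input.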