Journal Articles
2,405 articles found
Attention Guided Multi Scale Feature Fusion Network for Automatic Prostate Segmentation
1
Authors: Yuchun Li, Mengxing Huang, +1 more author, Yu Zhang, Zhiming Bai. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 2, pp. 1649-1668 (20 pages)
The precise and automatic segmentation of prostate magnetic resonance imaging (MRI) images is vital for assisting doctors in diagnosing prostate diseases. In recent years, many advanced methods have been applied to prostate segmentation, but due to the variability caused by prostate diseases, automatic segmentation of the prostate presents significant challenges. In this paper, we propose an attention-guided multi-scale feature fusion network (AGMSF-Net) to segment prostate MRI images. We propose an attention mechanism for extracting multi-scale features, and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder. In the decoder stage, a feature fusion module is proposed to obtain global context information. We evaluate our model on MRI images of the prostate acquired from a local hospital. The relative volume difference (RVD) and dice similarity coefficient (DSC) between the results of automatic prostate segmentation and ground truth were 1.21% and 93.68%, respectively. To quantitatively evaluate prostate volume on MRI, which is of significant clinical value, we propose a unique AGMSF-Net. The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
Keywords: prostate segmentation; multi-scale attention; 3D Transformer; feature fusion; MRI
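The listing only summarizes AGMSF-Net, so the following is a minimal, hypothetical sketch of one common way to build an attention-weighted multi-scale feature extractor; the class name, branch kernel sizes, and channel counts are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiScaleAttentionBlock(nn.Module):
    """Illustrative multi-scale feature extraction with channel attention.
    Parallel 3x3/5x5/7x7 branches are concatenated, then re-weighted by a
    squeeze-and-excitation style gate. Not the AGMSF-Net implementation."""
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7)
        ])
        fused_ch = branch_ch * 3
        self.gate = nn.Sequential(              # channel attention over the fused maps
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused_ch, fused_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused_ch // 4, fused_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return feats * self.gate(feats)         # attention-weighted multi-scale features

x = torch.randn(1, 64, 32, 32)                  # toy MRI slice features
print(MultiScaleAttentionBlock(64)(x).shape)    # torch.Size([1, 96, 32, 32])
```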
Attention Guided Food Recognition via Multi-Stage Local Feature Fusion
2
Authors: Gonghui Deng, Dunzhi Wu, Weizhen Chen. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 8, pp. 1985-2003 (19 pages)
The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark an improvement of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
Keywords: fine-grained image recognition; food image recognition; attention mechanism; local feature fusion
Deep Global Multiple-Scale and Local Patches Attention Dual-Branch Network for Pose-Invariant Facial Expression Recognition
3
Authors: Chaoji Liu, Xingqiao Liu, +1 more author, Chong Chen, Kang Zhou. 《Computer Modeling in Engineering & Sciences》 SCIE EI, 2024, No. 4, pp. 405-440 (36 pages)
Pose-invariant facial expression recognition (FER) is an active but challenging research topic in computer vision. Especially with the involvement of diverse observation angles, FER makes the training parameter models inconsistent from one view to another. This study develops a deep global multiple-scale and local patches attention (GMS-LPA) dual-branch network for pose-invariant FER to weaken the influence of pose variation and self-occlusion on recognition accuracy. In this research, the designed GMS-LPA network contains four main parts, i.e., the feature extraction module, the global multiple-scale (GMS) module, the local patches attention (LPA) module, and the model-level fusion model. The feature extraction module is designed to extract and normalize texture information to the same size. The GMS module can extract deep global features with different receptive fields, reducing the sensitivity of deeper convolution layers to pose variation and self-occlusion. The LPA module is built to force the network to focus on local salient features, which can lower the effect of pose variation and self-occlusion on recognition results. Subsequently, the extracted features are fused with a model-level strategy to improve recognition accuracy. Extensive experiments were conducted on four public databases, and the recognition results demonstrated the feasibility and validity of the proposed methods.
Keywords: pose-invariant FER; global multiple-scale (GMS); local patches attention (LPA); model-level fusion
Fusion of Convolutional Self-Attention and Cross-Dimensional Feature Transformation for Human Posture Estimation
4
Authors: Anzhan Liu, Yilu Ding, Xiangyang Lu. 《Journal of Beijing Institute of Technology》 EI CAS, 2024, No. 4, pp. 346-360 (15 pages)
Human posture estimation is a prominent research topic in the fields of human-computer interaction, motion recognition, and other intelligent applications. However, achieving high accuracy in key point localization, which is crucial for intelligent applications, contradicts the low detection accuracy of human posture detection models in practical scenarios. To address this issue, a human pose estimation network called AT-HRNet has been proposed, which combines convolutional self-attention and cross-dimensional feature transformation. AT-HRNet captures significant feature information from various regions in an adaptive manner, aggregating it through convolutional operations within the local receptive domain. The residual structures TripNeck and TripBlock of the high-resolution network are designed to further refine the key point locations, where the attention weight is adjusted by a cross-dimensional interaction to obtain more features. To validate the effectiveness of this network, AT-HRNet was evaluated using the COCO2017 dataset. The results show that AT-HRNet outperforms HRNet by improving 3.2% in mAP, 4.0% in AP75, and 3.9% in AP^(M). This suggests that AT-HRNet can offer more beneficial solutions for human posture estimation.
Keywords: human posture estimation; adaptive fusion method; cross-dimensional interaction; attention module; high-resolution network
A Lightweight Multiscale Feature Fusion Network for Solar Cell Defect Detection
5
Authors: Xiaoyun Chen, Lanyao Zhang, +3 more authors, Xiaoling Chen, Yigang Cen, Linna Zhang, Fugui Zhang. 《Computers, Materials & Continua》 SCIE EI, 2025, No. 1, pp. 521-542 (22 pages)
Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules. In the production process, defect samples occur infrequently and exhibit random shapes and sizes, which makes it challenging to collect defective samples. Additionally, the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions. This paper proposes a novel Lightweight Multiscale Feature Fusion network (LMFF) to address these challenges. The network comprises a feature extraction network, a multi-scale feature fusion module (MFF), and a segmentation network. Specifically, a feature extraction network is proposed to obtain multi-scale feature outputs, and a multi-scale feature fusion module (MFF) is used to fuse multi-scale feature information effectively. In order to capture finer-grained multi-scale information from the fusion features, we propose a multi-scale attention module (MSA) in the segmentation network to enhance the network's ability for small target detection. Moreover, depthwise separable convolutions are introduced to construct depthwise separable residual blocks (DSR) to reduce the model's parameter number. Finally, to validate the proposed method's defect segmentation and localization performance, we constructed three solar cell defect detection datasets: SolarCells, SolarCells-S, and PVEL-S. SolarCells and SolarCells-S are monocrystalline silicon datasets, and PVEL-S is a polycrystalline silicon dataset. Experimental results show that the IoU of our method on these three datasets reaches 68.5%, 51.0%, and 92.7%, respectively, and the F1-Score reaches 81.3%, 67.5%, and 96.2%, respectively, which surpasses other commonly used methods and verifies the effectiveness of our LMFF network.
Keywords: defect segmentation; multi-scale feature fusion; multi-scale attention; depthwise separable residual block
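Since the abstract highlights depthwise separable residual blocks (DSR) for cutting the parameter count, here is a minimal, hypothetical sketch of such a block in PyTorch; the layer arrangement and channel count are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableResidual(nn.Module):
    """Sketch of a depthwise separable residual block (DSR): a depthwise 3x3
    followed by a pointwise 1x1, wrapped with a skip connection. Parameter
    choices are illustrative, not taken from the LMFF paper."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return out + x                      # residual connection keeps gradients stable

blk = DepthwiseSeparableResidual(64)
print(sum(p.numel() for p in blk.parameters()))   # far fewer params than a full 3x3 conv
```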
Steel Surface Defect Detection Algorithm Based on MCB-FAH-YOLOv8 (Cited: 9)
6
Authors: 崔克彬, 焦静颐. 《图学学报》 CSCD, PKU Core, 2024, No. 1, pp. 112-125 (14 pages)
To address the false detections, missed detections, and low accuracy of existing deep-learning-based steel surface defect detection algorithms, a YOLOv8 steel surface defect detection algorithm based on a modified CBAM (MCB) and a replaceable four-head ASFF prediction head (FAH), abbreviated as MCB-FAH-YOLOv8, is proposed. The modified convolutional block attention module (CBAM) better localizes dense targets; replacing the FPN structure with BiFPN extracts contextual information more efficiently; adding adaptive feature fusion (ASFF) automatically finds the most suitable fused features; and replacing the SPPF module with the more accurate SimCSPSPPF module further improves precision. In addition, a four-head ASFF prediction head is proposed for tiny object detection, which can be swapped in according to the characteristics of the dataset. Experimental results show that MCB-FAH-YOLOv8 achieves a detection accuracy (mAP) of 88.8% on the VOC2007 dataset and 81.8% on the NEU-DET steel defect detection dataset, improvements of 5.1% and 3.4% over the baseline model, respectively. The algorithm achieves higher detection accuracy at the cost of only a small reduction in detection speed, striking a good balance between accuracy and speed.
Keywords: MCB-FAH-YOLOv8; defect detection; attention mechanism; four-head ASFF prediction head; feature fusion
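For reference, a standard CBAM block (the module the paper modifies into MCB) can be sketched as below; this is the textbook formulation, not the paper's modified version, and the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Standard CBAM for reference: channel attention (avg- and max-pooled MLP)
    followed by spatial attention (7x7 conv over pooled maps). The paper's MCB
    modifies this block; the modifications are not shown here."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # channel attention: shared MLP over average- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: 7x7 conv over channel-wise mean and max maps
        sa = torch.sigmoid(self.spatial(torch.cat([x.mean(1, keepdim=True),
                                                   x.amax(1, keepdim=True)], dim=1)))
        return x * sa

print(CBAM(64)(torch.randn(1, 64, 40, 40)).shape)   # torch.Size([1, 64, 40, 40])
```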
Faster-RCNN Water-Floating Garbage Recognition with Multi-Scale Features and Polarized Self-Attention
7
Authors: 蒋占军, 吴佰靖, +1 more author, 马龙, 廉敬. 《计算机应用》 CSCD, PKU Core, 2024, No. 3, pp. 938-944 (7 pages)
Small water-floating garbage targets vary in shape, have low resolution, and carry limited information, which leads to unsatisfactory detection results. To address this, an improved Faster-RCNN (Faster Regions with Convolutional Neural Network) water-floating garbage detection algorithm, MP-Faster-RCNN (Faster-RCNN with Multi-scale feature and Polarized self-attention), is proposed. First, a dataset of small water-floating garbage targets along the Lanzhou section of the Yellow River is built, and dilated convolution combined with ResNet-50 replaces the original VGG-16 (Visual Geometry Group 16) as the backbone feature extraction network, enlarging the receptive field to extract more small-target features. Second, multi-scale features are exploited in the region proposal network (RPN) by adding two convolution layers of 3×3 and 1×1 to compensate for the feature loss caused by a single sliding window. Finally, polarized self-attention is added before the RPN to further exploit multi-scale and channel features, extracting finer-grained multi-scale spatial information and inter-channel dependencies and generating feature maps with global features for more precise bounding-box localization. Experimental results show that MP-Faster-RCNN effectively improves water-floating garbage detection accuracy: compared with the original Faster-RCNN, the mean average precision (mAP) is improved by 6.37 percentage points, the model size is reduced from 521 MB to 108 MB, and the model converges faster under the same number of training batches.
Keywords: object detection; water-floating garbage; Faster-RCNN; dilated convolution; multi-scale feature fusion; polarized self-attention
3D Object Detection Model for Autonomous Driving Based on Improved Centerfusion
8
Authors: 黄俊, 刘家森. 《无线电工程》 2024, No. 2, pp. 507-514 (8 pages)
To address missed and false detections of on-road targets in autonomous driving, a 3D object detection model based on an improved Centerfusion is proposed. The model fuses camera information with radar features to form a multi-channel feature input, which strengthens the robustness of the detection network and reduces missed detections. To obtain more accurate and richer 3D detection information, an improved attention mechanism is introduced to enhance the fusion of radar point clouds and visual information within the frustum grid, and an improved loss function is used to optimize the accuracy of bounding-box prediction. The model is validated and compared on the Nuscenes dataset. Experimental results show that, compared with the traditional Centerfusion model, the proposed model improves the mean average precision (mAP) by 1.3% and the Nuscenes Detection Score (NDS) by 1.2%.
Keywords: sensor fusion; 3D object detection; attention mechanism; millimeter-wave radar
AF-Net: A Medical Image Segmentation Network Based on Attention Mechanism and Feature Fusion (Cited: 4)
9
Authors: Guimin Hou, Jiaohua Qin, +2 more authors, Xuyu Xiang, Yun Tan, Neal N. Xiong. 《Computers, Materials & Continua》 SCIE EI, 2021, No. 11, pp. 1877-1891 (15 pages)
Medical image segmentation is an important application field of computer vision in medical image processing. Due to the close location and high similarity of different organs in medical images, current segmentation algorithms have problems with mis-segmentation and poor edge segmentation. To address these challenges, we propose a medical image segmentation network (AF-Net) based on attention mechanism and feature fusion, which can effectively capture global information while focusing the network on the object area. In this approach, we add dual attention blocks (DA-block) to the backbone network, which comprise parallel channel and spatial attention branches, to adaptively calibrate and weigh features. Secondly, the multi-scale feature fusion block (MFF-block) is proposed to obtain feature maps of different receptive domains and get multi-scale information with less computational consumption. Finally, to restore the locations and shapes of organs, we adopt the global feature fusion blocks (GFF-block) to fuse high-level and low-level information, which can obtain accurate pixel positioning. We evaluate our method on multiple datasets (the aorta and lungs dataset), and the experimental results achieve 94.0% in mIoU and 96.3% in DICE, showing that our approach performs better than U-Net and other state-of-the-art methods.
Keywords: deep learning; medical image segmentation; feature fusion; attention mechanism
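As a rough illustration of the high-level/low-level fusion idea behind the GFF-block, the hypothetical sketch below upsamples deep features to the shallow resolution and mixes them with a 1×1 convolution; the channel sizes and the mixing layer are assumptions, not the AF-Net design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalFeatureFusion(nn.Module):
    """Illustrative high/low-level fusion: upsample the deep feature map to the
    shallow map's resolution, concatenate, and mix with a 1x1 conv."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.mix = nn.Conv2d(low_ch + high_ch, out_ch, 1)

    def forward(self, low, high):
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear", align_corners=False)
        return self.mix(torch.cat([low, high], dim=1))

low = torch.randn(1, 64, 128, 128)    # shallow features: precise locations
high = torch.randn(1, 256, 32, 32)    # deep features: semantic context
print(GlobalFeatureFusion(64, 256, 128)(low, high).shape)  # torch.Size([1, 128, 128, 128])
```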
Series Arc Fault Detection Method Based on CNN-BiLSTM-Attention Optimized by an Improved Dung Beetle Algorithm
10
Authors: 李海波. 《电器与能效管理技术》 2024, No. 8, pp. 57-68 (12 pages)
To address insufficient arc fault feature extraction and limited detection accuracy, a multi-feature-fusion series arc fault detection method is proposed in which an improved dung beetle optimizer (IDBO) optimizes a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism. Time-domain, frequency-domain, and time-frequency-domain features of the current, together with autoregressive parametric model features of the signal, are extracted on an experimental platform; kernel principal component analysis (KPCA) reduces and fuses these features, and the resulting feature vectors serve as the input to the CNN-BiLSTM-Attention model. The dung beetle optimizer is improved with Cubic chaotic mapping, a spiral search strategy, dynamic weight coefficients, and a Gaussian-Cauchy mutation strategy, and the improved optimizer tunes the hyperparameters of CNN-BiLSTM-Attention to realize series arc fault diagnosis. The results show that the proposed method achieves an arc fault detection accuracy of 97.92% and can efficiently identify series arc faults.
Keywords: arc fault; improved dung beetle optimizer; multi-feature fusion; CNN-BiLSTM-Attention
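To make the CNN-BiLSTM-Attention pipeline concrete, here is a minimal, hypothetical PyTorch sketch of such a classifier for 1-D current feature sequences; the layer sizes are placeholders, and the IDBO hyperparameter search described in the abstract is not reproduced.

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    """Minimal CNN-BiLSTM-Attention classifier for 1-D current features.
    Layer sizes are illustrative placeholders."""
    def __init__(self, in_ch=1, hidden=64, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(in_ch, 32, 5, padding=2), nn.ReLU(),
                                 nn.MaxPool1d(2))
        self.bilstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # additive attention scores
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                           # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)             # -> (batch, time, features)
        h, _ = self.bilstm(h)
        w = torch.softmax(self.attn(h), dim=1)      # attention weights over time
        ctx = (w * h).sum(dim=1)                    # weighted temporal context
        return self.fc(ctx)

print(CNNBiLSTMAttention()(torch.randn(8, 1, 256)).shape)   # torch.Size([8, 2])
```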
Network Intrusion Detection Method Based on Attention-BiTCN (Cited: 5)
11
Authors: 孙红哲, 王坚, +1 more author, 王鹏, 安雨龙. 《信息网络安全》 CSCD, PKU Core, 2024, No. 2, pp. 309-318 (10 pages)
To address the limited multi-class accuracy in network intrusion detection, and exploiting the time-series nature of network traffic data, this paper proposes a network intrusion detection model based on an attention mechanism and a bidirectional temporal convolutional network (BiTCN). First, the model preprocesses the dataset with one-hot encoding and normalization to handle the strong discreteness and inconsistent scales of network traffic data. Second, the preprocessed data are turned into bidirectional sequences with a two-way sliding window method and fed into the Attention-BiTCN model in parallel. Then, bidirectional temporal features are extracted and fused additively to obtain fused features with enhanced temporal information. Finally, a Softmax function performs multi-class detection and identification of attack behaviors on the fused features. The proposed model is validated on the NSL-KDD and UNSW-NB15 datasets, achieving multi-class accuracies of 99.70% and 84.07%, respectively, outperforming traditional network intrusion detection algorithms and showing a significant improvement in detection performance over other deep learning models.
Keywords: intrusion detection; attention mechanism; BiTCN; bidirectional sliding window method; fused features
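The two-way sliding window step is easy to illustrate; below is a hedged sketch (window width, step, and feature count are assumptions) that builds forward windows over preprocessed flow records and pairs each with its reversed copy for the two BiTCN directions.

```python
import numpy as np

def bidirectional_windows(records, width=10, step=1):
    """Build forward windows over preprocessed flow records and pair each with
    its time-reversed copy, so forward and backward sequences can be fed to the
    two temporal-convolution branches. Settings are illustrative."""
    forward, backward = [], []
    for start in range(0, len(records) - width + 1, step):
        win = records[start:start + width]
        forward.append(win)
        backward.append(win[::-1])          # reversed view of the same window
    return np.asarray(forward), np.asarray(backward)

records = np.random.rand(100, 41)           # 100 flow records, 41 normalized features
fwd, bwd = bidirectional_windows(records)
print(fwd.shape, bwd.shape)                 # (91, 10, 41) (91, 10, 41)
```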
Multimodal Sentiment Analysis Using BiGRU and Attention-Based Hybrid Fusion Strategy (Cited: 1)
12
Authors: Zhizhong Liu, Bin Zhou, +1 more author, Lingqiang Meng, Guangyu Huang. 《Intelligent Automation & Soft Computing》 SCIE, 2023, No. 8, pp. 1963-1981 (19 pages)
Recently, multimodal sentiment analysis has increasingly attracted attention with the popularity of complementary data streams, and it has great potential to surpass unimodal sentiment analysis. One challenge of multimodal sentiment analysis is how to design an efficient multimodal feature fusion strategy. Unfortunately, existing work usually considers feature-level fusion or decision-level fusion, and few research works focus on hybrid fusion strategies that contain both. To improve the performance of multimodal sentiment analysis, we present a novel multimodal sentiment analysis model using BiGRU and an attention-based hybrid fusion strategy (BAHFS). Firstly, we apply BiGRU to learn the unimodal features of text, audio and video. Then we fuse the unimodal features into bimodal features using the bimodal attention fusion module. Next, BAHFS feeds the unimodal features and bimodal features into the trimodal attention fusion module and the trimodal concatenation fusion module simultaneously to get two sets of trimodal features. Finally, BAHFS makes a classification with the two sets of trimodal features respectively and gets the final analysis results with decision-level fusion. Based on the CMU-MOSI and CMU-MOSEI datasets, extensive experiments have been carried out to verify BAHFS's superiority.
Keywords: multimodal sentiment analysis; BiGRU; attention mechanism; feature-level fusion; hybrid fusion strategy
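A minimal sketch of the unimodal BiGRU encoding and a simple attention-weighted fusion is given below; the feature dimensions (300-d text, 74-d audio, 35-d video) and the mean-based scoring are assumptions, and the paper's bimodal/trimodal fusion modules are richer than this.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Unimodal BiGRU encoder: returns the final bidirectional hidden state."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                     # x: (batch, time, in_dim)
        _, h = self.gru(x)                    # h: (2, batch, hidden)
        return torch.cat([h[0], h[1]], dim=-1)

def attention_fusion(feats):
    """Toy attention fusion: score each modality vector with a softmax over a
    simple mean-based score, then take the weighted sum."""
    stacked = torch.stack(feats, dim=1)                     # (batch, n_modal, dim)
    scores = torch.softmax(stacked.mean(dim=-1), dim=1)     # (batch, n_modal)
    return (stacked * scores.unsqueeze(-1)).sum(dim=1)      # fused representation

text = BiGRUEncoder(300)(torch.randn(4, 20, 300))
audio = BiGRUEncoder(74)(torch.randn(4, 20, 74))
video = BiGRUEncoder(35)(torch.randn(4, 20, 35))
print(attention_fusion([text, audio, video]).shape)         # torch.Size([4, 128])
```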
Bridge Crack Segmentation Method Based on Parallel Attention Mechanism and Multi-Scale Features Fusion (Cited: 1)
13
Authors: Jianwei Yuan, Xinli Song, +2 more authors, Huaijian Pu, Zhixiong Zheng, Ziyang Niu. 《Computers, Materials & Continua》 SCIE EI, 2023, No. 3, pp. 6485-6503 (19 pages)
Regular inspection of bridge cracks is crucial to bridge maintenance and repair. Traditional manual crack detection methods are time-consuming, dangerous and subjective. At the same time, for the existing mainstream vision-based automatic crack detection algorithms, it is challenging to detect fine cracks and to balance detection accuracy and speed. Therefore, this paper proposes a new bridge crack segmentation method based on a parallel attention mechanism and multi-scale feature fusion on top of the DeeplabV3+ network framework. First, the improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+ network to improve the original backbone network Xception and the atrous spatial pyramid pooling (ASPP) module, respectively, dramatically reducing the number of parameters in the network and accelerating the training and prediction speed of the model. Moreover, we introduce the parallel attention mechanism into the encoding and decoding stages. The attention to crack regions can be enhanced from both the channel and spatial aspects, significantly suppressing the interference of various noises. Finally, we further improve the detection performance of the model for fine cracks by introducing a multi-scale feature fusion module. Our research results are validated on a self-made dataset. The experiments show that our method is more accurate than other methods. Its intersection over union (IoU) and F1-score (F1) are increased to 77.96% and 87.57%, respectively. In addition, the number of parameters is only 4.10M, which is much smaller than that of the original network; also, the frames per second (FPS) is increased to 15 frames/s. The results prove that the proposed method fits well the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.
Keywords: crack detection; DeeplabV3+; parallel attention mechanism; feature fusion
Residual Feature Attentional Fusion Network for Lightweight Chest CT Image Super-Resolution (Cited: 1)
14
Authors: Kun Yang, Lei Zhao, +4 more authors, Xianghui Wang, Mingyang Zhang, Linyan Xue, Shuang Liu, Kun Liu. 《Computers, Materials & Continua》 SCIE EI, 2023, No. 6, pp. 5159-5176 (18 pages)
The diagnosis of COVID-19 requires chest computed tomography (CT). High-resolution CT images can provide more diagnostic information to help doctors better diagnose the disease, so it is of clinical importance to study super-resolution (SR) algorithms applied to CT images to improve their resolution. However, most existing SR algorithms are studied on natural images, which makes them unsuitable for medical images, and most of them improve reconstruction quality by increasing network depth, which is not suitable for machines with limited resources. To alleviate these issues, we propose a residual feature attentional fusion network for lightweight chest CT image super-resolution (RFAFN). Specifically, we design a contextual feature extraction block (CFEB) that can extract CT image features more efficiently and accurately than ordinary residual blocks. In addition, we propose a feature-weighted cascading strategy (FWCS) based on attentional feature fusion blocks (AFFB) to utilize the high-frequency detail information extracted by CFEB as much as possible by selectively fusing adjacent-level feature information. Finally, we suggest a global hierarchical feature fusion strategy (GHFFS), which can utilize the hierarchical features more effectively than dense concatenation by progressively aggregating the feature information at various levels. Numerous experiments show that our method performs better than most of the state-of-the-art (SOTA) methods on the COVID-19 chest CT dataset. In detail, the peak signal-to-noise ratio (PSNR) is 0.11 dB and 0.47 dB higher on CTtest1 and CTtest2 at ×3 SR compared to the suboptimal method, while the number of parameters and multi-adds are reduced by 22K and 0.43G, respectively. Our method can better recover chest CT image quality with fewer computational resources and effectively assist in COVID-19 diagnosis.
Keywords: super-resolution; COVID-19; chest CT; lightweight network; contextual feature extraction; attentional feature fusion
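Since the gains are reported in PSNR, a short reference implementation of the metric may help; this is the standard formula, under the assumption that images are scaled to [0, 1].

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB, the metric the RFAFN results are
    reported in (e.g. a +0.11 dB gain on CTtest1)."""
    mse = torch.mean((sr - hr) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

hr = torch.rand(1, 1, 128, 128)                 # toy high-resolution CT slice
sr = (hr + 0.01 * torch.randn_like(hr)).clamp(0, 1)   # toy super-resolved estimate
print(f"{psnr(sr, hr).item():.2f} dB")
```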
Infrared Target Detection Algorithm Based on Improved Faster R-CNN (Cited: 2)
15
Authors: 汪西晨, 彭富伦, +1 more author, 李业勋, 张俊举. 《应用光学》 CAS, PKU Core, 2024, No. 2, pp. 346-353 (8 pages)
To improve the detection accuracy of infrared targets, a Faster R-CNN infrared target detection algorithm incorporating a frequency-domain attention mechanism is proposed. First, a parallel image enhancement preprocessing structure is designed to address edge blur and noise in infrared images. Second, a frequency-domain attention mechanism is introduced into Faster R-CNN to build a new backbone network for infrared target detection. Finally, a path-augmented pyramid structure is introduced to fuse multi-scale features for prediction, exploiting the rich positional information of the shallow layers to improve detection accuracy. Experiments on an infrared aircraft dataset show that the improved Faster R-CNN detection framework improves AP by 7.6% over the algorithm with a ResNet50 backbone. Moreover, compared with current mainstream algorithms, the proposed algorithm improves infrared target detection accuracy, verifying the effectiveness of the improvements.
Keywords: infrared target detection; image enhancement; Faster R-CNN; frequency-domain attention mechanism; multi-scale feature fusion
Multi-Feature Fusion-Guided Multiscale Bidirectional Attention Networks for Logistics Pallet Segmentation (Cited: 1)
16
Authors: Weiwei Cai, Yaping Song, +2 more authors, Huan Duan, Zhenwei Xia, Zhanguo Wei. 《Computer Modeling in Engineering & Sciences》 SCIE EI, 2022, No. 6, pp. 1539-1555 (17 pages)
In the smart logistics industry, unmanned forklifts that intelligently identify logistics pallets can improve work efficiency in warehousing and transportation and are better than traditional manual forklifts driven by humans. Therefore, they play a critical role in smart warehousing, and semantic segmentation is an effective method to realize the intelligent identification of logistics pallets. However, most current recognition algorithms are ineffective due to the diverse types of pallets, their complex shapes, frequent blockades in production environments, and changing lighting conditions. This paper proposes a novel multi-feature fusion-guided multiscale bidirectional attention (MFMBA) neural network for logistics pallet segmentation. To better predict the foreground category (the pallet) and the background category (the cargo) of a pallet image, our approach extracts three types of features (grayscale, texture, and Hue, Saturation, Value features) and fuses them. The multiscale architecture deals with the problem that the size and shape of the pallet may appear different in the image in the actual, complex environment, which usually makes feature extraction difficult. Our study proposes a multiscale architecture that can extract additional semantic features. Also, since a traditional attention mechanism only assigns attention weights from a single direction, we designed a bidirectional attention mechanism that assigns cross-attention weights to each feature from two directions, horizontally and vertically, significantly improving segmentation. Finally, comparative experimental results show that the precision of the proposed algorithm is 0.53%-8.77% better than that of the other methods we compared.
Keywords: logistics pallet segmentation; image segmentation; multi-feature fusion; multiscale network; bidirectional attention mechanism; HSV; neural networks; deep learning
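The three-feature input (grayscale, texture, HSV) can be sketched as below; a Laplacian response standing in for the texture descriptor and the rough normalization constants are assumptions, not the paper's exact preprocessing.

```python
import cv2
import numpy as np

def pallet_feature_stack(bgr):
    """Stack grayscale, a texture map, and HSV channels along the channel axis,
    mirroring the three-feature input described in the abstract. The Laplacian
    response is only a stand-in for the texture feature."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    texture = np.abs(cv2.Laplacian(gray, cv2.CV_32F))          # edge/texture response
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    return np.dstack([gray, texture, hsv])                     # H x W x 5 input tensor

img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)     # stand-in pallet image
print(pallet_feature_stack(img).shape)                         # (240, 320, 5)
```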
Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks (Cited: 1)
17
Authors: Fangfang Shan, Huifang Sun, Mengyi Wang. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 4, pp. 581-605 (25 pages)
As social networks become increasingly complex, contemporary fake news often includes textual descriptions of events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely to create a misleading perception among users. While early research primarily focused on text-based features for fake news detection mechanisms, there has been relatively limited exploration of learning shared representations in multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal model for detecting fake news, which relies on similarity reasoning and adversarial networks. The model employs Bidirectional Encoder Representations from Transformers (BERT) and a Text Convolutional Neural Network (Text-CNN) for extracting textual features, while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) network to extract visual features. Subsequently, the model establishes similarity representations between the textual features extracted by Text-CNN and the visual features through similarity learning and reasoning. Finally, these features are fused to enhance the accuracy of fake news detection, and adversarial networks are employed to investigate the relationship between fake news and events. This paper validates the proposed model using publicly available multimodal datasets from Weibo and Twitter. Experimental results demonstrate that our proposed approach achieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodal models and existing multimodal models. On the Weibo dataset, the overall performance of our model surpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarial networks in multimodal fake news detection significantly enhances detection effectiveness. However, the current research is limited to the fusion of only text and image modalities. Future research should aim to further integrate features from additional modalities to comprehensively represent the multifaceted information of fake news.
Keywords: fake news detection; attention mechanism; image-text similarity; multimodal feature fusion
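A hedged sketch of the similarity-learning idea (project text and image features into a shared space and score them with cosine similarity) follows; the projection dimensions are assumptions, and the paper's similarity reasoning is considerably richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityHead(nn.Module):
    """Project text and image features into a shared space and score them with
    cosine similarity. Dimensions are assumptions (768-d text, 4096-d VGG-19)."""
    def __init__(self, text_dim=768, img_dim=4096, shared_dim=128):
        super().__init__()
        self.proj_t = nn.Linear(text_dim, shared_dim)
        self.proj_v = nn.Linear(img_dim, shared_dim)

    def forward(self, text_feat, img_feat):
        return F.cosine_similarity(self.proj_t(text_feat), self.proj_v(img_feat), dim=-1)

head = SimilarityHead()
print(head(torch.randn(4, 768), torch.randn(4, 4096)))   # one similarity score per post
```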
Adaptive multi-modal feature fusion for far and hard object detection
18
Authors: LI Yang, GE Hongwei. 《Journal of Measurement Science and Instrumentation》 CAS CSCD, 2021, No. 2, pp. 232-241 (10 pages)
In order to solve the difficult detection of far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with multi-modal data adaptive fusion is proposed, which makes use of multi-neighborhood information of voxels and image information. Firstly, we design an improved ResNet that maintains the structure information of far and hard objects in low-resolution feature maps, which is more suitable for the detection task. Meanwhile, the semantics of each image feature map are enhanced by semantic information from all subsequent feature maps. Secondly, we extract multi-neighborhood context information with different receptive field sizes to make up for the sparseness of the point cloud, which improves the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, we propose a multi-modal feature adaptive fusion strategy which uses learnable weights to express the contribution of different modal features to the detection task, and voxel attention further enhances the fused feature expression of effective target objects. The experimental results on the KITTI benchmark show that this method outperforms VoxelNet with remarkable margins, i.e., increasing the AP by 8.78% and 5.49% on the medium and hard difficulty levels. Meanwhile, our method achieves greater detection performance compared with many mainstream multi-modal methods, i.e., outperforming MVX-Net by 1% AP on the medium and hard difficulty levels.
Keywords: 3D object detection; adaptive fusion; multi-modal data fusion; attention mechanism; multi-neighborhood features
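The learnable-weight fusion described in the abstract can be illustrated with the minimal sketch below; using a softmax over per-modality scalars is one common realization and an assumption here, as are the feature shapes.

```python
import torch
import torch.nn as nn

class AdaptiveModalFusion(nn.Module):
    """Adaptive multi-modal fusion with learnable weights: a softmax over
    per-modality scalars expresses each modality's contribution."""
    def __init__(self, n_modalities=2):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_modalities))   # learned during training

    def forward(self, feats):                   # feats: list of same-shape tensors
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * f for wi, f in zip(w, feats))

voxel = torch.randn(2, 128, 400)                # toy voxel features
image = torch.randn(2, 128, 400)                # toy image features projected to voxels
print(AdaptiveModalFusion()([voxel, image]).shape)   # torch.Size([2, 128, 400])
```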
SlowFast Information Fusion Action Recognition Network with Deeply Nested Attention
19
Authors: 张起尧, 桑海峰. 《电子测量与仪器学报》 CSCD, PKU Core, 2024, No. 3, pp. 159-166 (8 pages)
Video action recognition is widely used in video surveillance, autonomous driving, and many other fields, and the SlowFast network is frequently used in this area. Existing SlowFast-related networks use attention to enhance relevant information by nesting attention modules between the network's convolution blocks; if attention is instead nested deeply into the specific convolution layers inside each block, the information extraction capability of SlowFast can be pushed further. This paper first proposes a deeply nested attention mechanism containing SCTM, an attention module that extracts spatiotemporal and channel information, further strengthening the three kinds of information extraction in the SlowFast network. In addition, the information fused by current multi-stream networks is not sufficiently interacted and processed, so a multi-stream spatiotemporal information fusion network based on cross-attention and ConvLSTM is proposed to let the information of each stream interact fully. The improved SlowFast network achieves a Top-1 accuracy of 98.5% on the UCF101 dataset and 80.1% on the HMDB51 dataset, outperforming existing models and improving on the original SlowFast network by 2.64%. These results indicate that the SlowFast spatiotemporal information fusion network with deeply nested attention performs well in information extraction and fusion.
Keywords: video action recognition; SlowFast; deeply nested attention; information fusion network; spatiotemporal-channel attention
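As a rough illustration of cross-attention between two streams, the hypothetical sketch below lets Slow-pathway tokens attend to Fast-pathway tokens; the token dimensions and head count are assumptions, and the paper's ConvLSTM fusion and deep nesting inside convolution layers are not reproduced.

```python
import torch
import torch.nn as nn

class CrossStreamAttention(nn.Module):
    """Cross-attention fusion between two streams: Slow-pathway tokens query the
    Fast pathway so each stream is enriched with the other's temporal context."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, slow_tokens, fast_tokens):
        fused, _ = self.attn(query=slow_tokens, key=fast_tokens, value=fast_tokens)
        return fused + slow_tokens              # residual keeps the original stream

slow = torch.randn(2, 8, 256)                   # 8 slow-pathway tokens
fast = torch.randn(2, 32, 256)                  # 32 fast-pathway tokens
print(CrossStreamAttention()(slow, fast).shape) # torch.Size([2, 8, 256])
```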
Facial Expression Recognition Based on the Fusion of Infrared and Visible Image
20
Authors: Jiancheng Zou, Jiaxin Li, +2 more authors, Juncun Wei, Zhengzheng Li, Xin Yang. 《Journal on Artificial Intelligence》 2021, No. 3, pp. 123-134 (12 pages)
Facial expression recognition is a research hot spot in the fields of computer vision and pattern recognition. However, existing facial expression recognition models are mainly concentrated on the visible light environment. They have insufficient generalization ability and low recognition accuracy, and are vulnerable to environmental changes such as illumination and distance. In order to solve these problems, we combine the advantages of infrared and visible images captured simultaneously by an array device we developed with two infrared and two visible lenses, so that the fused image not only has the texture information of the visible image, but also has the contrast information of the infrared image. On the other hand, we improved the WGAN by adding SSIM and LBP loss functions to ensure the structural similarity between the fused image and the infrared image, and the texture similarity between the fused image and the visible image, respectively. Finally, a facial expression recognition model, Pyconv-SE18, with pyramid convolution and an attention mechanism module is designed to extract the important feature information of facial expressions at multiple scales. We add a cosine distance loss function to reduce the feature difference within a class. Experimental results show that the robustness of the expression recognition algorithm to illumination is improved based on the fused images. The accuracies of this model on the FER2013 and CK+ public datasets are 69.3% and 94.6%, respectively.
Keywords: image fusion; expression recognition; pyramid convolution; attention mechanism
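The cosine distance term mentioned in the abstract (pulling features toward their class center to reduce intra-class difference) could look roughly like the sketch below; maintaining `centers` as learnable parameters and the exact formulation are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def cosine_distance_loss(features, labels, centers):
    """Intra-class cosine-distance penalty: pull each sample's feature toward
    its class center in angular terms. `centers` is a (num_classes, dim) tensor
    maintained elsewhere (e.g. as learnable parameters)."""
    target = centers[labels]                              # class center per sample
    return (1.0 - F.cosine_similarity(features, target, dim=-1)).mean()

feats = torch.randn(8, 256)                               # expression embeddings
labels = torch.randint(0, 7, (8,))                        # 7 basic expressions
centers = torch.randn(7, 256, requires_grad=True)
print(cosine_distance_loss(feats, labels, centers))
```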