Journal Articles
379 articles found
1. Improved multi-scale inverse bottleneck residual network based on triplet parallel attention for apple leaf disease identification
Authors: Lei Tang, Jizheng Yi, Xiaoyao Li. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2024, No. 3, pp. 901-922 (22 pages).
Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly in image texture and structural information, and the difficulty of extracting disease features against complex backgrounds slows related research progress. To address these problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, built upon ResNet-50 while improving and combining the Inception module and ResNeXt inverse bottleneck blocks, to recognize seven types of apple leaf images (six diseases: alternaria leaf spot, brown spot, grey spot, mosaic, rust, and scab, plus healthy leaves). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions; the convolution kernels of different sizes in each branch of the multi-scale convolution extract feature maps of different sizes, and the branch outputs are fused by summation to enrich the output features of the images. Second, a global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce network feature loss: the inverse bottleneck structure loses less image information when transforming between feature spaces of different dimensionality. The fusion of multi-scale convolutions and layer-wise dynamic coordinated inverse bottlenecks lets the model balance computational efficiency and feature representation capability effectively, and makes it more robust by combining horizontal and vertical features in the fine-grained identification of apple leaf diseases.
Finally, after each improved module, a triplet parallel attention module is integrated, with cross-dimensional interactions among channels realized through rotations and residual transformations; this improves the parallel search efficiency for important features and the recognition rate of the network at relatively small computational cost while strengthening dimensional dependencies. To verify the validity of the model, we uniformly enhanced apple leaf disease images screened from the public Plant Village and Baidu Flying Paddle datasets and from the Internet, yielding 14,000 processed images. An ablation study, a pre-processing comparison, and a method comparison were conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model and 0.29% higher than on the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared with some state-of-the-art methods.
Keywords: multi-scale module; inverse bottleneck structure; triplet parallel attention; apple leaf disease
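The branch-and-sum fusion described above (parallel convolutions with different kernel sizes whose outputs are fused by summation) can be sketched in miniature. A minimal pure-Python illustration, using 1-D signals in place of feature maps and hypothetical helper names:

```python
def conv1d(x, kernel):
    """'Same'-padded 1-D convolution (correlation) over a list of floats."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(x))]

def multi_scale_fuse(x, kernels):
    """Run each kernel as a parallel branch, then fuse the branch outputs
    by element-wise summation, as in the multi-scale residual convolution."""
    branches = [conv1d(x, kern) for kern in kernels]
    return [sum(vals) for vals in zip(*branches)]

signal = [1.0, 2.0, 3.0, 2.0, 1.0]
# Two branches: a 1-tap identity kernel and a 3-tap averaging kernel.
fused = multi_scale_fuse(signal, [[1.0], [1 / 3, 1 / 3, 1 / 3]])
```

Summation keeps the fused output the same size as each branch, which is what lets such a module drop into a residual block unchanged.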
2. Multi-Scale Mixed Attention Tea Shoot Instance Segmentation Model
Authors: Dongmei Chen, Peipei Cao, Lijie Yan, Huidong Chen, Jia Lin, Xin Li, Lin Yuan, Kaihua Wu. Phyton-International Journal of Experimental Botany (SCIE), 2024, No. 2, pp. 261-275 (15 pages).
Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise leaf quality; high-quality teas are often handpicked, so intelligent picking machines need more delicate operations. Compared with traditional image processing techniques, deep learning models have stronger feature extraction capabilities and better generalization, and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet) using a dataset from a tea garden in Hangzhou. Analysis of the dataset shows that small- to medium-sized targets account for 89.9% of instances. Our algorithm is compared with several mainstream object segmentation algorithms, and the results demonstrate that our model achieves an accuracy of 82% in recognizing tea shoots, outperforming the other models. Ablation experiments show that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrate that the proposed multi-scale and point-selection strategy optimizes feature extraction for overlapping small targets. The results indicate that Mask2FusionNet can segment shoots in unstructured environments, distinguishing individual tea shoots and extracting complete shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
Keywords: tea shoots; attention mechanism; multi-scale feature extraction; instance segmentation; deep learning
3. MSSTNet: Multi-scale facial video pulse extraction network based on separable spatiotemporal convolution and dimension separable attention
Authors: Changchen Zhao, Hongsheng Wang, Yuanjing Feng. Virtual Reality & Intelligent Hardware, 2023, No. 2, pp. 124-141 (18 pages).
Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a non-contact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Moreover, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction, resulting in insufficient spatiotemporal feature modeling. Methods: We propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to overcome the single-scale ROI limitation, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increases the information interaction between the long-span time and space dimensions and places more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: Fusing multi-scale signals yields better results than methods based only on single-scale signals, and the proposed SSTC and dimension separable attention mechanism contribute to more accurate pulse signal extraction.
Keywords: remote photoplethysmography; heart rate; separable spatiotemporal convolution; dimension separable attention; multi-scale; neural network
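The results above are reported as signal-to-noise ratios in decibels. As a refresher on the metric (the standard definition, not the paper's code), the SNR in dB between a reference pulse signal and an estimate can be computed as:

```python
import math

def snr_db(reference, estimate):
    """10 * log10(signal power / noise power), with the noise taken as the
    residual between the reference signal and the estimate."""
    signal_power = sum(s * s for s in reference)
    noise_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

reference = [1.0, -1.0] * 16             # toy pulse waveform
estimate = [s + 0.1 for s in reference]  # constant offset as "noise"
print(round(snr_db(reference, estimate), 2))  # → 20.0
```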
4. Image Inpainting Technique Incorporating Edge Prior and Attention Mechanism
Authors: Jinxian Bai, Yao Fan, Zhiwei Zhao, Lizhi Zheng. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 999-1025 (27 pages).
Recently, deep learning-based image inpainting methods have made great strides in reconstructing damaged regions. However, these methods often struggle to produce satisfactory results for images with large holes, leading to structural distortions and blurred textures. To address these problems, we combine the advantages of transformers and convolutions to propose an image inpainting method that incorporates edge priors and attention mechanisms, aiming to improve the inpainting of large holes by enhancing the accuracy of structure restoration and the recovery of texture details. The method divides the inpainting task into two phases: edge prediction and image inpainting. In the edge prediction phase, a transformer architecture is designed that combines axial attention with standard self-attention. This design enhances the extraction of global structural features and location awareness while keeping the complexity of the self-attention operations in check, resulting in accurate prediction of the edge structure in the defective region. In the image inpainting phase, a multi-scale fusion attention module is introduced. This module makes full use of multi-level distant features and enhances local pixel continuity, thereby significantly improving inpainting quality. To evaluate the performance of our method, comparative experiments are conducted on several datasets, including CelebA, Places2, and Facade. Quantitative experiments show that our method outperforms the other mainstream methods: it improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) by 1.141 to 3.234 dB and 0.083 to 0.235, respectively, and reduces Learned Perceptual Image Patch Similarity (LPIPS) and Mean Absolute Error (MAE) by 0.0347 to 0.1753 and 0.0104 to 0.0402, respectively. Qualitative experiments reveal that our method excels at reconstructing images with complete structural information and clear texture details. Furthermore, our model exhibits favorable parameter counts, memory cost, and testing time.
Keywords: image inpainting; transformer; edge prior; axial attention; multi-scale fusion attention
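The gains above are quoted in PSNR, which has a simple closed form. A minimal sketch of the standard definition over toy pixel lists (not the paper's evaluation code):

```python
import math

def psnr_db(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images, flattened here
    to plain lists of pixel intensities."""
    mse = sum((r - x) ** 2
              for r, x in zip(reference, reconstruction)) / len(reference)
    return 10.0 * math.log10(peak * peak / mse)

reference = [0, 128, 255, 64]
reconstruction = [0, 120, 250, 60]
quality = psnr_db(reference, reconstruction)   # roughly 34 dB
```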
5. Attention Guided Multi Scale Feature Fusion Network for Automatic Prostate Segmentation
Authors: Yuchun Li, Mengxing Huang, Yu Zhang, Zhiming Bai. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1649-1668 (20 pages).
The precise and automatic segmentation of prostate magnetic resonance imaging (MRI) images is vital for assisting doctors in diagnosing prostate diseases. In recent years, many advanced methods have been applied to prostate segmentation, but the variability caused by prostate diseases makes automatic segmentation challenging. In this paper, we propose an attention-guided multi-scale feature fusion network (AGMSF-Net) to segment prostate MRI images. We propose an attention mechanism for extracting multi-scale features and introduce a 3D transformer module, added during the transition from encoder to decoder, to enhance global feature representation. In the decoder stage, a feature fusion module is proposed to obtain global context information. We evaluated our model on prostate MRI images acquired from a local hospital. The relative volume difference (RVD) and Dice similarity coefficient (DSC) between the automatic segmentation results and the ground truth were 1.21% and 93.68%, respectively. The performance evaluation and validation experiments demonstrate the effectiveness of our method in automatic prostate segmentation, which is of significant clinical value for quantitatively evaluating prostate volume on MRI.
Keywords: prostate segmentation; multi-scale attention; 3D transformer; feature fusion; MRI
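The two metrics quoted above, RVD and DSC, have simple definitions over binary segmentation masks. An illustrative sketch with hypothetical masks (standard formulas, not the paper's code; RVD is reported here as an absolute percentage):

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    return 2.0 * intersection / (sum(pred) + sum(truth))

def relative_volume_difference(pred, truth):
    """Absolute volume difference as a percentage of the ground-truth volume."""
    return 100.0 * abs(sum(pred) - sum(truth)) / sum(truth)

pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
print(dice_coefficient(pred, truth))            # → 0.75
print(relative_volume_difference(pred, truth))  # → 0.0
```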
6. Multi-Scale Attention-Based Deep Neural Network for Brain Disease Diagnosis (cited 1)
Authors: Yin Liang, Gaoxu Xu, Sadaqat ur Rehman. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 4645-4661 (17 pages).
Whole-brain functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used in the diagnosis of brain disorders such as autism spectrum disorder (ASD). Recently, an increasing number of studies have employed deep learning techniques to analyze FC patterns for brain disease classification. However, the high dimensionality of FC features and the interpretation of deep learning results remain open issues in FC-based brain disease classification. In this paper, we propose a multi-scale attention-based deep neural network (MSA-DNN) model to classify FC patterns for ASD diagnosis. The model adds a flexible multi-scale attention (MSA) module to an auto-encoder-based backbone DNN, which extracts multi-scale features of the FC patterns and changes the level of attention for different FCs through continuous learning. The model reinforces the weights of important FC features while suppressing unimportant ones, ensuring sparse model weights and enhancing interpretability. We performed systematic experiments on a large multi-site ASD dataset with both ten-fold and leave-one-site-out cross-validation. Results showed that our model outperformed classical methods in brain disease classification and exhibited robust inter-site prediction performance. We also localized important FC features and brain regions associated with ASD classification. Overall, our study further promotes biomarker detection and computer-aided classification for ASD diagnosis, and the proposed MSA module is flexible and easy to integrate into other classification networks.
Keywords: autism spectrum disorder diagnosis; resting-state fMRI; deep neural network; functional connectivity; multi-scale attention module
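The abstract describes reinforcing important FC features while suppressing unimportant ones. A toy sketch of sigmoid-gated feature weighting in that spirit (the scores below are invented; the actual MSA module learns them and is considerably more elaborate):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_gate(features, scores):
    """Scale each feature by its attention score squashed to (0, 1):
    high scores pass features through, low scores suppress them."""
    return [f * sigmoid(s) for f, s in zip(features, scores)]

features = [0.8, -0.3, 0.5]   # hypothetical FC feature values
scores = [4.0, -4.0, 0.0]     # hypothetical learned attention scores
gated = attention_gate(features, scores)
```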
7. Research on precipitation occurrence prediction based on CNN-Attention-BP (cited 7)
Authors: Wu Xianghua, Hua Yajie, Guan Yuanhong, Wang Weiwei, Liu Duanyang. 《南京信息工程大学学报(自然科学版)》 (Journal of Nanjing University of Information Science & Technology, Natural Science Edition; CAS, Peking University Core), 2022, No. 2, pp. 148-155 (8 pages).
Building on a comprehensive analysis of the characteristics of statistical precipitation prediction models, a combined CNN-Attention-BP model based on the attention mechanism, a convolutional neural network (CNN), and a BP neural network is proposed, and an empirical analysis is conducted on summer precipitation from 1961 to 2020 at the Changchun, Baicheng, and Yanji stations, which represent different climate types. First, the CNN performs feature learning on the June-to-August precipitation from 20:00 to 20:00 the next day, mean pressure, mean wind speed, mean temperature, and mean relative humidity, and the attention mechanism determines the weights of these meteorological factors for precipitation prediction. A BP neural network then predicts precipitation occurrence, with accuracy, the cross-entropy loss function, and the F1-score used to comprehensively evaluate the combined model's performance. Finally, the combined model is compared with single support vector machine, multilayer perceptron, and convolutional neural network models. The results show that the CNN-Attention-BP model can learn autonomously and focus on the more important information, effectively improving the prediction of summer precipitation occurrence in Jilin Province. The more balanced the samples and the closer a station's precipitation frequency is to 0.5, the higher the prediction precision, with accuracy reaching up to 88.4%; compared with the single models, the combined model's accuracy is up to nearly 17 percentage points higher.
Keywords: precipitation prediction; convolutional neural network; attention mechanism; BP neural network; cross-entropy loss function
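The evaluation above combines accuracy, the cross-entropy loss, and the F1-score. As a refresher on the latter two for binary rain/no-rain labels (standard definitions with hypothetical data, not the paper's code):

```python
import math

def cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]
preds = [1 if p >= 0.5 else 0 for p in probs]
loss = cross_entropy(labels, probs)
print(round(f1_score(labels, preds), 3))  # → 0.8
```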
8. Disease Recognition of Apple Leaf Using Lightweight Multi-Scale Network with ECANet (cited 3)
Authors: Helong Yu, Xianhe Cheng, Ziqing Li, Qi Cai, Chunguang Bi. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, No. 9, pp. 711-738 (28 pages).
To solve the difficulty of identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network (ResNet18), a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss, and the efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall, and F1-score of LW-ResNet on the test set are 97.80%, 97.92%, and 97.85%, respectively. The parameter memory is 2.32 MB, 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has clear advantages in recognition performance, speed, parameter memory requirement, and time complexity. The proposed model offers low computational and storage cost, strong real-time performance, high identification accuracy, and strong practicability, meeting the needs of real-time apple leaf disease identification on resource-constrained devices.
Keywords: apple disease recognition; deep residual network; multi-scale feature; efficient channel attention module; lightweight network
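ECANet-style channel attention, used above to suppress background noise, follows a cheap recipe: global average pooling per channel, a small 1-D convolution across neighbouring channels, then sigmoid gating. A toy sketch in which fixed averaging weights stand in for the learned 1-D convolution (illustrative only, not the paper's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def eca_attention(channel_maps, k=3):
    """Gate each channel by a weight derived from its pooled value and its
    k-1 neighbours (edge channels are padded by repetition)."""
    pooled = [sum(ch) / len(ch) for ch in channel_maps]   # global avg pool
    pad = k // 2
    padded = [pooled[0]] * pad + pooled + [pooled[-1]] * pad
    mixed = [sum(padded[i + j] for j in range(k)) / k     # 1-D "conv"
             for i in range(len(pooled))]
    weights = [sigmoid(m) for m in mixed]
    return [[v * w for v in ch] for ch, w in zip(channel_maps, weights)]

feature = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]   # 3 channels, 2 values each
gated = eca_attention(feature)
```

The cross-channel 1-D convolution is what makes ECA "efficient": it avoids the fully connected bottleneck of SE-style attention while still letting neighbouring channels interact.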
9. Micro-expression recognition method based on dual-attention CrossViT
Authors: Ran Ruisheng, Shi Kai, Jiang Xiaopeng, Wang Ning. 《南京信息工程大学学报(自然科学版)》 (Journal of Nanjing University of Information Science & Technology, Natural Science Edition; CAS, Peking University Core), 2023, No. 5, pp. 541-550 (10 pages).
Micro-expressions are facial expressions that people involuntarily leak when trying to hide their true emotions, and they have become a hot research topic in affective computing in recent years. Because micro-expressions are subtle facial movements, their fine-grained changes are difficult to capture. Building on the excellent image classification performance of the cross-attention multi-scale ViT (CrossViT) and its ability to capture subtle feature information, this paper uses CrossViT as the backbone network, improves its cross-attention mechanism, and proposes a Dual Attention (DA) module that extends conventional cross-attention by determining the correlation between attention results, thereby improving micro-expression recognition accuracy. The network learns from three optical flow features (optical strain and the horizontal and vertical optical flow fields) computed from the onset and apex frames of each micro-expression sequence, and performs micro-expression classification through Softmax. On the composite micro-expression dataset, UF1 and UAR reach 0.7275 and 0.7272 respectively, outperforming mainstream algorithms in the field and verifying the effectiveness of the proposed network.
Keywords: micro-expression recognition; CrossViT; cross-attention mechanism; optical flow features
10. Coal foreign object detection method based on cross-modal attention fusion (cited 1)
Authors: Cao Xiangang, Li Hu, Wang Peng, Wu Xudong, Xiang Jingfang, Ding Wentao. 《工矿自动化》 (Journal of Mine Automation; CSCD, Peking University Core), 2024, No. 1, pp. 57-65 (9 pages).
To address insufficient feature extraction when detecting foreign objects mixed in the coal flow during intelligent raw-coal washing and sorting, where the objects have low contrast and occlude one another, a coal foreign object detection method based on cross-modal attention fusion is proposed. Depth images are introduced to build a dual feature pyramid network (DFPN) over RGB and Depth images: a shallow feature extraction strategy extracts low-level Depth features, and basic cues such as depth edges and depth texture assist the deep RGB features, effectively obtaining the complementary information of the two feature types, enriching the spatial and edge information of object features, and improving detection precision. A cross-modal attention fusion module (CAFM) based on coordinate attention and improved spatial attention is constructed to collaboratively optimize and fuse the RGB and Depth features, increasing the network's attention to the visible parts of occluded objects and improving their detection precision. A region-based convolutional neural network (R-CNN) outputs the classification, regression, and segmentation results. Experimental results show that in detection precision the method's AP is 3.9% higher than Mask Transfiner, the stronger of the two-stage models, and its single-frame detection time of 110.5 ms meets real-time requirements. By using spatial features to assist color, shape, and texture features, the method accurately distinguishes coal foreign objects from one another and from the conveyor belt, effectively improving detection precision for objects with complex features, reducing false and missed detections, and achieving precise detection and pixel-level segmentation of coal foreign objects.
Keywords: coal foreign object detection; instance segmentation; dual feature pyramid network; cross-modal attention fusion; Depth image; coordinate attention; improved spatial attention
11. AF-CenterNet: object detection based on cross-attention fusion of millimeter-wave radar and camera
Authors: Che Li, Lyu Lianhui, Jiang Liubing. 《计算机应用研究》 (Application Research of Computers; CSCD, Peking University Core), 2024, No. 4, pp. 1258-1263 (6 pages).
For autonomous driving, accurately detecting other vehicles under all weather and lighting conditions is essential. To overcome the limitations of information from a single sensor, a cross-attention-based fusion method (AF) is proposed to fuse millimeter-wave radar and camera information at the feature level. First, the millimeter-wave radar and the camera are spatially aligned, and the aligned point cloud is projected into a point-cloud image. The point-cloud image is then expanded along the height and width directions to improve its match with the camera image. Finally, the point-cloud image and the camera image are fed into a CenterNet detection network containing the AF structure for training, which generates a spatial attention weight to enhance the key features in the camera branch. Experimental results show that the AF structure improves the original network's detection of targets of various sizes, with an especially clear gain on small targets, and has little impact on real-time performance, making it a good choice for improving vehicle detection precision across diverse scenarios.
Keywords: autonomous driving; object detection; millimeter-wave radar; cross-attention fusion
12. Multi-modal medical image fusion based on a structure-function cross neural network
Authors: Di Jing, Guo Wenqing, Ren Li, Yang Yan, Lian Jing. 《光学精密工程》 (Optics and Precision Engineering; EI, CAS, CSCD, Peking University Core), 2024, No. 2, pp. 252-267 (16 pages).
To address blurred texture details and low contrast in multi-modal medical image fusion, a fusion method based on a structure-function cross neural network is proposed. First, a structure-function cross network model is designed around the structural and functional information of medical images; it not only effectively extracts the structural information of anatomical images and the functional information of functional images, but also enables interaction between the two, so that the texture details of the medical images are well extracted. Second, a new attention mechanism is constructed from cross-network channel and spatial feature changes, fusing the images by continually adjusting the weights of structural and functional information and improving the contrast and contour information of the fused image. Finally, a decomposition process from the fused image back to the source images is designed; since the quality of the decomposed images depends directly on the fusion result, this decomposition drives the fused image to contain more detail. Compared with seven state-of-the-art methods proposed in recent years, the objective metrics AG, EN, SF, MI, QAB/F, and CC improve on average by 22.87%, 19.64%, 23.02%, 12.70%, 6.79%, and 30.35% respectively, showing that the method produces fusion results with clearer texture details and better contrast, and outperforms the comparison algorithms both in subjective vision and in objective metrics.
Keywords: multi-modal medical image fusion; structure-function information cross network; attention mechanism; decomposition network
13. Emotion recognition model based on hybrid feature extraction and cross-modal feature prediction fusion
Authors: Li Mu, Yang Yuheng, Ke Xizheng. 《计算机应用》 (Journal of Computer Applications; CSCD, Peking University Core), 2024, No. 1, pp. 86-93 (8 pages).
To effectively mine unimodal representation information in multimodal sentiment analysis and achieve full multimodal fusion, an emotion recognition model based on hybrid features and cross-modal prediction fusion (H-MGFCT) is proposed. First, a hybrid feature parameter extraction algorithm (H-MGFCC) that fuses Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and their first-order dynamic features is used to address the loss of speech emotion features. Second, an attention-weight-based cross-modal prediction model selects the text features most correlated with the speech features. A cross-modal attention model with contrastive learning then fuses the highly correlated text features with the speech emotion features. Finally, the cross-modal text-speech features are fused with the previously filtered low-correlation text features as complementary information. Experimental results show that on the public IEMOCAP (Interactive EMotional dyadic MOtion CAPture), CMU-MOSI (CMU-Multimodal Opinion Emotion Intensity), and CMU-MOSEI (CMU-Multimodal Opinion Sentiment Emotion Intensity) datasets, accuracy improves by 2.83, 2.64, and 3.05 percentage points respectively over the weighted decision-level fusion speech-text emotion recognition (DLFT) model, verifying the model's effectiveness for emotion recognition.
Keywords: feature extraction; multimodal fusion; emotion recognition; cross-modal fusion; attention mechanism
14. Multimodal sentiment analysis model based on CLIP and cross-attention
Authors: Chen Yan, Lai Yubin, Xiao Ao, Liao Yuxiang, Chen Ningjiang. 《郑州大学学报(工学版)》 (Journal of Zhengzhou University, Engineering Science; CAS, Peking University Core), 2024, No. 2, pp. 42-50 (9 pages).
To address the small amount of labeled data, insufficient inter-modal fusion, and information redundancy in multimodal sentiment analysis, a multimodal sentiment analysis (MSA) model based on contrastive language-image pretraining (CLIP) and cross-attention (CA), CLIP-CA-MSA, is proposed. First, the model uses a CLIP-pretrained BERT model and a PIFT model to extract video feature vectors and text features. Second, a cross-attention mechanism lets the image and text feature vectors interact, strengthening information transfer between modalities. Finally, after uncertainty-loss-based feature fusion, the final sentiment classification result is computed. Experimental results show that the model improves accuracy by 5 to 14 percentage points and F1 by 3 to 12 percentage points over other multimodal models, verifying its superiority, and ablation experiments confirm the effectiveness of each module. The model effectively exploits the complementarity and correlation of multimodal data while using the uncertainty loss to improve robustness and generalization.
Keywords: sentiment analysis; multimodal learning; cross-attention; CLIP model; transformer; feature fusion
15. Multimodal sentiment analysis method based on a cross-modal cross-attention network
Authors: Wang Xuyang, Wang Changrui, Zhang Jinfeng, Xing Mengyi. 《广西师范大学学报(自然科学版)》 (Journal of Guangxi Normal University, Natural Science Edition; CAS, Peking University Core), 2024, No. 2, pp. 84-93 (10 pages).
Mining intra-modal and inter-modal information helps improve multimodal sentiment analysis, so this paper proposes a multimodal sentiment analysis method based on a cross-modal cross-attention network. First, a VGG-16 network maps the multimodal data into a global feature space, while a Swin Transformer network maps it into a local feature space. Second, intra-modal self-attention and inter-modal cross-attention features are constructed. A cross-modal cross-attention fusion module is then designed to deeply fuse intra-modal and inter-modal features, improving the reliability of the multimodal feature representation. Finally, the prediction is obtained through Softmax. Tested on the two open datasets CMU-MOSI and CMU-MOSEI, the model achieves 45.9% and 54.1% accuracy on the seven-class task, improvements of 0.66% and 2.46% over the current MCGMF model, a significant overall gain.
Keywords: sentiment analysis; multimodal; cross-modal cross-attention; self-attention; local and global features
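The inter-modal cross-attention described above follows the standard scaled dot-product pattern: queries come from one modality, keys and values from the other, so the output re-expresses the query modality in terms of the other modality's features. A minimal sketch with tiny hand-made vectors (not the paper's VGG-16 or Swin Transformer features):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all key/value pairs."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

text_feats = [[1.0, 0.0], [0.0, 1.0]]    # queries, e.g. text tokens
image_feats = [[1.0, 0.0], [0.0, 1.0]]   # keys = values, e.g. image patches
fused = cross_attention(text_feats, image_feats, image_feats)
```

Swapping which modality supplies the queries gives the symmetric direction of the interaction; self-attention is the special case where queries, keys, and values all come from the same modality.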
16. Pavement distress segmentation model based on dual-branch point-flow semantic priors
Authors: Pang Rong, Yang Yan, Leng Xiongjin, Zhang Peng, Liu Yan. 《智能系统学报》 (CAAI Transactions on Intelligent Systems; CSCD, Peking University Core), 2024, No. 1, pp. 153-164 (12 pages).
Deep-learning-based recognition of real pavement distress images mainly faces severe class imbalance, caused by the varying ratio of complex road background to distress foreground and the small scale of distresses, and poor recognizability, caused by the weak contrast between the geometric structure of distresses and the road. This paper proposes a dual-branch semantic prior network that guides a self-attention backbone feature network to mine the complex relationship between background and distress foreground. An efficient self-attention mechanism and a cross-covariance self-attention mechanism extract semantic features over the 2-D spatial and feature-channel dimensions respectively, and a semantic local enhancement module is introduced to improve local feature aggregation. A new sparse subject point flow module is proposed and combined with a conventional feature pyramid network to further alleviate the class imbalance of pavement distresses. A real-scene road distress segmentation dataset is constructed, and comparative experiments against multiple baseline models on this dataset and on public datasets verify the model's effectiveness.
Keywords: semantic prior information; efficient attention mechanism; cross-covariance attention mechanism; sparse subject point flow; class imbalance; semantic segmentation; pavement distress; deep learning
17. Attention-guided multi-scale real-time detection of pedestrians and vehicles in infrared imagery
Authors: Zhang Yinhui, Ji Kai, He Zifen, Chen Guangchen. 《红外与激光工程》 (Infrared and Laser Engineering; EI, CSCD, Peking University Core), 2024, No. 5, pp. 229-239 (11 pages).
Infrared imaging captures target thermal radiation, enabling target monitoring in complex road scenes and filtering out road clutter. To address the large parameter counts, dependence on high-performance GPU resources, and slow detection speed of infrared pedestrian and vehicle detection models, an attention-guided multi-scale real-time detection model is proposed. First, to precisely match target scales with anchor box sizes, the K-Means++ algorithm re-clusters the anchor box presets for infrared pedestrian and vehicle target scales, and a fine-scale 128×128 detection layer is designed. Second, an attention-guided wide-area feature extraction module enhances the model's feature extraction and its focus on spatial and channel information. A cross-space perception module then introduces spatial information perception, strengthening the feature expression of targets across different spatial scales. Finally, for resource-constrained devices, 4× channel pruning reduces the parameter count and improves deployability on mobile devices. Experimental results show that the proposed IRDet algorithm improves average detection precision by 4.3% over the baseline, reaching 87.4%, while compressing the model weights by 60.4% to 5.7 MB.
Keywords: infrared traffic detection; anchor box matching; attention guidance; cross-space perception; model pruning
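The anchor re-clustering step above relies on K-Means++ seeding, which picks each new cluster centre with probability proportional to its squared distance from the nearest centre already chosen. A compact sketch over hypothetical (width, height) box sizes (illustrative, not the paper's code; real anchor clustering typically uses an IoU-based distance rather than the Euclidean one used here):

```python
import random

def kmeans_pp_seeds(points, k, seed=0):
    """Return k K-Means++ seed centres chosen from `points` (2-D tuples)."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres)
              for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centres.append(p)
                break
    return centres

boxes = [(10, 20), (12, 22), (50, 80), (55, 90), (100, 40)]
seeds = kmeans_pp_seeds(boxes, 3)
```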
18. Temporal multimodal sentiment analysis based on a composite cross-modal interaction network
Authors: Yang Li, Zhong Junhong, Zhang Yun, Song Xinyu. 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology; CSCD, Peking University Core), 2024, No. 5, pp. 1318-1327 (10 pages).
To address insufficient modality fusion and weak interaction caused by semantic feature differences between modalities in multimodal sentiment analysis, the latent correlations between modalities are analyzed and a temporal multimodal sentiment analysis model based on a composite cross-modal interaction network (CCIN-SA) is built. The model first uses bidirectional gated recurrent units and a multi-head attention mechanism to extract temporal features of the text, visual, and speech modalities with contextual semantic information. A cross-modal attention interaction layer is then designed in which the low-order signals of auxiliary modalities continually reinforce the target modality, so that the target modality learns information from the auxiliary modalities and the latent adaptivity between modalities is captured. The enhanced features are fed into a composite feature fusion layer, where condition vectors further capture the similarity between modalities, strengthen the association of important features, and mine deeper inter-modal interaction. Finally, a multi-head attention mechanism concatenates and fuses the composite cross-modal reinforced features with the low-order signals, raising the weights of important intra-modal features while preserving the information unique to the initial modalities, and the resulting multimodal fused features are used for the final sentiment classification. Evaluation on the CMU-MOSI and CMU-MOSEI datasets shows that CCIN-SA improves both accuracy and F1 over existing models, effectively mining the correlations between modalities and making more accurate sentiment judgments.
Keywords: cross-modal interaction; attention mechanism; feature fusion; composite fusion layer; multimodal sentiment analysis
19. Cross-modal retrieval method for home decoration cases based on feature fusion
Authors: Kang Jie, Liu Wei. 《智能系统学报》 (CAAI Transactions on Intelligent Systems; CSCD, Peking University Core), 2024, No. 2, pp. 429-437 (9 pages).
Current home-decoration customer service systems rely mainly on manual retrieval of decoration cases, which cannot meet users' demand for fast, timely consulting and incurs high labor costs, so a cross-modal decoration-case retrieval algorithm based on feature fusion is proposed. To address insufficient mining of the semantic information in multimodal data and low retrieval precision, the existing style aggregation module is improved by introducing a channel attention mechanism that assigns suitable weights to the feature vectors of the different images in each decoration case, strengthening important features that carry more useful information and weakening the unimportant ones. In addition, a multimodal feature fusion module suited to the retrieval setting is designed; it adaptively controls a series of fusion operations on the feature vectors of the two modalities to realize knowledge flow and sharing across modalities, generating semantically richer and more expressive feature vectors and further improving retrieval performance. Comparisons with other methods on a self-built multimodal decoration-case dataset show that the proposed method performs better on decoration-case retrieval.
Keywords: home decoration customer service system; decoration case retrieval; cross-modal retrieval; style aggregation; multimodal; feature fusion; channel attention mechanism; semantic information
20. Image-text cross-modal sentiment analysis based on visual attention
Authors: Wang Fayu, Hao Panzheng. 《计算机工程与设计》 (Computer Engineering and Design; Peking University Core), 2024, No. 2, pp. 601-607 (7 pages).
To address the inability of unimodal sentiment analysis to capture sentiment information completely, an image-text cross-modal sentiment analysis model (BERT-VistaNet) is proposed. Instead of using visual information directly as features, the model uses it for alignment: an attention mechanism points out the important sentences in the text, yielding a document representation based on visual attention. For text content that visual attention cannot fully cover, a BERT model performs sentiment analysis to obtain a text-based document representation, and the two representations are fused for the sentiment classification task. On the public Yelp restaurant dataset, the model improves accuracy by 43% over the baseline TFN-aVGG model and by 1.4% over the VistaNet model.
Keywords: sentiment analysis; visual attention mechanism; cross-modal; deep learning; feature fusion; pre-trained model; bidirectional gated unit
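Using visual information as an attention query over sentence vectors, as the model above does, amounts to a softmax-weighted aggregation of sentences into a document vector. A toy sketch with hand-made vectors (the real model uses learned representations; all names here are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend_document(sentence_vecs, visual_query):
    """Score each sentence against the visual query by dot product, normalise
    with softmax, and return the weighted document vector plus the weights."""
    scores = [sum(s * q for s, q in zip(vec, visual_query))
              for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    doc = [sum(w * vec[j] for w, vec in zip(weights, sentence_vecs))
           for j in range(dim)]
    return doc, weights

sentences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical sentence vectors
image_query = [1.0, 0.0]                          # hypothetical visual feature
doc_vec, weights = attend_document(sentences, image_query)
```

The sentence most aligned with the visual query receives the largest weight, which is exactly the "attention points out the important sentences" behaviour described above.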