523 articles found
Unsupervised multi-modal image translation based on the squeeze-and-excitation mechanism and feature attention module
1
Authors: 胡振涛, HU Chonghao, YANG Haoran, SHUAI Weiwei. 《High Technology Letters》 (EI, CAS), 2024, No. 1, pp. 23-30 (8 pages)
Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced approaches typically employ a multi-generator mechanism to model the different domain mappings, which makes network training inefficient and causes mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. First, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
Keywords: multi-modal image translation; generative adversarial network (GAN); squeeze-and-excitation (SE) mechanism; feature attention (FA) module
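The squeeze-and-excitation (SE) mechanism named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the plain-list tensor layout are all assumptions made for illustration, and only the generic SE recipe (global average pooling, a two-layer bottleneck, sigmoid gates, channel rescaling) is shown.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Generic SE channel reweighting (illustrative, hypothetical names).

    feature_map: list of C channels, each an H x W list of floats.
    w1: (C/r) x C reduction weights; w2: C x (C/r) expansion weights.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates


# Two 2x2 channels, reduced to one hidden unit (all weights are made up).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = squeeze_excite(fmap, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
```

In a real network the two weight matrices are learned, so the gates come to emphasize informative channels; here they are fixed only to keep the example self-contained.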
Multi-scale attention encoder for street-to-aerial image geo-localization (Cited by: 2)
2
Authors: Songlian Li, Zhigang Tu, Yujin Chen, Tan Yu. 《CAAI Transactions on Intelligence Technology》 (SCIE, EI), 2023, No. 1, pp. 166-176 (11 pages)
The goal of street-to-aerial cross-view image geo-localization is to determine the location of a query street-view image by retrieving the aerial-view image of the same place. The drastic viewpoint and appearance gap between aerial-view and street-view images makes this task very challenging. In this paper, we propose a novel multiscale attention encoder to capture the multiscale contextual information of aerial-view and street-view images. To bridge the domain gap between the two views, we first use an inverse polar transform to approximately align the street-view images with the aerial-view images. Then, the proposed multiscale attention encoder converts each image into a feature representation, guided by the learnt multiscale information. Finally, we propose a novel global mining strategy that makes the network pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains an 81.39% top-1 recall rate on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state-of-the-art performance and significantly outperforming most existing methods.
Keywords: global mining strategy; image geo-localization; multiscale attention encoder; street-to-aerial cross-view
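The top-1 recall figure quoted in this abstract is a retrieval metric. A minimal sketch of how such a recall-at-k score is commonly computed follows. It assumes that query `i` matches reference `i` and that similarity is a plain dot product; the function name `recall_at_k` and the toy embeddings are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(query_feats, ref_feats, k=1):
    """Fraction of queries whose true match ranks in the top k.

    Assumes (for illustration) that query i matches reference i and that
    similarity is the dot product of the two embedding vectors.
    """
    hits = 0
    for i, q in enumerate(query_feats):
        # Score the query against every reference embedding.
        scores = [sum(a * b for a, b in zip(q, r)) for r in ref_feats]
        # Reference indices sorted from most to least similar.
        ranked = sorted(range(len(ref_feats)),
                        key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_feats)


# Toy street-view (query) and aerial-view (reference) embeddings.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
references = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
top1 = recall_at_k(queries, references, k=1)
top2 = recall_at_k(queries, references, k=2)
```

The third query is closest to the first reference rather than to its own match, so it counts as a miss at k=1 and as a hit at k=2.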
Remaining Useful Life Prediction of Rail Based on Improved Pulse Separable Convolution Enhanced Transformer Encoder
3
Authors: Zhongmei Wang, Min Li, Jing He, Jianhua Liu, Lin Jia. 《Journal of Transportation Technologies》, 2024, No. 2, pp. 137-160 (24 pages)
To prevent possible casualties and economic loss, accurate prediction of the Remaining Useful Life (RUL) is critical in rail prognostics and health management. However, when modeling long time series of rail damage, traditional neural networks struggle to capture long-term dependencies because of the coupling among multi-channel data from multiple sensors. This paper proposes a novel RUL prediction model with an enhanced pulse separable convolution to solve this issue. First, a coding module based on the improved pulse separable convolutional network is established to effectively model the relationships within the data. To enhance the network, an alternate gradient back-propagation method is implemented, and an efficient channel attention (ECA) mechanism is developed to better emphasize the useful pulse characteristics. Second, an optimized Transformer encoder is designed to serve as the backbone of the model. It can efficiently capture the relationships within and between the data at each time step of long, full-life-cycle time series. More importantly, the Transformer encoder is improved by integrating pulse maximum pooling to retain more pulse timing characteristics. Finally, based on the features of the preceding layers, the final predicted RUL value is produced, giving an end-to-end solution. The empirical findings validate the efficacy of the proposed approach in forecasting rail RUL, surpassing various existing data-driven prognostic techniques. The proposed method also shows good generalization performance on the PHM2012 bearing data set.
Keywords: Equipment Health Prognostics; Remaining Useful Life Prediction; Pulse Separable Convolution; Attention Mechanism; Transformer Encoder
Distantly supervised relation extraction based on the ENCODER_ATT mechanism
4
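Authors: 王健, 郑七凡, 李超, 石晶. 《广西师范大学学报(自然科学版)》 (CAS, PKU Core), 2019, No. 4, pp. 53-60 (8 pages)
In information extraction, relation extraction is a key technique for accurately identifying the relations between entities in natural language. To address the tendency of relation extraction models to lose key semantic features, and the noisy data that the basic assumption of distant supervision tends to introduce, this paper proposes ENCODER_ATT, a relation extraction model based on distant supervision. The ENCODER model, built on a recurrent neural network, extracts and memorizes features at the word level and integrates semantic feature information at the sentence level, which preserves key semantic features while removing redundant ones. An attention mechanism is then introduced at the sentence level to reduce the influence of noisy data on the experimental results. Experiments on a real-world dataset, with precision-recall curves plotted, show that the ENCODER_ATT model clearly improves on relation extraction methods of the same type.
Keywords: relation extraction; distant supervision; encoder; attention mechanism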
Geological named entity recognition combining BERT with a BiGRU-Attention-CRF model (Cited by: 14)
5
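Authors: 谢雪景, 谢忠, 马凯, 陈建国, 邱芹军, 李虎, 潘声勇, 陶留锋. 《地质通报》 (CAS, CSCD, PKU Core), 2023, No. 5, pp. 846-855 (10 pages)
Extracting geological named entities from geological texts is of great significance for the deep mining and application of geological big data. The concept of a geological named entity is defined, annotation guidelines are established, and an object-based representation model for geological entities is designed. Geological texts contain many long entities and complex nested entities, which makes geological named entity recognition more challenging. To address these problems, (1) the BERT model is introduced to generate high-quality, context-aware word vector representations; (2) a bidirectional gated recurrent unit, attention mechanism, and conditional random field (BiGRU-Attention-CRF) model performs sequence labeling and decoding on the semantic encoding output by the previous layer. In a comparison with mainstream deep learning models, the model achieves an F1 score of 84.02%, outperforming the other models, and it recognizes entities well on a small-scale geological corpus.
Keywords: named entity recognition; geological named entity; BERT; attention mechanism; BiGRU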
Online recognition method for abnormal driving behavior based on an Encoder-Decoder attention network (Cited by: 2)
6
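Authors: 唐坤, 戴语琴, 徐永能, 郭唐仪, 邵飞. 《兵器装备工程学报》 (CAS, CSCD, PKU Core), 2023, No. 8, pp. 63-71 (9 pages)
Abnormal driving behavior is a major threat to the safe operation of vehicles and seriously endangers the safe and efficient delivery of personnel and materiel. Based on low-cost, non-contact multi-sensor data from mobile phones, and through data analysis of driving behavior characteristics, an online recognition method for abnormal driving behavior is proposed that combines an Encoder-Decoder deep network with the Attention mechanism. The method consists of three modules: an LSTM (long short-term memory)-based Encoder-Decoder, the Attention mechanism, and an SVM (support vector machine)-based classifier. Recognition proceeds in six steps: input encoding, attention learning, feature decoding, sequence reconstruction, residual calculation, and driving behavior classification. Experiments use mobile phone sensor data collected under naturalistic driving conditions. The results show that: (1) the mobile phone multi-sensor data fusion method is effective for driving behavior recognition; (2) abnormal driving behavior inevitably causes abnormal fluctuations in the data; (3) the Attention mechanism helps improve model learning, and the proposed model reaches a recognition F1-score of 0.717, a significant improvement over classic models of the same type; (4) for abnormal vehicle driving behavior, SVM gives better recognition results than the Logistic and random forest algorithms.
Keywords: abnormal driving; deep learning; encoder-decoder; long short-term memory network; attention mechanism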
Text sentiment analysis model based on BERT and Loc-Attention (Cited by: 1)
7
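Authors: 何传鹏, 黄勃, 周科亮, 尹玲, 王明胜, 李佩佩. 《传感器与微系统》 (CSCD, PKU Core), 2023, No. 12, pp. 146-150 (5 pages)
Traditional sentiment analysis methods do not consider the position (Loc) of the text relative to topic words, so their classification results are unsatisfactory. A text sentiment analysis method, BL-LABL, is proposed, based on BERT and LDA with a Loc-Attention bidirectional long short-term memory (Bi-LSTM) model. The LDA topic model is used to obtain the topics of each review and their word distributions. The selected topic words are concatenated with the original text and fed into the BERT model for word vector training, yielding text word vectors that contain topic information and topic word vectors that contain text information. A Bi-LSTM network then incorporates the position weights of the text and combines them with the attention weights, so that the final text feature representation is the weighted sum of the two. Finally, a SoftMax classifier produces the sentiment category of the text. Experiments on two datasets show that the model effectively improves classification performance compared with traditional attention-based sentiment classification models.
Keywords: sentiment analysis; topic model; BERT model; text features; position weight; attention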
Attention-based spatio-temporal graph convolutional network considering external factors for multi-step traffic flow prediction (Cited by: 2)
8
Authors: Jihua Ye, Shengjun Xue, Aiwen Jiang. 《Digital Communications and Networks》 (SCIE, CSCD), 2022, No. 3, pp. 343-350 (8 pages)
Traffic flow prediction is an important part of the intelligent transportation system. Accurate multi-step traffic flow prediction plays an important role in improving the operational efficiency of the traffic network. Because traffic flow data has complex spatio-temporal correlation and non-linearity, existing prediction methods mainly combine a Graph Convolutional Network (GCN) with a recurrent neural network. This combination performs very well in traffic prediction tasks. However, multi-step prediction error accumulates as the prediction horizon grows. Some scholars use multiple sampling sequences to achieve more accurate predictions, but this demands powerful hardware and multiplies the training time. Considering the spatio-temporal correlation of traffic flow and the influence of external factors, we propose an Attention Based Spatio-Temporal Graph Convolutional Network considering External Factors (ABSTGCN-EF) for multi-step traffic flow prediction. The model treats traffic flow as diffusion on a digraph and extracts its spatial characteristics through GCN. We add meaningful time-slot attention to the encoder-decoder to form an Attention Encoder Network (AEN) that handles temporal correlation. The attention vector is used as a competitive choice to capture the correlation between predicted states and historical states. We consider the impact of three external factors (daytime, weekdays, and traffic accident markers) on the traffic flow prediction task. Experiments on two public data sets show that it makes sense to consider external factors. The prediction performance of our ABSTGCN-EF model is 7.2%–8.7% higher than that of the state-of-the-art baselines.
Keywords: Multi-step traffic flow prediction; Graph convolutional network; External factors; Attentional encoder network; Spatiotemporal correlation
Encoder-Decoder photovoltaic power generation prediction model based on the attention mechanism (Cited by: 10)
9
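Authors: 宋良才, 索贵龙, 胡军涛, 窦艳梅, 崔志永. 《计算机与现代化》, 2020, No. 9, pp. 112-117 (6 pages)
The weather factors that affect the output of photovoltaic power generation systems are highly volatile and discontinuous, so a suitable prediction model is needed to accurately predict photovoltaic output characteristics and thus ensure the effective operation of the power grid. This paper uses the maximal information coefficient to select suitable historical photovoltaic generation data and includes them as one of the features when reconstructing the input data. An attention mechanism is introduced into an Encoder-Decoder model built from LSTM units, yielding an Encoder-Decoder photovoltaic power generation prediction model combined with the attention mechanism. A case study on a real photovoltaic power plant verifies the accuracy and applicability of the proposed model for photovoltaic power generation prediction.
Keywords: photovoltaic power generation; maximal information coefficient; long short-term memory neural network; Encoder-Decoder framework; attention mechanism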
A multi-modal clustering method for traditional Chinese medicine clinical data via media convergence
10
Authors: Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang. 《CAAI Transactions on Intelligence Technology》 (SCIE, EI), 2023, No. 2, pp. 390-400 (11 pages)
Media convergence is a media change led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can exploit the advantages of media fusion. Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditional Chinese medicine (TCM) clinical data. It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain a media feature representation learnt from multiple modalities. MCGEC captures latent information from the various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels. The experiment is conducted on real-world multi-modal TCM clinical data, including images and text. MCGEC improves clustering results compared with generic single-modal clustering methods and with more advanced current multi-modal clustering methods, and it achieves better results on TCM clinical datasets. Integrating multimedia features into clustering algorithms offers significant benefits compared with single-modal clustering approaches that simply concatenate features from different modalities. It provides practical technical support for multi-modal clustering in the TCM field that incorporates multimedia features.
Keywords: graph convolutional encoder; media convergence; multi-modal clustering; traditional Chinese medicine
Adaptive multi-modal feature fusion for far and hard object detection
11
Authors: LI Yang, GE Hongwei. 《Journal of Measurement Science and Instrumentation》 (CAS, CSCD), 2021, No. 2, pp. 232-241 (10 pages)
To address the difficulty of detecting far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with adaptive multi-modal data fusion is proposed, which makes use of the multi-neighborhood information of voxels together with image information. First, an improved ResNet is designed that preserves the structural information of far and hard objects in low-resolution feature maps, making it better suited to the detection task. Meanwhile, the semantics of each image feature map are enhanced with semantic information from all subsequent feature maps. Second, multi-neighborhood context information with different receptive field sizes is extracted to compensate for the sparseness of the point cloud, which improves the ability of voxel features to represent the spatial structure and semantic information of objects. Finally, a multi-modal adaptive feature fusion strategy is proposed that uses learnable weights to express the contribution of each modality's features to the detection task, and voxel attention further enhances the fused feature representation of valid target objects. Experimental results on the KITTI benchmark show that this method outperforms VoxelNet by remarkable margins, increasing the AP by 8.78% and 5.49% on the medium and hard difficulty levels. The method also achieves better detection performance than many mainstream multi-modal methods, improving the AP by 1% over MVX-Net on the medium and hard difficulty levels.
Keywords: 3D object detection; adaptive fusion; multi-modal data fusion; attention mechanism; multi-neighborhood features
Research on legal document generation based on the Attention model
12
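Authors: 徐惠, 苏同, 俞鹏飞, 江全胜, 朱咸军. 《无线互联科技》, 2023, No. 1, pp. 111-115, 129 (6 pages)
Automatic generation of legal documents can effectively alleviate the shortage of human resources in the legal services industry and lets users conveniently access legal consultation services without leaving home. Research on automatic generation techniques suited to legal documents has real practical significance, both for reducing the paperwork of legal professionals and for helping ordinary people describe legal content in a more standardized way when recounting legal cases. This article proposes an Attention model that screens the key elements of a case and adds an attention mechanism to the Encoder-Decoder model, ultimately generating acceptable legal documents. Experiments show that the model improves on the LSTM model's memory of long texts and completes the legal document generation task well.
Keywords: legal documents; LSTM; encoder-Decoder; attention model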
RUL prediction method based on GRU Encoder-decoder and the attention mechanism
13
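Authors: 兰杰, 李宁, 李志宁, 吕建刚. 《现代电子技术》, 2023, No. 8, pp. 99-105 (7 pages)
Deep learning models can directly establish the mapping between the state of mechanical equipment and its remaining useful life (RUL), avoiding manual feature extraction and the construction of health indicators. Based on deep learning theory, this paper proposes an RUL prediction method that combines an attention mechanism with a sequential encoder-decoder (Encoder-decoder). First, a sequential encoder-decoder is built on gated recurrent unit (GRU) networks to reconstruct the input sequence, in which the GRU-Encoder encodes the input multivariate time series. An attention mechanism is then introduced to fuse, by weighting, the output vectors of the GRU-Encoder at each time step; the fused vector serves as the encoding result and is fed into the GRU-Decoder to reconstruct the input sequence, while the encoding result is also mapped to the RUL of the input sample. The method is validated on the CMAPSS dataset, and the results show that it achieves high prediction accuracy and is feasible and effective.
Keywords: remaining useful life; RUL prediction method; gated recurrent neural network; encoder-decoder; attention mechanism; comparative validation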
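The weighted fusion of per-time-step encoder outputs described in this abstract can be sketched as softmax attention pooling. This is a generic illustration rather than the authors' model: the scoring rule (a dot product with a vector `score_vector`), the function name `attention_pool`, and the toy hidden states are assumptions, and the abstract does not specify how the attention scores are computed.

```python
import math


def attention_pool(hidden_states, score_vector):
    """Fuse per-time-step encoder outputs into one context vector.

    hidden_states: T vectors (one per time step), each of dimension D.
    score_vector: hypothetical learned vector of dimension D used to
    score each time step by dot product.
    Returns the context vector and the attention weights.
    """
    # One unnormalized score per time step.
    scores = [sum(h * v for h, v in zip(hs, score_vector))
              for hs in hidden_states]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * hs[d] for w, hs in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Two time steps with made-up encoder outputs.
states = [[1.0, 0.0], [0.0, 1.0]]
uniform_ctx, uniform_w = attention_pool(states, [0.0, 0.0])
peaked_ctx, peaked_w = attention_pool(states, [10.0, 0.0])
```

With a zero scoring vector every time step gets equal weight; a scoring vector aligned with the first state concentrates almost all weight on that step, which is how attention lets the encoding favor informative time steps.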
Research on CPI based on an improved Attention Mask encoder-decoder
14
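Authors: 李大舟, 陈思思, 高巍, 于锦涛. 《计算机技术与发展》, 2022, No. 2, pp. 214-220 (7 pages)
The study of compound-protein interactions (CPI) plays an important role in drug discovery: it provides valuable information for drug target selection and can, to some extent, raise the hit rate of lead compounds and thus speed up drug discovery. A prediction model for classifying compound-protein interactions is therefore proposed, based on an improved Attention Mask encoder-decoder. RDKit and Item2vec are used to process the SMILES strings of compounds and the amino acid sequences of proteins, respectively, and the resulting low-dimensional feature vectors of compounds and proteins are fed into the model. The model assigns weights to determine which subsequences of the protein matter more to the compound molecule, computing the weights with a neural network equipped with the Attention mechanism to model the interaction between compound and protein. Finally, treating the task as binary classification, it outputs the predicted probability that the compound and protein interact. Model performance is evaluated using the area under the ROC curve and the precision-recall curve. The experimental results show that the model outperforms the GraphDTA and GCN models, improving AUC by about 0.04 and PRC by about 0.07.
Keywords: deep learning; multi-head self-attention; compound-protein interaction; Item2vec; encoder-decoder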
Fake News Detection Based on Text-Modal Dominance and Fusing Multiple Multi-Model Clues
15
Authors: Li fang Fu, Huanxin Peng, Changjin Ma, Yuhan Liu. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 3, pp. 4399-4416 (18 pages)
In recent years, efficiently and accurately identifying multi-modal fake news has become more challenging. First, multi-modal data provides more evidence, but not all of it is equally important. Second, social structure information has proven effective in fake news detection, and combining it while reducing noise is critical. Unfortunately, existing approaches fail to handle these problems. This paper proposes a multi-modal fake news detection framework based on Text-modal Dominance and fusing Multiple Multi-modal Cues (TD-MMC), which uses three valuable multi-modal clues: text-modal importance, text-image complementarity, and text-image inconsistency. TD-MMC is dominated by textual content and assisted by image information, and it uses social network information to enhance the text representation. To reduce interference from irrelevant social structure information, we use a unidirectional cross-modal attention mechanism to selectively learn the features of the social structure. A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining textual features, reducing the loss of important information. In addition, TD-MMC employs a new multi-modal loss to improve the model's generalization ability. Extensive experiments on two public real-world English and Chinese datasets show that the proposed model outperforms state-of-the-art methods on classification evaluation metrics.
Keywords: Fake news detection; cross-modal attention mechanism; multi-modal fusion; social network; transfer learning
Enhancing Human Action Recognition with Adaptive Hybrid Deep Attentive Networks and Archerfish Optimization
16
Authors: Ahmad Yahiya Ahmad Bani Ahmad, Jafar Alzubi, Sophers James, Vincent Omollo Nyangaresi, Chanthirasekaran Kutralakani, Anguraju Krishnan. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 9, pp. 4791-4812 (22 pages)
In recent years, Human Activity Recognition (HAR) models based on wearable devices have received significant attention. Previously developed HAR models use hand-crafted features to recognize human activities, which limits them to basic features. The images captured by wearable sensors contain advanced features, so they can be analyzed by deep learning algorithms to improve the detection and recognition of human actions. Poor lighting and limited sensor capabilities can degrade data quality, making the recognition of human actions a challenging task. Unimodal HAR approaches are not suitable in a real-time environment. Therefore, an updated HAR model is developed using multiple types of data and an advanced deep-learning approach. First, the required signals and sensor data are collected from standard databases, and wave features are extracted from the signals. The extracted wave features and the sensor data are then given as input to recognize the human activity. An Adaptive Hybrid Deep Attentive Network (AHDAN) is developed by combining a 1D Convolutional Neural Network (1DCNN) with a Gated Recurrent Unit (GRU) for the recognition process. Additionally, the Enhanced Archerfish Hunting Optimizer (EAHO) is proposed to fine-tune the network parameters. An experimental evaluation against various deep learning networks and heuristic algorithms confirms the effectiveness of the proposed HAR model. The EAHO-based HAR model outperforms traditional deep learning networks, with an accuracy of 95.36, a recall of 95.25, a specificity of 95.48, and a precision of 95.47. The results show that the developed model recognizes human actions effectively in less time. Additionally, it reduces computational complexity and overfitting through the use of an optimization approach.
Keywords: Human action recognition; multi-modal sensor data and signals; adaptive hybrid deep attentive network; enhanced archerfish hunting optimizer; 1D convolutional neural network; gated recurrent units
Dual encoding feature filtering generalized attention UNET for retinal vessel segmentation
17
Authors: ISLAM Md Tauhidul, WU Da-Wen, TANG Qing-Qing, ZHAO Kai-Yang, YIN Teng, LI Yan-Fei, SHANG Wen-Yi, LIU Jing-Yu, ZHANG Hai-Xian. 《四川大学学报(自然科学版)》, 2025, No. 1, pp. 79-95 (17 pages)
Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues such as limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these issues, DEFFA-Unet is proposed. It features an additional encoder that processes domain-invariant pre-processed inputs, which enriches feature encoding and improves model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. Because the task demands high precision and false positives are very costly, traditional skip connections are replaced with an attention-guided feature reconstructing fusion module. Additionally, innovative data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. Extensive validation with a comprehensive suite of evaluation metrics on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrates the proposed method's superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Keywords: Vessel segmentation; Data balancing; Data augmentation; Dual encoder; Attention Mechanism; Model generalization
3D hand pose estimation method based on monocular RGB images
18
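Authors: 杨冰, 徐楚阳, 姚金良, 向学勤. 《浙江大学学报(工学版)》 (PKU Core), 2025, No. 1, pp. 18-26 (9 pages)
Most existing 3D hand pose estimation methods are based on the Transformer and do not make full use of local spatial information at high resolution. A 3D hand pose estimation method based on an improved FastMETRO is therefore proposed. A deformable attention mechanism is introduced so that the encoder design is no longer constrained by the length of the image feature sequence. An interleaved-update multi-scale feature encoder is introduced to fuse multi-scale features and strengthen the generated hand pose. A graph convolution residual module is introduced to exploit the explicit semantic connections between mesh vertices. To validate the proposed method, training and evaluation experiments were carried out on the FreiHAND, HO3D V2, and HO3D V3 datasets. The results show that the regression accuracy of the proposed method exceeds that of existing state-of-the-art methods, with Procrustes-aligned mean joint errors of 5.8, 10.0, and 10.5 mm on FreiHAND, HO3D V2, and HO3D V3, respectively.
Keywords: 3D hand pose estimation; Transformer; deformable attention mechanism; interleaved-update multi-scale feature encoder; neural network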
Chinese named entity recognition model fusing multi-stage features
19
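Authors: 杨先凤, 范玥, 李自强, 汤依磊. 《计算机工程与设计》 (PKU Core), 2025, No. 1, pp. 37-43 (7 pages)
To address the failure of Chinese named entity recognition to make full use of complete text representations and sentence features, a Chinese named entity recognition model fusing multi-stage features (LM-CNER) is proposed. A global attention mechanism fuses character-level embeddings with their pre-trained word vectors, capturing character-level and word-level features at the same time. A reversed long short-term memory network (Re-LSTM) extracts contextual features and a multi-head self-attention mechanism performs syntactic analysis; the two outputs are concatenated. A conditional random field serves as the decoder to produce the named entity recognition results. Experiments on two datasets, Weibo and Resume, show that the model obtains more accurate text representations and sentence features and improves entity recognition.
Keywords: named entity recognition; reversed long short-term memory network; attention mechanism; encoder; pre-trained word vectors; multi-stage features; conditional random field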
Improved YOLOv5 road object detection based on UAV imagery
20
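Authors: 马荣贵, 张翼, 董世浩. 《无线电工程》, 2025, No. 1, pp. 1-10 (10 pages)
To address the low detection accuracy and poor robustness caused by missed small road targets and occlusion between targets in UAV imagery, a multi-scale road object detection algorithm, YOLOv5-FTCE, is proposed. Multi-scale object localization is improved by adopting the Complete Intersection over Union (CIoU) bounding box loss, re-clustering the prior boxes with the K-means algorithm to adjust their anchor parameters, and adding a YOLO detection head for small targets. A Transformer encoder structure is integrated into the C3 module to improve the Backbone network and strengthen its ability to capture different local information. The feature-reassembly-based Content-Aware ReAssembly of FEatures (CARAFE) module is used for upsampling, improving upsampling performance while reducing information loss during feature processing. An Efficient Attention Module (EAM) is introduced to fuse spatial and channel information and enhance the important information in the network. The results show that, on the VisDrone dataset, YOLOv5-FTCE improves detection precision by 9.5% and mAP50 by 8.9% over the original algorithm and outperforms other common algorithms such as YOLOv7, effectively reducing missed detections of small and occluded road targets.
Keywords: road object detection; YOLOv5; Transformer encoder; feature reassembly; efficient convolutional attention module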
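The CIoU bounding box loss mentioned in this abstract can be sketched from its commonly cited definition: IoU minus a normalized center-distance penalty minus an aspect-ratio penalty. The sketch below is a scalar, pure-Python illustration and not the YOLOv5-FTCE code; the function names `ciou` and `ciou_loss` and the `(x1, y1, x2, y2)` box format are assumptions, and boxes are assumed to have positive width and height.

```python
import math


def ciou(box_a, box_b):
    """Complete IoU of two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Plain IoU from the intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1))
        - math.atan((ax2 - ax1) / (ay2 - ay1))
    ) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v


def ciou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, larger for worse matches."""
    return 1.0 - ciou(box_a, box_b)


same = ciou((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = ciou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
apart = ciou((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: the center-distance term makes it negative, so the loss still changes as a predicted box moves toward its target.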