A Tabletop Nano-CT Image Noise Reduction Network Based on 3-Dimensional Axial Attention Mechanism
1
Authors: Huijuan Fu, Linlin Zhu, Chunhui Wang, Xiaoqi Xi, Yu Han, Lei Li, Yanmin Sun, Bin Yan. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 1711-1725 (15 pages)
Nano-computed tomography (Nano-CT) is an emerging, high-resolution imaging technique. However, due to its low-light properties, tabletop Nano-CT has to be scanned under long exposure conditions, which makes the scanning process time-consuming. For 3D reconstruction data, this paper proposes a lightweight 3D noise reduction method for desktop-level Nano-CT called AAD-ResNet (Axial Attention DeNoise ResNet). The network is framed by the U-Net structure. The encoder and decoder incorporate the proposed 3D axial attention mechanism and residual dense block. Each layer of the residual dense block can directly access the features of the previous layer, which reduces the redundancy of parameters and improves the efficiency of network training. The 3D axial attention mechanism enhances the correlation between 3D information in the training process and captures long-distance dependencies. It can improve the noise reduction effect and avoid the loss of image structure details. Experimental results show that the network can effectively improve the image quality of a 0.1-s exposure scan to a level close to that of a 3-s exposure, significantly shortening the sample scanning time.
Keywords: deep learning; tabletop Nano-CT; image denoising; 3D axial attention mechanism
An Assisted Diagnosis of Alzheimer’s Disease Incorporating Attention Mechanisms Med-3D Transfer Modeling
2
Authors: Yanmei Li, Jinghong Tang, Weiwu Ding, Jian Luo, Naveed Ahmad, Rajesh Kumar. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 713-733 (21 pages)
Alzheimer's disease (AD) is a complex, progressive neurodegenerative disorder. The subtle and insidious onset of its pathogenesis makes early detection a formidable challenge in both contemporary neuroscience and clinical practice. In this study, we introduce an advanced diagnostic methodology rooted in the Med-3D transfer model and enhanced with an attention mechanism. We aim to improve the precision of AD diagnosis and facilitate its early identification. Initially, we employ a spatial normalization technique to address challenges like clarity degradation and unsaturation, which are commonly observed in imaging datasets. Subsequently, an attention mechanism is incorporated to selectively focus on the salient features within the imaging data. Building upon this foundation, we present the novel Med-3D transfer model, designed to further elucidate and amplify the intricate features associated with AD pathogenesis. Our proposed model has demonstrated promising results, achieving a classification accuracy of 92%. To emphasize the robustness and practicality of our approach, we introduce an adaptive 'hot-updating' auxiliary diagnostic system. This system not only enables continuous model training and optimization but also provides a dynamic platform to meet the real-time diagnostic and therapeutic demands of AD.
Keywords: Alzheimer's disease; channel attention; Med-3D; hot update
Attention Guided Multi Scale Feature Fusion Network for Automatic Prostate Segmentation
3
Authors: Yuchun Li, Mengxing Huang, Yu Zhang, Zhiming Bai. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1649-1668 (20 pages)
The precise and automatic segmentation of prostate magnetic resonance imaging (MRI) images is vital for assisting doctors in diagnosing prostate diseases. In recent years, many advanced methods have been applied to prostate segmentation, but due to the variability caused by prostate diseases, automatic segmentation of the prostate presents significant challenges. In this paper, we propose an attention-guided multi-scale feature fusion network (AGMSF-Net) to segment prostate MRI images. We propose an attention mechanism for extracting multi-scale features, and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder. In the decoder stage, a feature fusion module is proposed to obtain global context information. We evaluate our model on MRI images of the prostate acquired from a local hospital. The relative volume difference (RVD) and Dice similarity coefficient (DSC) between the results of automatic prostate segmentation and the ground truth were 1.21% and 93.68%, respectively. To quantitatively evaluate prostate volume on MRI, which is of great clinical significance, we propose a unique AGMSF-Net. The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
Keywords: prostate segmentation; multi-scale attention; 3D Transformer; feature fusion; MRI
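The two metrics reported in the abstract above, DSC and RVD, can be illustrated with a minimal sketch. It assumes binary masks flattened to 0/1 sequences and a signed RVD defined as (predicted volume - true volume) / true volume; the abstract does not state which RVD variant the authors use, and the function names and toy masks are hypothetical.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_volume_difference(pred, truth):
    """Signed RVD as a fraction: (|pred| - |truth|) / |truth|."""
    truth_volume = sum(1 for t in truth if t)
    if truth_volume == 0:
        raise ValueError("ground-truth mask is empty; RVD is undefined")
    pred_volume = sum(1 for p in pred if p)
    return (pred_volume - truth_volume) / truth_volume


# Toy masks (hypothetical data): 4 foreground voxels each, 3 of them shared.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
dsc = dice_coefficient(pred, truth)
rvd = relative_volume_difference(pred, truth)
print(f"DSC = {dsc:.2%}, RVD = {rvd:.2%}")
```

The toy masks show why both metrics are usually reported together: the volumes match exactly, so RVD is zero, while the overlap is imperfect and DSC falls below 100%.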
A Visual-attention-based Mobile 3D Mapping Method for Robots (Cited by: 3)
4
Authors: Binghua Guo, Hongyue Dai, Zhonghua Li. Journal: 自动化学报 (Acta Automatica Sinica) (EI, CSCD, PKU Core), 2017, No. 7, pp. 1248-1256 (9 pages)
Keywords: intelligent mobile robot; 3D map; vision system; attention; mapping method; environment modeling; sensor modeling; Bayes' theorem
Short‐term and long‐term memory self‐attention network for segmentation of tumours in 3D medical images
5
Authors: Mingwei Wen, Quan Zhou, Bo Tao, Pavel Shcherbakov, Yang Xu, Xuming Zhang. Journal: CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 4, pp. 1524-1537 (14 pages)
Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging due to the possible similarity between tumours and adjacent tissues, the occurrence of multiple tumours, and variable tumour shapes and sizes. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) and the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited performance improvement, or even poorer segmentation performance than the pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. Firstly, a distinctive self-attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN. Then, the memory structure filters and combines the above information to exclude similar regions and detect multiple tumours. Finally, the multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method provides Dice (82.4% vs. 76.6%) and Hausdorff distance 95% (HD95) (10.66 vs. 11.54 mm) on KiTS19, as well as Dice (80.2% vs. 78.4%) and HD95 (9.632 vs. 12.17 mm) on LiTS.
Keywords: 3D medical images; convolutional neural network; self-attention network; Transformer; tumor segmentation
A 3D attention U-Net network and its application in geological model parameterization
6
Authors: LI Xiaobo, LI Xin, YAN Lin, ZHOU Tenghua, LI Shunming, WANG Jiqiang, LI Xinhao. Journal: Petroleum Exploration and Development, 2023, No. 1, pp. 183-190 (8 pages)
To solve the problems of convolutional neural network-principal component analysis (CNN-PCA) in fine description and generalization of complex reservoir geological features, this study proposes a 3D attention U-Net network that does not rely on a trained C3D video motion analysis model to extract the style of a 3D model, and applies it to complement the details of the geologic model lost in the dimension reduction of the PCA method. The 3D attention U-Net network was applied to a complex river channel sandstone reservoir to test its effects. The results show that, compared with the CNN-PCA method, the 3D attention U-Net network could better complement the details of the geological model lost in the PCA dimension reduction, better reflect the fluid flow features in the original geologic model, and improve history matching results.
Keywords: reservoir history matching; geological model parameterization; deep learning; attention mechanism; 3D U-Net
An Efficient 3D CNN Framework with Attention Mechanisms for Alzheimer’s Disease Classification
7
Authors: Athena George, Bejoy Abraham, Neetha George, Linu Shine, Sivakumar Ramachandran. Journal: Computer Systems Science & Engineering (SCIE, EI), 2023, No. 11, pp. 2097-2118 (22 pages)
Neurodegeneration is the gradual deterioration and eventual death of brain cells, leading to progressive loss of structure and function of neurons in the brain and nervous system. Neurodegenerative disorders, such as Alzheimer's, Huntington's, Parkinson's, amyotrophic lateral sclerosis, multiple system atrophy, and multiple sclerosis, are characterized by progressive deterioration of brain function, resulting in symptoms such as memory impairment, movement difficulties, and cognitive decline. Early diagnosis of these conditions is crucial to slowing down cell degeneration and reducing the severity of the diseases. Magnetic resonance imaging (MRI) is widely used by neurologists for diagnosing brain abnormalities. The majority of the research in this field focuses on processing the 2D images extracted from the 3D MRI volumetric scans for disease diagnosis. This might result in losing the volumetric information obtained from the whole-brain MRI. To address this problem, a novel 3D-CNN architecture with an attention mechanism is proposed to classify whole-brain MRI images for Alzheimer's disease (AD) detection. The 3D-CNN model uses channel and spatial attention mechanisms to extract relevant features and improve accuracy in identifying brain dysfunctions by focusing on specific regions of the brain. The pipeline takes pre-processed MRI volumetric scans as input, and the 3D-CNN model leverages both channel and spatial attention mechanisms to extract precise feature representations of the input MRI volume for accurate classification. The present study utilizes the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which has three image classes: Mild Cognitive Impairment (MCI), Cognitive Normal (CN), and AD affected. The proposed approach achieves an overall accuracy of 79% when classifying three classes and an average accuracy of 87% when identifying AD and the other two classes. The findings reveal that 3D-CNN models with an attention mechanism exhibit significantly higher classification performance compared to other models, highlighting the potential of deep learning algorithms to aid in the early detection and prediction of AD.
Keywords: 3D CNN; Alzheimer's disease; attention mechanism; classification
Rail-PillarNet: A 3D Detection Network for Railway Foreign Object Based on LiDAR
8
Authors: Fan Li, Shuyao Zhang, Jie Yang, Zhicheng Feng, Zhichao Chen. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 9, pp. 3819-3833 (15 pages)
Aiming at the limitations of existing railway foreign object detection methods based on two-dimensional (2D) images, such as short detection distance, strong environmental influence, and lack of distance information, we propose Rail-PillarNet, a three-dimensional (3D) LiDAR (Light Detection and Ranging) railway foreign object detection method based on an improved PointPillars. Firstly, the parallel attention pillar encoder (PAPE) is designed to fully extract the features of the pillars and alleviate the loss of local fine-grained information in the PointPillars pillar encoder. Secondly, a fine backbone network is designed to improve the feature extraction capability of the network by combining the coding characteristics of LiDAR point cloud features with a residual structure. Finally, the initial weight parameters of the model are optimised by transfer learning to further improve accuracy. The experimental results on the OSDaR23 dataset show that the average accuracy of Rail-PillarNet reaches 58.51%, which is higher than most mainstream models, with 5.49 M parameters. Compared with PointPillars, the accuracy of each target is improved by 10.94%, 3.53%, 16.96%, and 19.90%, respectively, while the number of parameters increases by only 0.64 M, which achieves a balance between the number of parameters and accuracy.
Keywords: railway foreign object; light detection and ranging (LiDAR); 3D object detection; PointPillars; parallel attention mechanism; transfer learning
Learnable three-dimensional Gabor convolutional network with global affinity attention for hyperspectral image classification
9
Authors: Hai-Zhu Pan, Mo-Qi Liu, Hai-Miao Ge, Qi Yuan. Journal: Chinese Physics B (SCIE, EI, CAS, CSCD), 2022, No. 12, pp. 118-135 (18 pages)
Benefiting from the development of hyperspectral imaging technology, hyperspectral image (HSI) classification has become a valuable direction in remote sensing image processing. Recently, researchers have found a connection between convolutional neural networks (CNNs) and Gabor filters. Therefore, some Gabor-based CNN methods have been proposed for HSI classification. However, most Gabor-based CNN methods still manually generate Gabor filters whose parameters are empirically set and remain unchanged during the CNN learning process. Moreover, these methods require patch cubes as network inputs. Such patch cubes may contain interference pixels, which will negatively affect the classification results. To address these problems, in this paper, we propose a learnable three-dimensional (3D) Gabor convolutional network with global affinity attention for HSI classification. More precisely, the learnable 3D Gabor convolution kernel is constructed by the 3D Gabor filter, which can be learned and updated during the training process. Furthermore, spatial and spectral global affinity attention modules are introduced to capture more discriminative features between spatial locations and spectral bands in the patch cube, thus alleviating the interfering pixels problem. Experimental results on three well-known HSI datasets (including two natural crop scenarios and one urban scenario) have demonstrated that the proposed network can achieve powerful classification performance and outperforms widely used machine-learning-based and deep-learning-based methods.
Keywords: image processing; remote sensing; 3D Gabor filter; neural networks; global affinity attention
Hand gesture tracking algorithm based on visual attention
10
Authors: 冯志全, 徐涛, 吕娜, 唐好魁, 蒋彦, 梁丽伟. Journal: Journal of Beijing Institute of Technology (EI, CAS), 2016, No. 4, pp. 491-501 (11 pages)
In most of the interaction process, the operator focuses on the tracked 3D hand gesture model only at the "interaction points" in the collision detection scene, such as "grasp" and "release", and on objects in the scene, without paying attention to the tracked 3D hand gesture model throughout the whole procedure. Thus, in this paper, a visual attention distribution model of the operator in the "grasp", "translation", "release" and other basic operation procedures is first studied, and a 3D hand gesture tracking algorithm based on this distribution model is proposed. With this algorithm, in periods with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualise a 3D hand gesture model in the interactive scene; in periods with a high degree of visual attention, an existing "frame-by-frame tracking" approach can be adopted to obtain a 3D gesture model. The results demonstrate that the proposed method can achieve real-time tracking of 3D hand gestures with an effective improvement in the efficiency, fluency, and availability of 3D hand gesture interaction.
Keywords: visual attention; 3D hand gesture tracking; hand gesture interaction
Research on a Robotic Riveting Defect Detection Method Based on Improved DETR (Cited by: 2)
11
Authors: 李宗刚, 宋秋凡, 杜亚江, 陈引娟. Journal: 铁道科学与工程学报 (Journal of Railway Science and Engineering) (EI, CAS, CSCD, PKU Core), 2024, No. 4, pp. 1690-1700 (11 pages)
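Riveting is the main joining method for the structural parts of railway vehicles, and acceptable riveting quality is an important guarantee of safe and stable vehicle operation. To address the low detection accuracy, small number of inspection points, and limited intelligence of existing riveting defect detection methods, a robotic riveting defect detection method based on an improved DETR is proposed. First, a riveting defect detection system is built, and riveting defect images are collected in sequence under conditions of large workpieces and small rivets. Second, to strengthen the image feature extraction capability and detection performance of the DETR model on small targets, EfficientNet is used as the backbone feature extraction network in DETR, and the 3-D weight attention mechanism SimAM is introduced into EfficientNet, which effectively preserves the upset-head shape information in the image feature layers and the spatial information of the rivet regions. Then, a weighted bidirectional feature pyramid module is introduced into the neck network; the outputs of EfficientNet are taken as the inputs of the feature fusion module to aggregate feature information at each scale, increasing the inter-class differences among riveting defects. Finally, a linear combination of Smooth L1 and DIoU is used to improve the regression loss function of the original model's prediction network, raising the detection accuracy and convergence speed of the model. The results show that the improved model achieves high detection performance, with a mean average precision (mAP) of 97.12% for riveting defects and a detection speed of 25.4 frames/s; compared with other mainstream detection models such as Faster RCNN and YOLOX, it has considerable advantages in both detection accuracy and detection speed. The results can meet the need for real-time online detection of riveting defects of small rivets on large riveted parts under actual working conditions, and provide a reference for applying visual inspection technology in riveting processes.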
Keywords: riveting defect detection; DETR; EfficientNet; 3-D attention mechanism; multi-scale weighted feature fusion
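The abstract above mentions a regression loss built as a linear combination of Smooth L1 and DIoU. The sketch below shows one way such a combination could look, assuming boxes in (x1, y1, x2, y2) form and the usual definitions of both terms. The weights `w_l1` and `w_diou`, the box format, and the function names are hypothetical; the abstract gives no coefficients, so this is not the authors' implementation.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def diou_loss(pred, target):
    """DIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU + d^2 / c^2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centres.
    d2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2
    return 1.0 - iou + (d2 / c2 if c2 > 0 else 0.0)


def combined_box_loss(pred, target, w_l1=0.5, w_diou=0.5):
    """Linear combination of the two terms; the weights are placeholders."""
    return w_l1 * smooth_l1(pred, target) + w_diou * diou_loss(pred, target)


# Toy boxes (hypothetical): the prediction is shifted one unit to the right.
print(combined_box_loss((1.0, 0.0, 3.0, 2.0), (0.0, 0.0, 2.0, 2.0)))
```

The centre-distance term keeps a gradient signal even when the boxes do not overlap, which is the usual argument for DIoU over plain IoU when targets are small.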
3D Vehicle Detection Algorithm Based on Multimodal Decision-Level Fusion
12
Authors: Peicheng Shi, Heng Qi, Zhiqiang Liu, Aixi Yang. Journal: Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 6, pp. 2007-2023 (17 pages)
3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving. The algorithm based on the Camera-LiDAR object candidate fusion method (CLOCs) is currently considered to be a more effective decision-level fusion algorithm, but it does not fully utilize the extracted 3D and 2D features. Therefore, we propose a 3D vehicle detection algorithm based on multimodal decision-level fusion. First, the anchor point of the 3D detection bounding box is projected into the 2D image, the distance between the 2D and 3D anchor points is calculated, and this distance is used as a new fusion feature to enhance the feature redundancy of the network. Subsequently, an attention module, the squeeze-and-excitation network, is added to weight each feature channel, enhancing the important features of the network and suppressing useless ones. The experimental results show that the mean average precision of the algorithm on the KITTI dataset is 82.96%, which outperforms previous state-of-the-art multimodal fusion-based methods, and the average accuracy in the Easy, Moderate, and Hard evaluation indicators reaches 88.96%, 82.60%, and 77.31%, respectively, which is higher than the original CLOCs model by 1.02%, 2.29%, and 0.41%, respectively. Compared with the original CLOCs algorithm, our algorithm has higher accuracy and better performance in 3D vehicle detection.
Keywords: 3D vehicle detection; multimodal fusion; CLOCs; network structure optimization; attention module
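The channel weighting that the abstract above attributes to squeeze-and-excitation networks can be sketched in a few lines. This is a generic illustration of the squeeze, excitation, and scale steps, written in plain Python with bias terms omitted; the weight matrices and feature values are hypothetical and are not taken from the paper.

```python
import math


def squeeze_excite(channels, w_reduce, w_expand):
    """Reweight feature channels with a squeeze-and-excitation gate.

    channels: list of C channels, each a flat list of activations.
    w_reduce: R x C bottleneck weights; w_expand: C x R expansion weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    # Excitation, step 2: expand back to C values and squash into (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every activation by the gate of its channel.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


# Hypothetical features and weights: C = 3 channels, bottleneck size R = 1.
channels = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w_reduce = [[1.0, 0.0, 0.0]]
w_expand = [[2.0], [0.0], [-2.0]]
scaled, gates = squeeze_excite(channels, w_reduce, w_expand)
print(gates)
```

With these placeholder weights the first channel is emphasised and the third is suppressed, which is the behaviour the abstract describes as enhancing important features and suppressing useless ones.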
MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection
13
Authors: Peicheng Shi, Zhiqiang Liu, Heng Qi, Aixi Yang. Journal: Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 5615-5637 (23 pages)
In complex traffic environments, it is very important for autonomous vehicles to accurately perceive the dynamic information of surrounding vehicles in advance. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and object detection distance. To address these challenges, we propose a multimodal feature fusion network for 3D object detection (MFF-Net). This paper first uses a spatial transformation projection algorithm to map the image features into the feature space, so that the image features are in the same spatial dimension when fused with the point cloud features. Then, feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network to features. Finally, this paper reduces the probability of false detection and missed detection in the non-maximum suppression algorithm by increasing the one-dimensional threshold. Together, these steps form a complete 3D object detection network based on multimodal feature fusion. The experimental results show that the proposed network achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. In the Easy, Moderate, and Hard evaluation indicators, the accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that MFF-Net performs well in 3D object detection.
Keywords: 3D object detection; multimodal fusion; neural network; autonomous driving; attention mechanism
3D Perception Algorithms: Towards Perceptually Driven Compression of 3D Video
14
Authors: Ruimin Hu, Rui Zhong, Zhongyuan Wang, Zhen Han. Journal: ZTE Communications, 2013, No. 1, pp. 11-16 (6 pages)
In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video that is delivered over a channel with limited bandwidth. We present the state of the art of 3D visual attention models, 3D just-noticeable difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
Keywords: 3D perception; 3D visual attention; 3D just-noticeable difference; 3D texture synthesis; 3D video compression
Image attention transformer network for indoor 3D object detection
15
Authors: REN KeYan, YAN Tong, HU ZhaoXin, HAN HongGui, ZHANG YunLu. Journal: Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2024, No. 7, pp. 2176-2190 (15 pages)
Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature by an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks of indoor 3D object detection, SUN RGB-D and NYU Depth V2, while ablation studies analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
Keywords: 3D object detection; Transformer; attention mechanism
HgaNets: Fusion of Visual Data and Skeletal Heatmap for Human Gesture Action Recognition
16
Authors: Wuyan Liang, Xiaolong Xu. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 1089-1103 (15 pages)
Recognition of human gesture actions is a challenging issue due to the complex patterns in both visual and skeletal features. Existing gesture action recognition (GAR) methods typically analyze visual and skeletal data, failing to meet the demands of various scenarios. Furthermore, multi-modal approaches lack the versatility to efficiently process both uniform and disparate input patterns. Thus, in this paper, an attention-enhanced pseudo-3D residual model, called HgaNets, is proposed to address the GAR problem. This model comprises two independent components designed for modeling visual RGB (red, green and blue) images and 3D skeletal heatmaps, respectively. More specifically, each component consists of two main parts: 1) a multi-dimensional attention module for capturing important spatial, temporal and feature information in human gestures; 2) a spatiotemporal convolution module that utilizes pseudo-3D residual convolution to characterize spatiotemporal features of gestures. Then, the output weights of the two components are fused to generate the recognition results. Finally, we conducted experiments on four datasets to assess the efficiency of the proposed model. The results show that the accuracy on the four datasets reaches 85.40%, 91.91%, 94.70%, and 95.30%, respectively, while the inference time is 0.54 s and the parameter count is 2.74 M. These findings highlight that the proposed model outperforms other existing approaches in terms of recognition accuracy.
Keywords: gesture action recognition; multi-dimensional attention; pseudo-3D; skeletal heatmap
ARM3D: Attention-based relation module for indoor 3D object detection (Cited by: 4)
17
Authors: Yuqing Lan, Yao Duan, Chenyi Liu, Chenyang Zhu, Yueshan Xiong, Hui Huang, Kai Xu. Journal: Computational Visual Media (SCIE, EI, CSCD), 2022, No. 3, pp. 395-414 (20 pages)
Relation contexts have been proved to be useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably exist due to noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms such as Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals and an attention module to distribute attention weights towards different relation contexts. In this way, ARM3D can take full advantage of the useful relation contexts and filter out those that are less relevant or even confusing, which mitigates the ambiguity in detection. We have evaluated the effectiveness of ARM3D by plugging it into several state-of-the-art 3D object detectors and showing more accurate and robust detection results. Extensive experiments show the capability and generalization of ARM3D on 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
Keywords: attention mechanism; scene understanding; relational reasoning; 3D indoor object detection
SUNet++: A Deep Network with Channel Attention for Small-Scale Object Segmentation on 3D Medical Images (Cited by: 2)
18
Authors: Lan Zhang, Kejia Zhang, Haiwei Pan. Journal: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 4, pp. 628-638 (11 pages)
As a deep learning network with an encoder-decoder architecture, UNet and its series of improved versions have been widely used in medical image segmentation. However, when used to segment targets in 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT), these models do not model the relevance of images in vertical space, resulting in inaccurate analysis of consecutive slices of the same patient. In addition, the large amount of detail lost during the encoding process makes these models incapable of segmenting small-scale tumor targets. Aiming at small-scale target segmentation in 3D medical images, a new neural network model, SUNet++, is proposed on the basis of UNet and UNet++. SUNet++ improves the existing models mainly in three aspects: 1) the modeling strategy of slice superposition is used to thoroughly exploit the three-dimensional information of the data; 2) by adding an attention mechanism during the decoding process, small-scale targets in the picture are retained and amplified; 3) in the up-sampling process, the transposed convolution operation is used to further enhance the effect of the model. To verify the effect of the model, we collected and produced a dataset of hyperintensity MRI liver-stage images containing over 400 cases of liver nodules. Experimental results on both public and proprietary datasets demonstrate the superiority of SUNet++ in small-scale target segmentation of three-dimensional medical images.
Keywords: 3D medical images; small-scale target; segmentation; attention mechanism
A Deep Double-Channel Dense Network for Hyperspectral Image Classification (Cited by: 15)
19
Authors: Kexian WANG, Shunyi ZHENG, Rui LI, Li GUI. Journal: Journal of Geodesy and Geoinformation Science, 2021, No. 4, pp. 46-62 (17 pages)
Hyperspectral Image (HSI) classification based on deep learning has been an attractive area in recent years. However, as a kind of data-driven algorithm, the deep learning method usually requires substantial computational resources and high-quality labelled datasets, while high-performance computing and data annotation are expensive. In this paper, to reduce the dependence on massive calculation and labelled samples, we propose a deep Double-Channel dense network (DDCD) for hyperspectral image classification. Specifically, we design a 3D Double-Channel dense layer to capture the local and global features of the input. We also propose a Linear Attention Mechanism that approximates dot-product attention with much lower memory and computational costs. The number of parameters and the computational cost are noticeably lower than those of the compared deep learning methods, which means DDCD has a simpler architecture and higher efficiency. A series of quantitative experiments on 6 widely used hyperspectral datasets shows that the proposed DDCD obtains state-of-the-art performance, even when labelled samples are severely lacking.
Keywords: 3D Double-Channel dense layer; Linear Attention Mechanism; Deep Learning (DL); hyperspectral classification
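The memory and compute saving that the abstract above attributes to linear attention comes from reordering the matrix products so that no N x N similarity matrix is formed. The sketch below illustrates that general idea with a generic kernelised attention using the feature map elu(x) + 1. That feature map is an assumption chosen for illustration and is not necessarily the approximation used in DDCD; all function names and inputs are hypothetical.

```python
import math


def feature_map(x):
    """Positive feature map elu(x) + 1 (an illustrative choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Row-normalised phi(Q) (phi(K)^T V), without any N x N matrix."""
    phi_q = [[feature_map(x) for x in q] for q in queries]
    phi_k = [[feature_map(x) for x in k] for k in keys]
    d, dv = len(keys[0]), len(values[0])
    # Summaries shared by all queries: kv[i][j] = sum_n phi_k[n][i] * values[n][j].
    kv = [[sum(k[i] * v[j] for k, v in zip(phi_k, values)) for j in range(dv)]
          for i in range(d)]
    k_sum = [sum(k[i] for k in phi_k) for i in range(d)]
    out = []
    for q in phi_q:
        norm = sum(q[i] * k_sum[i] for i in range(d))
        out.append([sum(q[i] * kv[i][j] for i in range(d)) / norm for j in range(dv)])
    return out


def quadratic_attention(queries, keys, values):
    """Same quantity computed through explicit pairwise similarities, for comparison."""
    phi_q = [[feature_map(x) for x in q] for q in queries]
    phi_k = [[feature_map(x) for x in k] for k in keys]
    dv = len(values[0])
    out = []
    for q in phi_q:
        sims = [sum(a * b for a, b in zip(q, k)) for k in phi_k]
        total = sum(sims)
        out.append([sum(s * v[j] for s, v in zip(sims, values)) / total for j in range(dv)])
    return out


# Hypothetical inputs: 2 queries, 3 key/value pairs, feature size 2.
Q = [[1.0, 0.0], [0.0, -1.0]]
K = [[0.5, 1.0], [-0.5, 0.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fast = linear_attention(Q, K, V)
slow = quadratic_attention(Q, K, V)
print(fast)
```

Both functions return the same values up to rounding; the first one scales linearly with the number of key/value pairs because the `kv` and `k_sum` summaries are computed once and reused for every query.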
Recurrent 3D attentional networks for end-to-end active object recognition
20
Authors: Min Liu, Yifei Shi, Lintao Zheng, Kai Xu, Hui Huang, Dinesh Manocha. Journal: Computational Visual Media (CSCD), 2019, No. 1, pp. 91-103 (13 pages)
Active vision is inherently attention-driven: an agent actively selects views to attend to in order to rapidly perform a vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we address multi-view depth-based active object recognition using an attention mechanism, by use of an end-to-end recurrent 3D attentional network. The architecture takes advantage of a recurrent neural network to store and update an internal representation. Our model, trained with 3D shape datasets, is able to iteratively attend to the best views targeting an object of interest in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation, and so achieves much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with only depth input, achieves state-of-the-art next-best-view performance both in terms of time taken and recognition accuracy.
Keywords: active object recognition; recurrent neural network; next-best-view; 3D attention