Journal Articles
15,618 articles found
1. Enhancing Dense Small Object Detection in UAV Images Based on Hybrid Transformer (Cited by 1)
Authors: Changfeng Feng, Chunping Wang, Dongdong Zhang, Renke Kou, Qiang Fu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3993-4013 (21 pages)
Transformer-based models have facilitated significant advances in object detection. However, their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. First, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. A query filter is designed to cope with the dense clustering inherent in drone-captured images by suppressing similar queries with a training-aware non-maximum suppression. Adversarial denoising learning, a novel enhancement method inspired by adversarial learning, improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on VisDrone and UAVDT, respectively, with faster inference speed, making it a competitive model for UAV image object detection.
Keywords: UAV images; Transformer; dense small object detection
2. Transformer-Based Cloud Detection Method for High-Resolution Remote Sensing Imagery
Authors: Haotang Tan, Song Sun, Tian Cheng, Xiyuan Shu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 661-678 (18 pages)
Cloud detection from satellite and drone imagery is crucial for applications such as weather forecasting and environmental monitoring. Addressing the limitations of conventional convolutional neural networks, we propose an innovative transformer-based method. This method leverages transformers, which are adept at processing data sequences, to enhance cloud detection accuracy. Additionally, we introduce a Cyclic Refinement Architecture that improves the resolution and quality of feature extraction, thereby aiding the retention of critical details often lost during cloud detection. Our extensive experimental validation shows that our approach significantly outperforms established models, excelling in high-resolution feature extraction and precise cloud segmentation. By integrating Pyramid Vision Transformers (PVT) with this architecture, our method advances high-resolution feature delineation and segmentation accuracy. Ultimately, our research offers a novel perspective for surmounting traditional challenges in cloud detection and contributes to the advancement of precise and dependable image analysis across various domains.
Keywords: cloud; Transformer; image segmentation; remotely sensed imagery; pyramid vision transformer
3. Efficient single-pixel imaging encrypted transmission based on 3D Arnold transformation
Authors: 梁振宇, 王朝瑾, 王阳阳, 高皓琪, 朱东涛, 许颢砾, 杨星. Chinese Physics B (SCIE, EI, CAS, CSCD), 2024, Issue 3, pp. 378-386 (9 pages)
Single-pixel imaging (SPI) can transform 2D or 3D image data into 1D light signals, which offers promising prospects for image compression and transmission. However, during data communication these light signals in public channels can easily draw the attention of eavesdroppers. Here, we introduce an efficient encryption method for SPI data transmission that uses the 3D Arnold transformation to directly scramble 1D single-pixel light signals and the elliptic curve encryption algorithm for key transmission. The scheme first employs Hadamard patterns to illuminate the scene and then uses the 3D Arnold transformation to permute the 1D light signal from single-pixel detection. The transformation parameters serve as the secret key, while the security of key exchange is guaranteed by an elliptic-curve-based key exchange mechanism. Both computer simulations and optical experiments demonstrate that, compared with existing encryption schemes, the proposed technique not only enhances the security of encryption but also eliminates the need for complicated pattern-scrambling rules. Additionally, this approach solves the problem of secure key transmission, ensuring the security of information and the quality of the decrypted images.
Keywords: single-pixel imaging; 3D Arnold transformation; elliptic curve encryption; image encryption
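The scrambling step this abstract describes, the 3D Arnold transformation, is in its common form an invertible modular matrix map applied to the coordinates of the reshaped signal. A minimal sketch of the idea (the matrix `A`, cube size `N`, and round count here are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

# Illustrative 3D Arnold matrix: det(A) = 1, so the coordinate map is a
# bijection mod N and the scrambling is exactly invertible.
A = np.array([[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]])

def _arnold_perm(N, rounds):
    """Permutation induced on a 1D signal of length N**3 by iterating the
    3D Arnold map on the coordinates of an N x N x N reshape."""
    coords = np.stack(np.unravel_index(np.arange(N ** 3), (N, N, N)))  # 3 x N^3
    for _ in range(rounds):
        coords = (A @ coords) % N
    return np.ravel_multi_index(tuple(coords), (N, N, N))

def arnold3d_scramble(signal, N, rounds=1):
    perm = _arnold_perm(N, rounds)
    out = np.empty_like(signal)
    out[perm] = signal          # each sample moves to its mapped position
    return out

def arnold3d_unscramble(signal, N, rounds=1):
    return signal[_arnold_perm(N, rounds)]   # inverse of the scatter above
```

Here `(N, rounds)` stands in for the secret transformation parameters; the elliptic-curve key exchange the paper pairs this with is omitted.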
4. Triple-path feature transform network for ring-array photoacoustic tomography image reconstruction
Authors: Lingyu Ma, Zezheng Qin, Yiming Ma, Mingjian Sun. Journal of Innovative Optical Health Sciences (SCIE, EI, CSCD), 2024, Issue 3, pp. 23-40 (18 pages)
Photoacoustic imaging (PAI) is a noninvasive emerging imaging method based on the photoacoustic effect, which provides necessary assistance for medical diagnosis. It has the characteristics of large imaging depth and high contrast. However, limited by equipment cost and reconstruction time requirements, existing PAI systems equipped with annular array transducers struggle to balance image quality and imaging speed. In this paper, a triple-path feature transform network (TFT-Net) for ring-array photoacoustic tomography is proposed to enhance imaging quality from limited-view and sparse measurement data. Specifically, the network combines raw photoacoustic pressure signals and conventional linear reconstruction images as input data, and takes the photoacoustic physical model as prior information to guide the reconstruction process. In addition, to enhance the ability to extract signal features, residual blocks and squeeze-and-excitation blocks are introduced into TFT-Net. For further efficient reconstruction, the final output of photoacoustic signals uses a 'filter-then-upsample' operation with a pixel-shuffle multiplexer and a maxout module. Experiment results on simulated and in-vivo data demonstrate that TFT-Net can restore target boundaries clearly, reduce background noise, and realize fast, high-quality photoacoustic image reconstruction under limited-view, sparse sampling.
Keywords: deep learning; feature transformation; image reconstruction; limited-view measurement; photoacoustic tomography
5. Multiscale Fusion Transformer Network for Hyperspectral Image Classification
Authors: Yuquan Gan, Hao Zhang, Chen Yi. Journal of Beijing Institute of Technology (EI, CAS), 2024, Issue 3, pp. 255-270 (16 pages)
Convolutional neural networks (CNNs) have excellent ability to model locally contextual information. However, CNNs face challenges in describing long-range semantic features, which leads to relatively low classification accuracy on hyperspectral images. To address this problem, this article proposes an algorithm based on multiscale fusion and a transformer network for hyperspectral image classification. First, low-level spatial-spectral features are extracted by a multi-scale residual structure. Second, an attention module is introduced to focus on the more important spatial-spectral information. Finally, high-level semantic features are represented and learned by a token learner and an improved transformer encoder. The proposed algorithm is compared with six classical hyperspectral classification algorithms on real hyperspectral images. The experimental results show that the proposed algorithm effectively improves the land cover classification accuracy of hyperspectral images.
Keywords: hyperspectral image; land cover classification; multi-scale; Transformer
6. Integrating Transformer and Bidirectional Long Short-Term Memory for Intelligent Breast Cancer Detection from Histopathology Biopsy Images
Authors: Prasanalakshmi Balaji, Omar Alqahtani, Sangita Babu, Mousmi Ajay Chaurasia, Shanmugapriya Prakasam. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 443-458 (16 pages)
Breast cancer is a significant health threat, affecting not only women but the population at large. With recent advancements in digital pathology, eosin and hematoxylin images provide enhanced clarity in examining microscopic features of breast tissues based on their staining properties. Early cancer detection speeds up the therapeutic process, thereby increasing survival rates. The analysis made by medical professionals, especially pathologists, is time-consuming and challenging, so there is a need for automated breast cancer detection systems. Emerging artificial intelligence platforms, especially deep learning models, play an important role in image diagnosis and prediction. Initially, histopathology biopsy images are taken from standard data sources. The gathered images are then given as input to a Multi-Scale Dilated Vision Transformer, where the essential features are acquired. Subsequently, the features are passed to a Bidirectional Long Short-Term Memory (Bi-LSTM) network for classifying the breast cancer disorder. The efficacy of the model is evaluated using divergent metrics. Compared with other methods, the proposed work offers impressive detection results.
Keywords: bidirectional long short-term memory; breast cancer detection; feature extraction; histopathology biopsy images; multi-scale dilated vision transformer
7. Pre-training transformer with dual-branch context content module for table detection in document images
Authors: Yongzhi Li, Pengle Zhang, Meng Sun, Jin Huang, Ruhan He. Virtual Reality & Intelligent Hardware (EI), 2024, Issue 5, pp. 408-420 (13 pages)
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results, and incorrect detection results might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pre-training transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods, achieving state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/YongZ-Lee/TD-DCCAM
Keywords: table detection; document image analysis; Transformer; dilated convolution; deformable convolution; feature fusion
8. Research on PolSAR Image Classification Method Based on Vision Transformer Considering Local Information
Authors: Mingxia Zhang, Aichun Wang, Xiaozheng Du, Xinmeng Wang, Yu Wu. Journal of Computer and Communications, 2024, Issue 9, pp. 22-38 (17 pages)
In response to the inadequate utilization of local information when the Vision Transformer is used for PolSAR image classification in existing studies, this paper proposes a Vision Transformer method that considers local information, LIViT. The method replaces the image patch sequence with a polarimetric feature sequence in the feature embedding and uses convolution for mapping to preserve spatial detail information. In addition, a wavelet transform branch enables the network to pay more attention to the shape and edge information of the target and improves the extraction of local edge information. Results on data from Wuhan, China and Flevoland, the Netherlands show that considering local information when using a Vision Transformer for PolSAR image classification effectively improves classification accuracy and shows clear advantages in PolSAR image classification.
Keywords: Vision Transformer; PolSAR; image classification; LIViT
9. ATFF: Advanced Transformer with Multiscale Contextual Fusion for Medical Image Segmentation
Authors: Xinping Guo, Lei Wang, Zizhen Huang, Yukun Zhang, Yaolong Han. Journal of Computer and Communications, 2024, Issue 3, pp. 238-251 (14 pages)
Deep convolutional neural networks (CNNs) have greatly advanced the automatic segmentation of medical images. However, due to the inherent properties of convolution operations, CNNs usually cannot establish long-distance interdependence, which limits segmentation performance. The Transformer has been successfully applied to various computer vision tasks, using the self-attention mechanism to model long-distance interaction and capture global information. However, self-attention lacks spatial localization and high-performance computing. To solve these problems, we develop a new medical transformer with a multi-scale context fusion function for medical image segmentation. The proposed model combines convolution operations and attention mechanisms into a U-shaped framework that captures both local and global information. First, the traditional transformer module is improved into an advanced transformer module, which uses post-layer normalization to obtain mild activation values and scaled cosine attention with a moving window to obtain accurate spatial information. Second, we introduce a deep supervision strategy to guide the model to fuse multi-scale feature information, further enabling the model to propagate feature information effectively across layers; thanks to this, it achieves better segmentation performance while being more robust and efficient. The proposed model is evaluated on multiple medical image segmentation datasets. Experimental results demonstrate that it achieves better performance on a challenging dataset (ETIS) compared with existing methods that rely only on convolutional neural networks, Transformers, or a combination of both. The mDice and mIoU indicators increased by 2.74% and 3.3%, respectively.
Keywords: medical image segmentation; advanced transformer; deep supervision; attention mechanism
10. A Swin Transformer and Residual Network Combined Model for Breast Cancer Disease Multi-Classification Using Histopathological Images
Authors: Jianjun Zhuang, Xiaohui Wu, Dongdong Meng, Shenghua Jing. Instrumentation, 2024, Issue 1, pp. 112-120 (9 pages)
Breast cancer has become a killer of women's health nowadays. To exploit the potential representational capabilities of the models more comprehensively, we propose a multi-model fusion strategy. Specifically, we combine two differently structured deep learning models, ResNet101 and Swin Transformer (SwinT), with the addition of the Convolutional Block Attention Module (CBAM) attention mechanism, making full use of SwinT's global context modeling ability and ResNet101's local feature extraction ability; additionally, the cross-entropy loss function is replaced by the focal loss function to address the unbalanced class distribution of breast cancer datasets. The multi-classification recognition accuracies of the proposed fusion model on the 40X, 100X, 200X and 400X BreakHis datasets are 97.50%, 96.60%, 96.30% and 96.10%, respectively. Compared with a single SwinT model or ResNet101 model, the fusion model has higher accuracy and better generalization ability, providing a more effective method for screening, diagnosis and pathological classification of female breast cancer.
Keywords: breast cancer; pathological image; Swin Transformer; ResNet101; focal loss
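The focal loss this abstract swaps in for cross-entropy down-weights well-classified examples so that training focuses on hard, minority-class samples. A minimal NumPy sketch (gamma and alpha defaults are the common values from the focal loss literature, not necessarily this paper's tuned settings):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Focal loss FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).

    probs: (n, c) softmax class probabilities; targets: (n,) integer labels.
    gamma > 0 shrinks the loss of well-classified samples and alpha
    re-weights classes, mitigating imbalance relative to plain cross-entropy.
    """
    pt = probs[np.arange(len(targets)), targets]   # probability of true class
    return float(np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt + 1e-12)))
```

Because of the `(1 - p_t)**gamma` factor, a confidently correct sample (p_t near 1) contributes almost nothing, while a misclassified minority-class sample keeps most of its cross-entropy weight.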
11. An improved non-uniform fast Fourier transform method for radio imaging of coronal mass ejections
Authors: Weidan Zhang, Bing Wang, Zhao Wu, Shuwang Chang, Yao Chen, Fabao Yan. Astronomical Techniques and Instruments (CSCD), 2024, Issue 2, pp. 117-127 (11 pages)
Radioheliographs can obtain solar images at high temporal and spatial resolution with a high dynamic range. These are among the most important instruments for studying solar radio bursts, understanding solar eruption events, and conducting space weather forecasting. This study aims to explore the effective use of radioheliographs for solar observations, specifically for imaging coronal mass ejections (CMEs), to track their evolution and provide space weather warnings. We have developed an imaging simulation program based on the principle of aperture synthesis imaging, covering the entire data processing flow from antenna configuration to dirty map generation. For gridding, we propose an improved non-uniform fast Fourier transform (NUFFT) method that provides superior image quality. Using simulated imaging of radio coronal mass ejections, we provide practical recommendations for the performance of radioheliographs. This study provides important support for the validation and calibration of radioheliograph data processing, and is expected to profoundly enhance our understanding of solar activity.
Keywords: radio interference; gridding; imaging; non-uniform fast Fourier transform
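The gridding stage that the paper's NUFFT improves can be illustrated with the standard convolutional-gridding baseline: irregular (u, v) visibility samples are spread onto a regular grid with a small kernel, and an inverse FFT yields the dirty map. A simplified sketch (truncated Gaussian kernel, no deapodization; purely illustrative, not the paper's improved method):

```python
import numpy as np

def dirty_map_by_gridding(u, v, vis, n=64, half_width=3, sigma=0.6):
    """Grid irregular visibility samples and invert to a dirty image.

    u, v: baseline coordinates in [-0.5, 0.5) (cycles per pixel);
    vis: complex visibilities. Each sample is spread over a
    (2*half_width+1)^2 neighborhood with a truncated Gaussian kernel.
    """
    grid = np.zeros((n, n), dtype=complex)
    gu, gv = (u + 0.5) * n, (v + 0.5) * n      # fractional grid coordinates
    for x, y, val in zip(gu, gv, vis):
        x0, y0 = int(round(x)), int(round(y))
        for dy in range(-half_width, half_width + 1):
            for dx in range(-half_width, half_width + 1):
                w = np.exp(-((x0 + dx - x) ** 2 + (y0 + dy - y) ** 2)
                           / (2.0 * sigma ** 2))
                grid[(y0 + dy) % n, (x0 + dx) % n] += w * val
    # inverse FFT of the gridded uv-plane gives the dirty image
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real
```

A quick sanity check of the convention: unit visibilities at every sampled baseline correspond to a point source at the phase center, so the dirty map should peak at the central pixel.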
12. Image Retrieval Based on Vision Transformer and Masked Learning (Cited by 5)
Authors: 李锋, 潘煌圣, 盛守祥, 王国栋. Journal of Donghua University (English Edition) (CAS), 2023, Issue 5, pp. 539-547 (9 pages)
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed: masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model, and the scheme for extracting image descriptors is discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. For the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared with that of the original model.
Keywords: content-based image retrieval; vision transformer; masked autoencoder; feature extraction
13. CFM-UNet: A Joint CNN and Transformer Network via Cross Feature Modulation for Remote Sensing Images Segmentation (Cited by 3)
Authors: Min Wang, Peidong Wang. Journal of Geodesy and Geoinformation Science (CSCD), 2023, Issue 4, pp. 40-47 (8 pages)
Semantic segmentation methods based on CNNs have made great progress, but shortcomings remain in their application to remote sensing image segmentation; for example, the small receptive field cannot effectively capture global context. To solve this problem, this paper proposes a hybrid model based on ResNet50 and the Swin Transformer to directly capture long-range dependencies, fusing features through a Cross Feature Modulation Module (CFMM). Experimental results on two publicly available datasets, Vaihingen and Potsdam, reach mIoU of 70.27% and 76.63%, respectively. Thus, CFM-UNet maintains high segmentation performance compared with other competitive networks.
Keywords: remote sensing images; semantic segmentation; Swin Transformer; feature modulation module
14. Cerebrovascular segmentation from mesoscopic optical images using Swin Transformer (Cited by 1)
Authors: Yuxin Li, Qianlong Zhang, Hang Zhou, Junhuai Li, Xiangning Li, Anan Li. Journal of Innovative Optical Health Sciences (SCIE, EI, CSCD), 2023, Issue 4, pp. 120-133 (14 pages)
Vascular segmentation is a crucial task in biomedical image processing, significant for analyzing and modeling vascular networks under physiological and pathological states. With advances in fluorescent labeling and mesoscopic optical techniques, it has become possible to map whole-mouse-brain vascular networks at capillary resolution. However, segmenting vessels from mesoscopic optical images is challenging: problems such as vascular signal discontinuities, vessel lumens, and background fluorescence signals in mesoscopic optical images belong to global semantic information during vascular segmentation. Traditional vascular segmentation methods based on convolutional neural networks (CNNs) have been limited by insufficient receptive fields, making it challenging to capture the global semantic information of vessels and resulting in inaccurate segmentation. Here, we propose SegVesseler, a vascular segmentation method based on the Swin Transformer. SegVesseler adopts 3D Swin Transformer blocks to extract global contextual information from 3D images. This approach maintains the connectivity and topology of blood vessels during segmentation. We evaluated the performance of our method on mouse cerebrovascular datasets generated from three different labeling and imaging modalities. The experimental results demonstrate that the segmentation effect of our method is significantly better than that of traditional CNNs and achieves state-of-the-art performance.
Keywords: vascular segmentation; Swin Transformer; mesoscopic optical imaging; fMOST
15. Single Image Desnow Based on Vision Transformer and Conditional Generative Adversarial Network for Internet of Vehicles (Cited by 1)
Authors: Bingcai Wei, Di Wang, Zhuang Wang, Liye Zhang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 11, pp. 1975-1988 (14 pages)
With the increasing popularity of artificial intelligence applications, machine learning is playing an increasingly important role in the Internet of Things (IoT) and the Internet of Vehicles (IoV). As an essential part of the IoV, smart transportation relies heavily on information obtained from images. However, inclement weather, such as snow, negatively impacts the process and can hinder the regular operation of imaging equipment and the acquisition of conventional image information. Snow can also cause intelligent transportation systems to misjudge road conditions, adversely affecting the entire Internet of Vehicles. This paper addresses the single-image snow removal task using a vision transformer within a conditional generative adversarial network. The algorithm uses a residual structure, and the generator of the generative adversarial network adopts a Transformer structure, which improves the accuracy of the snow removal task. Moreover, the vision transformer has good scalability and versatility for larger models and a stronger fitting ability than the previously popular convolutional neural networks. The Snow100K dataset is used for training, testing and comparison, with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as evaluation indicators. The experimental results show that the improved snow removal algorithm performs well and can obtain high-quality desnowed images.
Keywords: artificial intelligence; Internet of Things; vision transformer; deep learning; image desnow
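PSNR, one of the two evaluation indicators named in the abstract, is computed directly from the mean squared error between the restored image and the clean reference. A minimal sketch (8-bit dynamic range assumed):

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).
    Higher means closer to the reference; identical images give infinity."""
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, the other indicator, additionally compares local luminance, contrast, and structure statistics, so it is not a pure pixel-error measure like PSNR.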
16. Iris Liveness Detection Using Fragmental Energy of Haar Transformed Iris Images Using Ensemble of Machine Learning Classifiers
Authors: Smita Khade, Shilpa Gite, Sudeep D. Thepade, Biswajeet Pradhan, Abdullah Alamri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 7, pp. 323-345 (23 pages)
Contactless verification is possible with iris biometric identification, which helps prevent infections like COVID-19 from spreading. Biometric systems have become unsteady and vulnerable as a result of spoofing attacks employing contact lenses, replayed videos, and print attacks. This work demonstrates, for the first time in iris liveness identification, an iris liveness detection approach that uses fragmental coefficients of Haar-transformed iris images as signatures to prevent spoofing attacks. Seven assorted feature creation methods are studied in the presented solutions, and the created features are used to train eight distinct machine learning classifiers and ensembles. The predicted iris liveness identification variants are evaluated using recall, F-measure, precision, accuracy, APCER, BPCER, and ACER. Three standard datasets were used in the investigation. The main contribution of our study is achieving a good accuracy of 99.18% with a smaller feature vector: the 8×8 fragmental coefficients of the Haar-transformed iris image with a random forest algorithm showed superior iris liveness detection with a reduced feature vector size (64 features). Additionally, an extensive cross-dataset experiment was conducted for detailed analysis. The results of our experiments show that the iris biometric template is reduced in size, making the proposed framework suitable for algorithmic verification in real-time environments and settings.
Keywords: iris images; liveness identification; Haar transform; machine learning; biometric; feature formation; ensemble model
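The "fragmental coefficients" above are the low-frequency corner of a multi-level Haar decomposition: transform the iris image and keep only the top-left 8×8 block as the 64-value signature fed to the classifiers. A sketch of that feature extraction (orthonormal Haar filters and a square power-of-two image are assumed; the paper's exact preprocessing is not reproduced):

```python
import numpy as np

def haar_fragment_features(img, frag=8):
    """Return the top-left frag x frag coefficients of a multi-level
    orthonormal 2D Haar transform, flattened (64 features for frag=8)."""
    a = img.astype(np.float64).copy()
    size = a.shape[0]
    assert a.shape == (size, size) and size >= frag  # square, power of two
    s = size
    while s > frag:
        block = a[:s, :s]
        # one Haar level: pairwise averages (low) and differences (high),
        # applied along rows, then along columns
        lo = (block[:, 0::2] + block[:, 1::2]) / np.sqrt(2.0)
        hi = (block[:, 0::2] - block[:, 1::2]) / np.sqrt(2.0)
        block = np.hstack([lo, hi])
        lo = (block[0::2, :] + block[1::2, :]) / np.sqrt(2.0)
        hi = (block[0::2, :] - block[1::2, :]) / np.sqrt(2.0)
        a[:s, :s] = np.vstack([lo, hi])
        s //= 2                      # recurse into the low-frequency quadrant
    return a[:frag, :frag].ravel()   # the "fragmental" signature
```

These 64 values would then train a classifier such as the random forest named in the abstract.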
17. TC-Fuse: A Transformers Fusing CNNs Network for Medical Image Segmentation
Authors: Peng Geng, Ji Lu, Ying Zhang, Simin Ma, Zhanzhong Tang, Jianhua Liu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 11, pp. 2001-2023 (23 pages)
In medical image segmentation, convolutional neural networks (CNNs) struggle to capture long-range dependencies, while Transformers can model them effectively. However, Transformers have a flexible structure and seldom assume a structural bias over the input data, so it is difficult for them to learn the positional encoding of medical images when trained on few images. To solve these problems, a dual-branch structure is proposed. In one branch, a Mix-Feed-Forward Network (Mix-FFN) and axial attention are adopted to capture long-range dependencies and keep the translation invariance of the model; Mix-FFN, whose depth-wise convolutions provide position information, outperforms ordinary positional encoding. In the other branch, traditional CNNs are used to extract different features from the limited set of medical images. In addition, the attention fusion module BiFusion is used to effectively integrate information from the CNN branch and the Transformer branch, and the fused features effectively capture the global and local context at the current spatial resolution. On the public standard datasets Gland Segmentation (GlaS), Colorectal adenocarcinoma gland (CRAG) and COVID-19 CT Images Segmentation, the F1-score, Intersection over Union (IoU) and parameter count of the proposed TC-Fuse are superior to those of Axial Attention U-Net, U-Net, Medical Transformer and other methods. The F1-score increased by 2.99%, 3.42% and 3.95%, respectively, compared with Medical Transformer.
Keywords: Transformers; convolutional neural networks; fusion; medical image segmentation; axial attention
18. Brightness Improvement and Detail Enhancement for Low-Illumination Mine Images Based on Transformer and Adaptive Feature Fusion (Cited by 1)
Authors: 田子建, 吴佳奇, 张文琪, 陈伟, 周涛, 杨伟, 王帅. Coal Science and Technology (EI, CAS, CSCD), 2024, Issue 1, pp. 297-310 (14 pages)
High-quality mine images safeguard safe mining production and also benefit the performance of subsequent image analysis. Affected by low-illumination environments, mine images suffer from low brightness, uneven illumination, color distortion, and severe loss of detail. To address these problems, a brightness improvement and detail enhancement method for low-illumination mine images based on Transformer and adaptive feature fusion is proposed. A generative adversarial framework is built in which the discriminator supervising the generator's training is driven by a target image domain rather than a single reference image, enabling sufficient enhancement of low-illumination images. A feature encoder based on representation learning decouples the image into illumination and reflectance components, preventing brightness and color features from interfering with each other during enhancement and thus avoiding color distortion. A CEM-Transformer Encoder is designed to capture global context and extract local-region features, sufficiently raising overall brightness and removing locally uneven illumination. During reflectance enhancement, skip connections combined with a CEM-Cross-Transformer Encoder adaptively fuse low-level features with deep-network features, effectively avoiding the loss of detail, and an ECA-Net is added to the encoding network to improve the feature extraction efficiency of shallow layers. A low-illumination mine image dataset was built to provide data resources for this task. Experiments show that, on the mine low-illumination dataset and public datasets, compared with five state-of-the-art low-light image enhancement algorithms, the PSNR, SSIM and VIF of the enhanced images improved on average by 16.564%, 10.998%, 16.226% and 14.438%, 10.888%, 14.948%, respectively, demonstrating that the algorithm effectively raises overall brightness, removes uneven illumination, avoids color distortion and detail loss, and achieves low-illumination mine image enhancement.
Keywords: image enhancement; image recognition; generative adversarial network; feature decoupling; Transformer
19. An Image Classification Model Based on Depth-wise Convolution and Vision Transformer (Cited by 2)
Authors: 张峰, 黄仕鑫, 花强, 董春茹. Computer Science (CSCD), 2024, Issue 2, pp. 196-204 (9 pages)
Image classification, a common visual recognition task, has broad application scenarios. Traditional approaches use convolutional neural networks; however, the limited receptive field of convolutions makes it difficult to model global relations in an image, leading to low classification accuracy and difficulty handling complex and diverse image data. To model global relations, some researchers have applied the Transformer to image classification, but to satisfy the Transformer's serialization and parallelization requirements the image must be split into equal-sized, non-overlapping patches, which destroys the local information between adjacent patches. Moreover, since the Transformer carries little prior knowledge, such models usually require pre-training on large-scale datasets, incurring high computational cost. To model the local information between adjacent patches while fully exploiting global information, an Efficient Pyramid Vision Transformer (EPVT) model based on depth-wise convolution is proposed. EPVT extracts local and global information between adjacent image patches at low computational cost. It consists of three key components: a local perceptron module (LPM), a spatial information fusion module (SIF), and a convolution feed-forward network (CFFN). The LPM captures local correlations in the image; the SIF fuses local information between adjacent patches and exploits long-range dependencies between different patches to improve the model's feature representation, enabling it to learn the semantic information of output features at different dimensions; the CFFN encodes position information and reshapes tensors. On the ImageNet-1K image classification dataset, the proposed model outperforms existing vision Transformer classification models of comparable size, achieving 82.6% classification accuracy and demonstrating competitiveness on large-scale datasets.
Keywords: deep learning; image classification; depth-wise convolution; Vision Transformer; attention mechanism
20. Multisource Remote Sensing Image Classification Based on Transformer and Dynamic 3D Convolution (Cited by 1)
Authors: 高峰, 孟德森, 解正源, 亓林, 董军宇. Journal of Beijing University of Aeronautics and Astronautics (EI, CAS, CSCD), 2024, Issue 2, pp. 606-614 (9 pages)
Multisource remote sensing data are complementary and synergistic. In recent years, deep learning methods have made progress in multisource remote sensing image classification, but key difficulties remain: feature representations across sources are inconsistent and hard to fuse, and neural networks based on a static inference paradigm lack adaptability to different land-cover classes. To solve these problems, a multisource remote sensing image classification model based on a cross-modal Transformer and multi-scale dynamic 3D convolution is proposed. To improve the consistency of multisource feature representations, a Transformer-based fusion module is designed, whose strong attention modeling capability mines the interactions between hyperspectral and LiDAR data features. To improve the adaptability of feature extraction to different land-cover classes, a multi-scale dynamic 3D convolution module is designed, which injects multi-scale information of the input features into the modulation of the convolution kernels, improving the adaptability of the convolution operation to different land covers. The method is validated on the multisource remote sensing datasets Houston and Trento, where the overall accuracies reach 94.60% and 98.21%, respectively, at least 0.97% and 0.25% higher than mainstream methods such as MGA-MFN, verifying that the proposed method effectively improves the accuracy of multisource remote sensing image classification.
Keywords: hyperspectral image; LiDAR; Transformer; multisource feature fusion; dynamic convolution