Funding: supported in part by the National Natural Science Foundation of China under Grants 61502162, 61702175, and 61772184; in part by the Fund of the State Key Laboratory of Geo-information Engineering under Grant SKLGIE2016-M-4-2; in part by the Hunan Natural Science Foundation of China under Grant 2018JJ2059; in part by the Key R&D Project of Hunan Province of China under Grant 2018GK2014; in part by the Open Fund of the State Key Laboratory of Integrated Services Networks under Grant ISN17-14; and by the Chinese Scholarship Council (CSC) through the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, under Grant CSC No. 2018GXZ020784.
Abstract: Transformer models have emerged as dominant networks for various computer vision tasks compared with Convolutional Neural Networks (CNNs). Transformers can model long-range dependencies by using a self-attention mechanism. This study provides a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them to this work. We then review the main components of a vanilla transformer network, including the self-attention mechanism, the feed-forward network, and position encoding. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets. Finally, we explore some future research directions to further improve the field.
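The self-attention mechanism named in this abstract can be illustrated with a minimal NumPy sketch of single-head scaled dot-product attention. It is not taken from the survey: the token count, the dimensions, and the random projection matrices `w_q`, `w_k`, `w_v` are hypothetical values chosen only for illustration. Every output token is a weighted mix of all tokens, which is how long-range dependencies are modelled.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every token scores every other token, so distant tokens interact directly.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes: 16 tokens (e.g. image patches), model width 32, head width 8.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 16, 32, 8
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))

out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each row of `attn` sums to one, so each output row is a convex combination of the value vectors of all tokens in the sequence.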
Funding: financial support from the Hunan Provincial Natural Science and Technology Fund Project (Grant No. 2022JJ50077) and the Natural Science Foundation of Hunan Province (Grant No. 2024JJ8055).
Abstract: Because of the limitations of prior knowledge and of the convolution operation, existing image restoration techniques cannot be applied directly to the restoration of cultural relic murals. To restore the original appearance of mural images more accurately, an image restoration method based on the Denoising Diffusion Probabilistic Model (DDPM) and the Transformer is proposed. The process involves two steps. In the first step, the damaged mural image is used as the condition to generate the noisy image; the time, the condition, and the noisy image patches are the inputs to the noise prediction network, which captures global dependencies in the input sequence through a multi-attention mechanism and feed-forward neural network processing. Long skip connections between the shallow and deep layers of the Transformer blocks fuse global and local feature information to maintain the overall consistency of the restoration results. In the second step, the noisy image is used as a condition to guide the diffusion model through reverse sampling to generate the restored image. Experimental results show that the PSNR and SSIM of the proposed method improve by 2% to 9% and 1% to 3.3%, respectively, over the comparison methods. The proposed approach combines the advantages of the diffusion model and the deep learning model to make mural restoration results more accurate.
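For orientation, the noising step that a DDPM noise prediction network learns to invert can be sketched as follows. This is the standard unconditional DDPM forward process, not the conditional scheme of this paper, whose details the abstract does not give. The linear schedule endpoints, the step count, and the random stand-in image are assumptions made for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule endpoints; the abstract does not state the paper's values.
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    # A noise prediction network would be trained to recover `noise` from x_t and t.
    return x_t, noise

num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # stand-in for a normalised image

x_early, _ = q_sample(x0, 10, alpha_bar, rng)            # still close to x0
x_late, _ = q_sample(x0, num_steps - 1, alpha_bar, rng)  # almost pure noise

corr_early = np.corrcoef(x0.ravel(), x_early.ravel())[0, 1]
corr_late = np.corrcoef(x0.ravel(), x_late.ravel())[0, 1]
print(corr_early, corr_late)
```

Reverse sampling, the second step in the abstract, runs this chain backwards: it starts from a noisy image and repeatedly removes the noise that the network predicts.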
Funding: Researchers Supporting Project Number (RSPD2024R 553), King Saud University, Riyadh, Saudi Arabia.
Abstract: Wheat is a critical crop, extensively consumed worldwide, and enhancing its production is essential to meet escalating demand. Diseases such as stem rust, leaf rust, yellow rust, and tan spot significantly diminish wheat yield, making early and precise identification of these diseases vital for effective disease management. With advances in deep learning algorithms, researchers have proposed many methods for the automated detection of disease pathogens; however, accurately detecting multiple disease pathogens simultaneously remains a challenge. This challenge arises from the scarcity of RGB images for multiple diseases, class imbalance in existing public datasets, and the difficulty of extracting features that discriminate between multiple classes of disease pathogens. In this research, a novel method based on Transfer Generative Adversarial Networks is proposed for augmenting existing data, thereby overcoming the problems of class imbalance and data scarcity. This study proposes a customized Vision Transformer (ViT) architecture in which the feature vector is obtained by concatenating features extracted from the custom ViT and from Graph Neural Networks. This paper also proposes a Model-Agnostic Meta-Learning (MAML) based ensemble classifier for accurate classification. The proposed model, validated on public datasets for wheat disease pathogen classification, achieved a test accuracy of 99.20% and an F1-score of 97.95%. Compared with existing state-of-the-art methods, the proposed model performs better in terms of accuracy, F1-score, and the number of disease pathogens detected. In future, more diseases can be included for detection, along with other modalities such as pests and weeds.
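Accuracy and F1-score are reported separately here because they can diverge under class imbalance, as the toy computation below shows. It is only a sketch: the abstract does not say which F1 averaging the authors used, so macro averaging is an assumption, and the labels are invented for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores (macro averaging, an assumption)."""
    f1_scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))

# Invented, imbalanced toy labels: class 0 is the majority, classes 1 and 2 are rare.
y_true = np.array([0] * 90 + [1] * 5 + [2] * 5)
y_pred = y_true.copy()
y_pred[90:93] = 0  # three of the five class-1 samples are predicted as class 0

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, num_classes=3)
print(acc, f1)
```

Three errors out of 100 barely move the accuracy, but they all fall on a rare class, so the macro F1 drops much further. This is the imbalance effect that the data augmentation in the abstract is meant to counter.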
Abstract: Transformers are normally designed and built for use at rated frequency and sinusoidal load current. A non-linear load on a transformer leads to harmonic power losses, which increase operational costs and cause additional heating in transformer parts. This leads to higher losses, early fatigue of insulation, premature failure, and a reduced useful life of the transformer. To prevent these problems, the rated capacity of a transformer that supplies harmonic loads must be reduced. In this work, a typical 50 kVA three-phase distribution transformer with real practical parameters is studied under non-linear loads generated by domestic loads. The core losses are evaluated using a three-dimensional model of the transformer developed in an FEM (finite element method) program, based on a valid model of the transformer under high harmonic conditions. Finally, the relation between core losses and the amplitude of high-order harmonics is reviewed and analyzed, and the results obtained with different excitation currents in the transformer windings are compared.
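As a rough illustration of why harmonic loads raise losses, the sketch below computes the total RMS value and the total harmonic distortion (THD) of a load current from its harmonic components. It is not the paper's FEM analysis and does not model core losses. The spectrum is hypothetical, and treating winding resistance as constant is a simplifying assumption, because the AC resistance of real windings rises with frequency.

```python
import math

def rms_from_harmonics(components):
    """Total RMS of a waveform given the RMS value of each harmonic component."""
    return math.sqrt(sum(a * a for a in components.values()))

def total_harmonic_distortion(components):
    """THD: RMS of all non-fundamental components divided by the fundamental."""
    fundamental = components[1]
    harmonic_rms = math.sqrt(sum(a * a for h, a in components.items() if h != 1))
    return harmonic_rms / fundamental

# Hypothetical load current spectrum in per unit of the fundamental (not from the paper).
spectrum = {1: 1.0, 3: 0.30, 5: 0.20, 7: 0.10}

i_rms = rms_from_harmonics(spectrum)
thd = total_harmonic_distortion(spectrum)
# With a constant resistance, I^2 R loss scales with the square of the RMS current.
loss_ratio = (i_rms / spectrum[1]) ** 2
print(i_rms, thd, loss_ratio)
```

Under these assumptions the same fundamental current produces about 14% more resistive loss once the harmonics are present, which shows the direction of the effect that motivates derating.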