Generating realistic synthetic video from text is highly challenging because of issues such as digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed research involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM)-based text encoder and then smoothed using Conditioning Augmentation (CA) techniques to enhance the effectiveness of the Generator (G). Next, a DDNN generates video frames from the enhanced text and random noise, and a modified DCNN acts as the Discriminator (D), distinguishing between generated and real videos. This research evaluates the quality of the generated videos using standard metrics such as Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both metrics and human study results, confirming the effectiveness of DD-GAN. This research also took on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieved successful results. The proposed research demonstrates promising results for generating coherent videos from textual input.
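The Conditioning Augmentation step mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly used formulation in which a conditioning vector is sampled as c = mu + sigma * eps with eps drawn from a standard normal, and a KL term pulls the distribution toward N(0, I). The function names are hypothetical, and in a real model mu and log_var would be predicted from the LSTM text embedding; the abstract does not give these details.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng=random):
    """Sample c = mu + sigma * eps, eps ~ N(0, I) (reparameterization trick).

    mu and log_var are plain lists here; in practice they would be
    outputs of a layer applied to the text embedding (an assumption).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))
```

Sampling around the text embedding instead of using it directly gives the generator slightly different conditioning vectors for the same sentence, which is the smoothing effect the abstract refers to.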
Climate models are vital for understanding and projecting global climate change and its associated impacts. However, these models suffer from biases that limit their accuracy in historical simulations and the trustworthiness of future projections. Correcting these biases is complicated by internal variability, which hinders direct alignment between model simulations and observations and thwarts conventional supervised learning methods. Here, we employ an unsupervised Cycle-consistent Generative Adversarial Network (CycleGAN) to correct daily Sea Surface Temperature (SST) simulations from the Community Earth System Model 2 (CESM2). Our results reveal that the CycleGAN not only corrects climatological biases but also improves the simulation of major dynamic modes, including the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole mode, as well as SST extremes. Notably, it substantially corrects climatological SST biases, decreasing the globally averaged Root-Mean-Square Error (RMSE) by 58%. Intriguingly, the CycleGAN effectively addresses the well-known excessive westward bias in ENSO SST anomalies, a common issue in climate models that traditional methods, such as quantile mapping, struggle to rectify. Additionally, it substantially improves the simulation of SST extremes, raising the pattern correlation coefficient (PCC) from 0.56 to 0.88 and lowering the RMSE from 0.5 to 0.32. This enhancement is attributed to better representation of variability at interannual, intraseasonal, and synoptic scales. Our study offers a novel approach to correcting global SST simulations and underscores its effectiveness across different time scales and primary dynamical modes.
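The two skill scores quoted above, RMSE and PCC, can be computed as in the sketch below. It assumes flattened fields of equal length and an unweighted, centered correlation; climate studies often apply area (latitude) weights, which the abstract does not specify, so that detail is left out. The function names are illustrative only.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between a simulated and an observed field."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(sim))


def pattern_correlation(sim, obs):
    """Centered, unweighted correlation between two flattened fields."""
    n = len(sim)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)
```

A corrected simulation should move RMSE toward 0 and PCC toward 1 relative to the raw model output.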
Mechanically cleaved two-dimensional materials are random in size and thickness. Recognition of atomically thin flakes by human experts is inefficient and unsuitable for scalable production. Deep learning algorithms have been adopted as an alternative; nevertheless, a major challenge is the lack of sufficient real training images. Here we report the generation of synthetic images of two-dimensional materials using StyleGAN3 to complement the dataset. A DeepLabv3Plus network is trained with the synthetic images, which reduces overfitting and improves recognition accuracy to over 90%. A semi-supervisory technique for labeling images is introduced to reduce manual effort. The sharper edges recognized by this method facilitate material stacking with precise edge alignment, which benefits the exploration of novel properties of layered-material devices that depend crucially on the interlayer twist angle. This feasible and efficient method allows for the rapid and high-quality manufacturing of atomically thin materials and devices.
This study addresses challenges in fetal magnetic resonance imaging (MRI) related to motion artifacts, maternal respiration, and hardware limitations. To enhance MRI quality, we employ deep learning techniques, specifically utilizing Cycle GAN. Synthetic pairs of images, simulating artifacts in fetal MRI, are generated to train the model. Our primary contribution is the use of Cycle GAN for fetal MRI restoration, augmented by artificially corrupted data. We compare three approaches (supervised Cycle GAN, Pix2Pix, and Mobile Unet) for artifact removal. Experimental results demonstrate that the proposed supervised Cycle GAN effectively removes artifacts while preserving image details, as validated through Structural Similarity Index Measure (SSIM) and normalized Mean Absolute Error (MAE). The method proves comparable to alternatives but avoids the generation of spurious regions, which is crucial for medical accuracy.
Geochemical maps are of great value in mineral exploration. Integrated geochemical anomaly maps provide comprehensive information by mapping assemblages of element concentrations to possible types of mineralization/ore, but they vary depending on the expert's knowledge and experience. This paper aims to test the capability of deep neural networks (DNNs) to delineate integrated anomalies, based on a case study of the Zhaojikou Pb-Zn deposit, Southeast China. A total of 352 samples were collected, each consisting of 26 variables covering elemental composition, geological, and tectonic information. First, generative adversarial networks were adopted for data augmentation. Then, a DNN was trained on sets of synthetic and real data to identify an integrated anomaly. Finally, the results of the DNN analyses were visualized in probability maps and compared with traditional anomaly maps to check its performance. Results showed that the average accuracy on the validation set was 94.76%. The probability maps showed that newly identified integrated anomalous areas had a probability above 75% in the northeast zones. The results also showed that DNN models that used big data not only successfully recognized the anomalous areas identified on traditional geochemical element maps, but also discovered new anomalous areas not previously picked up by the elemental anomaly maps.
In underground mining, the belt is a critical component, as its state directly affects the safe and stable operation of the conveyor. Most existing non-contact detection methods based on machine vision can detect only a single type of damage and require pre-processing operations, which tends to cause a large amount of calculation and low detection precision. To solve these problems, a belt tear detection method based on a multi-class conditional deep convolutional generative adversarial network (CDCGAN) was designed. In the traditional DCGAN, the image generated by the generator has a certain degree of randomness. Here, a small number of labeled belt images are taken as conditions and added to the generator and discriminator, so the generator can generate images with the characteristics of belt damage under these conditions. Moreover, because the standard discriminator cannot identify multiple types of damage, a multi-class softmax function is used as the output function of the discriminator to output a vector of class probabilities, so that it can accurately classify cracks, scratches, and tears. To avoid incomplete feature learning, skip-layer connections are adopted in the generator and discriminator. This not only minimizes the loss of features but also improves the convergence speed. Experimental results show that, compared with other algorithms, the loss values of the generator and discriminator are the lowest. Moreover, convergence is faster, and the mean average precision of the proposed algorithm reaches 96.2%, which is at least 6% higher than that of other algorithms.
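The multi-class softmax output described above can be illustrated with a short sketch. It assumes the discriminator's final layer produces one raw score (logit) per damage class and that only the three classes named in the abstract are used; whether the actual model adds an extra "fake" class is not stated, so none is included here. The names CLASSES and classify are hypothetical.

```python
import math

# The three damage types named in the abstract; the ordering is an assumption.
CLASSES = ("crack", "scratch", "tear")


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, classes=CLASSES):
    """Return the most probable class together with the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow for large scores.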
Short Retraction Notice: The authors claim that this paper needs modifications. This article has been retracted to straighten the academic record. In making this decision the Editorial Board follows COPE's Retraction Guidelines. The aim is to promote the circulation of scientific research by offering an ideal research publication platform with due consideration of internationally accepted standards on publication ethics. The Editorial Board would like to extend its sincere apologies for any inconvenience this retraction may have caused. Editor guiding this retraction: Prof. Baozong Yuan (EiC of JSIP). The full retraction notice in PDF precedes the original paper, which is marked "RETRACTED".
Recently, speech enhancement methods based on Generative Adversarial Networks have achieved good performance on time-domain noisy signals. However, the training of Generative Adversarial Networks suffers from problems such as convergence difficulty and model collapse. In this work, an end-to-end speech enhancement model based on Wasserstein Generative Adversarial Networks is proposed, and several improvements have been made to obtain faster convergence and better generated speech quality. Specifically, in the generator coding part, each convolution layer adopts different convolution kernel sizes to obtain speech coding information at multiple scales; a gated linear unit is introduced to alleviate the vanishing gradient problem as network depth increases; the gradient penalty of the discriminator is replaced with spectral normalization to accelerate the convergence of the model; and a hybrid penalty term composed of L1 regularization and a scale-invariant signal-to-distortion ratio is introduced into the loss function of the generator to improve the quality of generated speech. The experimental results on both the TIMIT corpus and a Tibetan corpus show that the proposed model improves speech quality significantly and accelerates the convergence of the model.
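The scale-invariant signal-to-distortion ratio (SI-SDR) named in the hybrid penalty can be sketched as below, using its standard definition: the estimate is projected onto the clean reference, and the energy of that projection is compared with the energy of the residual. How the paper weights this term against the L1 term is not given in the abstract, so only the SI-SDR itself is shown. The eps guard is an added assumption to avoid division by zero.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between an enhanced and a clean signal."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)            # optimal rescaling of the reference
    target = [scale * r for r in reference]     # part of the estimate explained by it
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled before comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is why the measure is attractive as a training penalty (typically used with a negative sign so that higher SI-SDR lowers the loss).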
Spectrum prediction is one of the new techniques in cognitive radio; it predicts changes in the spectrum state and plays a crucial role in improving spectrum sensing performance. Prediction models previously trained in the source band tend to perform poorly in a new target band because of changes in the channel. In addition, cognitive radio devices require dynamic spectrum access, which means that the time available to retrain the model in the new band is minimal. To increase the amount of data in the target band, we use a GAN to convert source-band data into target-band data. First, we analyze the data differences between bands and calculate FID scores to identify the available bands that differ least from the target band to be predicted. Because the original GAN structure is unsuitable for converting spectrum data, we propose the spectrum data conversion GAN (SDC-GAN). The generator module consists of a convolutional network and an LSTM module that can integrate multiple features of the data and convert data from the source band to the target band. Finally, we use the generated target-band data to train the prediction model. The experimental results validate the effectiveness of the proposed algorithm.
Ceramic tiles are among the most indispensable materials for interior decoration. Because of their natural textures, ceramic patterns cannot meet design requirements in terms of diversity and interactivity. In this paper, we propose a sketch-based method for generating diverse ceramic tile images from hand-drawn sketches using a Generative Adversarial Network (GAN). The generated tile images can be tailored to meet the specific needs of the user for the tile textures. The proposed method consists of four steps. First, a dataset of ceramic tile images with diverse distributions is created and used to pre-train a GAN. Second, for each ceramic tile image in the dataset, the corresponding sketch image is generated, and the mapping between the images is learned by a sketch extraction network that uses ResNet blocks and skip connections to improve the quality of the generated sketches. Third, the sketch style is redefined according to the characteristics of the ceramic tile images, and double cross-domain adversarial loss functions are employed to guide the ceramic tile generation network toward the sketch style and to improve the training speed. Finally, we apply hidden-space perturbation and interpolation to further enrich the style of the output textures and to satisfy the concept of "one style with multiple faces". We train the proposed generation network on a dataset of 2583 ceramic tile images. To measure generative diversity and quality, we use the Fréchet Inception Distance (FID) and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) metrics. The experimental results show that the proposed model greatly enhances the generation results of the ceramic tile images, with an FID of 32.47 and a BRISQUE of 28.44.
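The hidden-space (latent) perturbation and interpolation step can be illustrated with the following sketch. It assumes plain linear interpolation between two latent vectors and additive Gaussian noise as the perturbation; the abstract does not say which variants the authors used, and the function names and the default noise scale are hypothetical.

```python
import random


def interpolate(z_a, z_b, t):
    """Linear blend of two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def perturb(z, scale=0.1, rng=random):
    """Add small Gaussian noise to a latent vector (scale is an assumed default)."""
    return [v + scale * rng.gauss(0.0, 1.0) for v in z]


def interpolation_path(z_a, z_b, steps):
    """Evenly spaced latent vectors from z_a to z_b, endpoints included (steps >= 2)."""
    return [interpolate(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```

Feeding each vector on such a path to a trained generator would yield a gradual transition between two textures, while perturbing a single vector yields close variants of one texture.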
The classification of lung nodules is a challenging problem because visual analysis of nodules and non-nodules reveals homogeneous textural patterns. In this work, an Auxiliary Classifier (AC)-Generative Adversarial Network (GAN) based Lung Cancer Classification (LCC) system is developed. The proposed AC-GAN-LCC system consists of three modules: preprocessing, Lungs Region Detection (LRD), and AC-GAN classification. A Wiener filter is employed in the preprocessing module to remove Gaussian noise. In the LRD module, only the lung regions (left and right lungs) are detected using iterative thresholding and morphological operations. To extract only the lung region, flood filling and background subtraction are applied. The detected lung regions are fed to the AC-GAN classifier to detect the nodules. It performs binary classification, assigning each region to one of two classes (nodule or non-nodule). The AC-GAN is an extended version of the conditional GAN that predicts the label of a given image. Three different optimization techniques, namely adaptive gradient optimization, root mean square propagation optimization, and Adam optimization, are employed for optimizing the AC-GAN architecture. The proposed AC-GAN-LCC system is evaluated on Computed Tomography (CT) scan images from the Lung Image Database Consortium (LIDC) database. The system classifies ~15000 CT slices (7310 non-nodules and 7685 nodules). It provides an overall accuracy of 98.8% on the LIDC database using Adam optimization with a 10-fold cross-validation approach.
Accurate boundaries of smallholder farm fields are important and indispensable geo-information that helps farmers, managers, and policymakers better manage and utilize their agricultural resources. Because of their small size, irregular shape, and the use of mixed-cropping techniques, smallholder farm fields can be difficult to delineate automatically. In recent years, numerous studies on field contour extraction using deep Convolutional Neural Networks (CNNs) have been proposed. However, there is a relative shortage of labeled data for field boundaries, which affects the training of CNNs. Traditional methods mostly use image flipping and random rotation for data augmentation. In this paper, we propose to apply a Generative Adversarial Network (GAN) for the data augmentation of farm field labels to increase the diversity of samples. Specifically, we propose an automated method based on Fully Convolutional Neural networks (FCN) in combination with GAN to improve the delineation accuracy of smallholder farms from Very High Resolution (VHR) images. We first investigate four State-Of-The-Art (SOTA) FCN architectures, i.e., U-Net, PSPNet, SegNet and OCRNet, to find the optimal architecture for the contour detection task of smallholder farm fields. Second, we apply the identified optimal FCN architecture in combination with Contour GAN and pixel2pixel GAN to improve the accuracy of contour detection. We test our method on a study area in the Sudano-Sahelian savanna region of northern Nigeria. The best combination achieved F1 scores of 0.686 on Test Set 1 (TS1), 0.684 on Test Set 2 (TS2), and 0.691 on Test Set 3 (TS3). Results indicate that our architecture adapts to a variety of advanced networks and proves its effectiveness in this task. The conceptual, theoretical, and experimental knowledge from this study is expected to seed many GAN-based farm delineation methods in the future.
Deep Learning (DL) is a powerful tool that has achieved tremendous success in areas such as Computer Vision, Speech Recognition, and Natural Language Processing. Since Automated Modulation Classification (AMC) is an important part of Cognitive Radio Networks, we explore the potential of DL in solving the signal modulation recognition problem. It cannot be overlooked that DL models are complex, which makes them prone to over-fitting. DL models require large amounts of training data to combat over-fitting, but adding high-quality labels to training data manually is not always cheap or feasible, especially in real-time systems, which may encounter previously unseen data. Semi-supervised learning is a way to exploit unlabeled data effectively to reduce over-fitting in DL. In this paper, we extend Generative Adversarial Networks (GANs) to semi-supervised learning and show that this method can be used to create a more data-efficient classifier.
In recent years, landslide susceptibility mapping has substantially improved with advances in machine learning. However, challenges remain in landslide mapping because of the limited availability of inventory data. In this paper, a novel method that improves the performance of machine learning techniques is presented. The proposed method creates synthetic inventory data using Generative Adversarial Networks (GANs) to improve the prediction of landslides. In this research, landslide inventory data for 156 landslide locations in Cameron Highlands, Malaysia, were taken from previous projects the authors worked on. Elevation, slope, aspect, plan curvature, profile curvature, total curvature, lithology, land use and land cover (LULC), distance to the road, distance to the river, stream power index (SPI), sediment transport index (STI), terrain roughness index (TRI), topographic wetness index (TWI), and vegetation density are the geo-environmental factors considered in this study, based on suggestions from previous works on Cameron Highlands. To show the capability of GANs in improving landslide prediction models, this study tests the proposed GAN model with benchmark models, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Bagging ensemble models with ANN and SVM models. These models were validated using the area under the receiver operating characteristic curve (AUROC). The DT, RF, SVM, ANN, and Bagging ensemble achieved AUROC values of 0.90, 0.94, 0.86, 0.69, and 0.82 for training and 0.76, 0.81, 0.85, 0.72, and 0.75 for testing, respectively. When using additional samples, the same models achieved AUROC values of 0.92, 0.94, 0.88, 0.75, and 0.84 for training and 0.78, 0.82, 0.82, 0.78, and 0.80 for testing, respectively. Using the additional samples improved the test accuracy of all the models except SVM. As a result, this research showed that in data-scarce environments, utilizing GANs to generate supplementary samples is promising because it can improve the predictive capability of common landslide prediction models.
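The AUROC used for validation above can be computed directly from its rank interpretation: it is the probability that a randomly chosen positive sample (landslide) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain pairwise implementation intended for small inventories; the auroc function is illustrative, while the TEST_AUROC table simply restates the test-set values reported in the abstract.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Test-set AUROC reported in the abstract: (without, with) GAN-generated samples.
TEST_AUROC = {
    "DT": (0.76, 0.78),
    "RF": (0.81, 0.82),
    "SVM": (0.85, 0.82),
    "ANN": (0.72, 0.78),
    "Bagging": (0.75, 0.80),
}

# Models whose test AUROC increased after adding the synthetic samples.
improved = [name for name, (before, after) in TEST_AUROC.items() if after > before]
```

Listing the improved models this way reproduces the abstract's statement that every model except SVM benefits from the additional samples.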
The topological connectivity information derived from the brain functional network can bring new insights for diagnosing and analyzing dementia disorders. The brain functional network is well suited to bridging the correlation between abnormal connectivities and dementia disorders. However, it is challenging to access considerable amounts of brain functional network data, which hinders the widespread application of data-driven models in dementia diagnosis. In this study, a novel distribution-regularized adversarial graph auto-encoder (DAGAE) with a transformer is proposed to generate new synthetic brain functional networks that augment the brain functional network dataset, improving the dementia diagnosis accuracy of data-driven models. Specifically, the label distribution is estimated to regularize the latent space learned by the graph encoder, which makes the learning process stable and the learned representation robust. In addition, the transformer generator is devised to map the node representations into node-to-node connections by exploring the long-term dependence of highly correlated distant brain regions. The typical topological properties and discriminative features can be preserved entirely. Furthermore, the generated brain functional networks improve the prediction performance of different classifiers, and the approach can be applied to analyze other cognitive diseases. Experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the proposed model can generate good brain functional networks. The classification results show that adding generated data achieves a best accuracy of 85.33%, a sensitivity of 84.00%, and a specificity of 86.67%. The proposed model also achieves superior performance compared with other related augmentation models. Overall, the proposed model effectively improves cognitive disease diagnosis by generating diverse brain functional networks.
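The three classification figures quoted above are related through the confusion matrix, as the sketch below shows. The counts in the example are hypothetical: the abstract does not report the actual confusion matrix, and these values were chosen only because they reproduce the reported percentages. They also illustrate that, for balanced classes, accuracy equals the mean of sensitivity and specificity, which is consistent with the reported 85.33%, 84.00%, and 86.67%.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts (75 patients, 75 controls), not taken from the paper.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=63, fn=12, tn=65, fp=10)
```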
Lithium-ion batteries are key drivers of the renewable energy revolution, bolstered by progress in battery design, modelling, and management. Yet, achieving high-performance battery health prognostics is a significant challenge. With the availability of open data and software, coupled with automated simulations, deep learning has become an integral component of battery health prognostics. We offer a comprehensive overview of potential deep learning techniques specifically designed for modeling and forecasting the dynamics of multiphysics and multiscale battery systems. Following this, we provide a concise summary of publicly available lithium-ion battery test and cycle datasets. By providing illustrative examples, we emphasize the efficacy of five techniques capable of enhancing deep learning for accurate battery state prediction and health-focused management. Each of these techniques offers unique benefits. (1) Transformer models address challenges using self-attention mechanisms and positional encoding methods. (2) Transfer learning improves learning tasks within a target domain by leveraging knowledge from a source domain. (3) Physics-informed learning uses prior knowledge to enhance learning algorithms. (4) Generative adversarial networks (GANs) earn praise for their ability to generate diverse and high-quality outputs, exhibiting outstanding performance with complex datasets. (5) Deep reinforcement learning enables an agent to make optimal decisions through continuous interactions with its environment, thus maximizing cumulative rewards. In this Review, we highlight examples that employ these techniques for battery health prognostics, summarizing both their challenges and opportunities. These methodologies offer promising prospects for researchers and industry professionals, enabling the creation of specialized network architectures that autonomously extract features, especially for long-range spatial-temporal connections across extended timescales. The outcomes could include improved accuracy, faster training, and enhanced generalization.
In the past, sketches were a standard technique used for recognizing offenders and have remained a valuable tool for law enforcement and social security purposes. However, relying on eyewitness observations can lead to discrepancies in the depictions of the sketch, depending on the experience and skills of the sketch artist. With the emergence of modern technologies such as Generative Adversarial Networks (GANs), generating images using verbal and textual cues is now possible, resulting in more accurate sketch depictions. In this study, we propose an adversarial network that generates human facial sketches using such cues provided by an observer. Additionally, we have introduced an Inverse Gamma Correction Technique to improve the training and enhance the quality of the generated sketches. To evaluate the effectiveness of our proposed method, we conducted experiments and analyzed the results using the Inception Score and Fréchet Inception Distance metrics. Our proposed method achieved an overall Inception Score of 1.438±0.049 and a Fréchet Inception Distance of 65.29, outperforming other state-of-the-art techniques.
Single image super-resolution (SISR) is an important research topic in the field of computer vision and image processing. With the rapid development of deep neural networks, different image super-resolution models have emerged. Compared with some traditional SISR methods, deep learning-based methods can complete the super-resolution task from a single image. In addition, compared with SISR methods using traditional convolutional neural networks, SISR based on generative adversarial networks (GANs) has achieved the most advanced visual performance. In this review, we first explore the challenges faced by SISR and introduce some common datasets and evaluation metrics. Then, we review the improved network structures and loss functions of GAN-based perceptual SISR. Subsequently, the advantages and disadvantages of different networks are analyzed through multiple comparative experiments. Finally, we summarize the paper and look ahead to future development trends of GAN-based perceptual SISR.
Digital watermarking embeds information bits into a digital cover, such as an image or video, to prove the creator's ownership of the work. In this paper, we propose a robust image watermarking algorithm based on a generative adversarial network. The model includes two modules, a generator and an adversary. The generator is mainly used to generate images embedded with the watermark and to decode images damaged by noise to recover the watermark. The adversary is used to discriminate whether an image is embedded with a watermark and to damage the image with noise. Building on the HiDDeN model (Hiding Data with Deep Networks), we add a high-pass filter in front of the discriminator, so that the watermark tends to be embedded in the mid-frequency region of the image. Since the human visual system pays more attention to the central area of an image, we give a higher weight to the image center region and a lower weight to the edge region when calculating the loss between the cover and the embedded image. The watermarked image obtained by this scheme has better visual quality. Experimental results show that the proposed architecture is more robust against noise interference than state-of-the-art schemes.
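The center-weighted loss between the cover and the embedded image can be sketched as a weighted mean squared error. The weighting scheme below, a linear fall-off from a higher weight at the image center to a lower weight at the farthest corner, is an assumption made for illustration; the abstract only states that the center receives a higher weight than the edges, and the function names and default weights are hypothetical. Images are represented as nested lists of grayscale values.

```python
import math


def center_weight_mask(height, width, center=2.0, edge=1.0):
    """Per-pixel weights: `center` at the middle, `edge` at the farthest corner."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_dist = math.hypot(cy, cx) or 1.0   # guard for a 1x1 image
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dist = math.hypot(y - cy, x - cx) / max_dist   # 0 at center, 1 at corner
            row.append(center + (edge - center) * dist)
        mask.append(row)
    return mask


def weighted_mse(cover, embedded, mask):
    """Weighted mean squared error between two equally sized images."""
    num = sum(w * (c - e) ** 2
              for c_row, e_row, w_row in zip(cover, embedded, mask)
              for c, e, w in zip(c_row, e_row, w_row))
    den = sum(w for w_row in mask for w in w_row)
    return num / den
```

With such a mask, a distortion of the same magnitude costs more when it lies near the center than near a corner, which pushes the embedding toward less noticeable regions.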
Ground military target recognition plays a crucial role in unmanned equipment and in grasping battlefield dynamics for military applications, but it is disturbed by low-resolution and noisy representations. In this paper, a recognition method involving a novel visual attention mechanism-based Gabor region proposal sub-network (Gabor RPN) and an improved refinement generative adversarial sub-network (GAN) is proposed. Novel central-peripheral rivalry 3D color Gabor filters are proposed to simulate retinal structures and are used as feature extraction convolutional kernels in the low-level layers to improve the recognition accuracy and training efficiency of the Gabor RPN. The improved refinement GAN is used to solve the problem of blurry target classification; it involves a generator that directly generates large high-resolution images from small blurry ones and a discriminator that distinguishes not only real images from fake images but also the class of targets. A dedicated recognition dataset for ground military targets, named the Ground Military Target Dataset (GMTD), is constructed. Experiments performed on the GMTD dataset demonstrate that our method can achieve better energy-saving and recognition results when low-resolution and noisy-representation targets are involved, giving this algorithm good prospects for engineering application.
Funding: Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).
Abstract: Generating realistic synthetic video from text is a highly challenging task because of the many issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as a Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as a Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed research involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) techniques to enhance the effectiveness of the Generator (G). Next, the DDNN generates video frames from the enhanced text and random noise, while the modified DCNN acts as the Discriminator (D), distinguishing generated videos from real ones. This research evaluates the quality of the generated videos using standard metrics such as Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both the metrics and the human study results, confirming the effectiveness of DD-GAN. This research also took on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieved successful results. The proposed research demonstrates promising results for generating coherent videos from textual input.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42141019 and 42261144687); the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (Grant No. 2019QZKK0102); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB42010404); the National Natural Science Foundation of China (Grant No. 42175049); and the Guangdong Meteorological Service Science and Technology Research Project (Grant No. GRMC2021M01). The authors thank the National Key Scientific and Technological Infrastructure project "Earth System Science Numerical Simulator Facility" (EarthLab) for computational support and Prof. Shiming XIANG for many useful discussions. Niklas BOERS acknowledges funding from the Volkswagen foundation.
Abstract: Climate models are vital for understanding and projecting global climate change and its associated impacts. However, these models suffer from biases that limit their accuracy in historical simulations and the trustworthiness of future projections. Correcting these biases requires dealing with internal variability, which hinders the direct alignment between model simulations and observations and thwarts conventional supervised learning methods. Here, we employ an unsupervised Cycle-consistent Generative Adversarial Network (CycleGAN) to correct daily Sea Surface Temperature (SST) simulations from the Community Earth System Model 2 (CESM2). Our results reveal that the CycleGAN not only corrects climatological biases but also improves the simulation of major dynamic modes, including the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole mode, as well as SST extremes. Notably, it substantially corrects climatological SST biases, decreasing the globally averaged Root-Mean-Square Error (RMSE) by 58%. Intriguingly, the CycleGAN effectively addresses the well-known excessive westward bias in ENSO SST anomalies, a common issue in climate models that traditional methods, such as quantile mapping, struggle to rectify. Additionally, it substantially improves the simulation of SST extremes, raising the pattern correlation coefficient (PCC) from 0.56 to 0.88 and lowering the RMSE from 0.5 to 0.32. This enhancement is attributed to better representations of variability at interannual, intraseasonal, and synoptic scales. Our study offers a novel approach to correcting global SST simulations and underscores its effectiveness across different time scales and primary dynamical modes.
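The two scores quoted in this abstract, RMSE and the pattern correlation coefficient, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the simulated and observed fields have been flattened into equal-length lists, and it applies no area weighting, whereas a real global SST evaluation would normally weight grid cells by area. The function names are hypothetical.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between two equally sized flattened fields."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("fields must be non-empty and of equal length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(sim))


def pattern_correlation(sim, obs):
    """Centered (Pearson) correlation between two flattened fields.

    Assumption: unweighted grid cells; both fields must vary spatially.
    """
    if len(sim) != len(obs) or not sim:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(sim)
    mean_sim = sum(sim) / n
    mean_obs = sum(obs) / n
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov / math.sqrt(var_sim * var_obs)
```

A bias correction that works should lower `rmse` and push `pattern_correlation` toward 1 when the corrected field is compared with observations.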
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2022YFB2803900); the National Natural Science Foundation of China (Grant Nos. 61974075 and 61704121); the Natural Science Foundation of Tianjin Municipality (Grant Nos. 22JCZDJC00460 and 19JCQNJC00700); the Tianjin Municipal Education Commission (Grant No. 2019KJ028); and the Fundamental Research Funds for the Central Universities (Grant No. 22JCZDJC00460).
Abstract: Mechanically cleaved two-dimensional materials are random in size and thickness. Recognizing atomically thin flakes by human experts is inefficient and unsuitable for scalable production. Deep learning algorithms have been adopted as an alternative; nevertheless, a major challenge is the lack of sufficient real training images. Here we report the generation of synthetic images of two-dimensional materials using StyleGAN3 to complement the dataset. A DeepLabv3Plus network is trained with the synthetic images, which reduces overfitting and improves recognition accuracy to over 90%. A semi-supervisory technique for labeling images is introduced to reduce manual effort. The sharper edges recognized by this method facilitate material stacking with precise edge alignment, which benefits the exploration of novel properties of layered-material devices that depend crucially on the interlayer twist angle. This feasible and efficient method allows for the rapid and high-quality manufacturing of atomically thin materials and devices.
Abstract: This study addresses challenges in fetal magnetic resonance imaging (MRI) related to motion artifacts, maternal respiration, and hardware limitations. To enhance MRI quality, we employ deep learning techniques, specifically Cycle GAN. Synthetic pairs of images, simulating artifacts in fetal MRI, are generated to train the model. Our primary contribution is the use of Cycle GAN for fetal MRI restoration, augmented by artificially corrupted data. We compare three approaches (supervised Cycle GAN, Pix2Pix, and Mobile Unet) for artifact removal. Experimental results demonstrate that the proposed supervised Cycle GAN effectively removes artifacts while preserving image details, as validated through the Structural Similarity Index Measure (SSIM) and normalized Mean Absolute Error (MAE). The method proves comparable to the alternatives but avoids generating spurious regions, which is crucial for medical accuracy.
Funding: Supported by NSFC Funds (Grant Nos. 41902071 and 42011530173) and the Doctoral Research Start-up Fund, East China University of Technology (DHBK2019313).
Abstract: Geochemical maps are of great value in mineral exploration. Integrated geochemical anomaly maps provide comprehensive information by mapping assemblages of element concentrations to possible types of mineralization or ore, but they vary with the expert's knowledge and experience. This paper tests the capability of deep neural networks (DNN) to delineate integrated anomalies, based on a case study of the Zhaojikou Pb-Zn deposit, Southeast China. Three hundred fifty-two samples were collected, and each sample consisted of 26 variables covering elemental composition, geological, and tectonic information. First, generative adversarial networks were adopted for data augmentation. Then, a DNN was trained on sets of synthetic and real data to identify an integrated anomaly. Finally, the results of the DNN analyses were visualized in probability maps and compared with traditional anomaly maps to check performance. Results showed that the average accuracy on the validation set was 94.76%. The probability maps showed that newly identified integrated anomalous areas had a probability above 75% in the northeast zones. The results also showed that DNN models using big data not only recognized the anomalous areas identified on traditional geochemical element maps, but also discovered new anomalous areas that the elemental anomaly maps had not previously picked up.
Funding: This work was supported by the Shanxi Province Applied Basic Research Project, China (Grant No. 201901D111100). Xiaoli Hao received the grant, and the URL of the sponsor's website is http://kjt.shanxi.gov.cn/.
Abstract: In underground mining, the belt is a critical component, as its state directly affects the safe and stable operation of the conveyor. Most existing non-contact detection methods based on machine vision can detect only a single type of damage, and they require pre-processing operations. This tends to cause a large amount of computation and low detection precision. To solve these problems, the work described in this paper designed a belt tear detection method based on a multi-class conditional deep convolutional generative adversarial network (CDCGAN). In the traditional DCGAN, the image generated by the generator has a certain degree of randomness. Here, a small number of labeled belt images are taken as conditions and added to the generator and discriminator, so that the generator can produce images with the characteristics of belt damage under those conditions. Moreover, because the discriminator cannot by itself identify multiple types of damage, a multi-class softmax function is used as its output function to produce a vector of class probabilities, allowing it to classify cracks, scratches, and tears accurately. To avoid incomplete feature learning, skip-layer connections are adopted in the generator and discriminator. This not only minimizes the loss of features but also improves the convergence speed. Experimental results show that, compared with other algorithms, the proposed method yields the lowest generator and discriminator loss values. Moreover, its convergence speed is faster, and the mean average precision of the proposed algorithm reaches 96.2%, which is at least 6% higher than that of other algorithms.
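The softmax output stage mentioned in this abstract can be illustrated with a short sketch. It only shows how raw discriminator scores become a vector of class probabilities; the three-class label set follows the damage types named in the abstract, while the function names are hypothetical and the abstract does not say whether the actual network also reserves a class for generated images.

```python
import math

# Damage types named in the abstract; the ordering is an assumption.
CLASSES = ("crack", "scratch", "tear")


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable damage class and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs
```

For example, `classify([0.2, 2.5, -1.0])` assigns the highest probability to the second class, `"scratch"`, and the returned probabilities sum to 1.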
Abstract: Short Retraction Notice. The authors claim that this paper needs modifications. This article has been retracted to straighten the academic record. In making this decision the Editorial Board follows COPE's Retraction Guidelines. The aim is to promote the circulation of scientific research by offering an ideal research publication platform with due consideration of internationally accepted standards on publication ethics. The Editorial Board would like to extend its sincere apologies for any inconvenience this retraction may have caused. Editor guiding this retraction: Prof. Baozong Yuan (EiC of JSIP). The full retraction notice in PDF precedes the original paper, which is marked "RETRACTED".
Funding: Supported by the National Science Foundation under Grant No. 62066039.
Abstract: Recently, speech enhancement methods based on Generative Adversarial Networks have achieved good performance on time-domain noisy signals. However, training Generative Adversarial Networks suffers from problems such as convergence difficulty and mode collapse. In this work, an end-to-end speech enhancement model based on Wasserstein Generative Adversarial Networks is proposed, with several improvements aimed at faster convergence and better generated speech quality. Specifically, in the encoding part of the generator, each convolution layer adopts a different convolution kernel size so that speech coding information is obtained at multiple scales; a gated linear unit is introduced to alleviate the vanishing gradient problem as network depth increases; the gradient penalty of the discriminator is replaced with spectral normalization to accelerate the convergence of the model; and a hybrid penalty term composed of L1 regularization and a scale-invariant signal-to-distortion ratio is introduced into the loss function of the generator to improve the quality of the generated speech. The experimental results on both the TIMIT corpus and a Tibetan corpus show that the proposed model improves speech quality significantly and accelerates the convergence of the model.
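The scale-invariant signal-to-distortion ratio (SI-SDR) named in this abstract has a standard definition that can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's loss: both signals are taken to be zero-mean lists of equal length, `eps` is a hypothetical stabilizer, and the way the abstract's hybrid penalty weights SI-SDR against the L1 term is not given, so it is not reproduced here.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB.

    The estimate is projected onto the reference; the projection is the
    "target" component and the residual is treated as distortion.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)  # optimal scaling of the reference
    target = [alpha * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the property that makes the measure attractive as a training penalty: a generator cannot improve it merely by changing output volume. A loss would typically use the negative of this value, since higher SI-SDR is better.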
Funding: Supported by the National Natural Science Fund program (No. 11975307) and the China National Defence Science and Technology Innovation Special Zone Project (19-H863-01-ZT-003-003-12).
Abstract: Spectrum prediction is one of the new techniques in cognitive radio; it predicts changes in the spectrum state and plays a crucial role in improving spectrum sensing performance. Prediction models previously trained in the source band tend to perform poorly in a new target band because of changes in the channel. In addition, cognitive radio devices require dynamic spectrum access, which means that the time available to retrain the model in the new band is minimal. To increase the amount of data in the target band, we use a GAN to convert source-band data into target-band data. First, we analyze the data differences between bands and calculate FID scores to identify the available bands that differ least from the target band to be predicted. Because the original GAN structure is unsuitable for converting spectrum data, we propose the spectrum data conversion GAN (SDC-GAN). The generator module consists of a convolutional network and an LSTM module that can integrate multiple features of the data and convert data from the source band to the target band. Finally, we use the generated target-band data to train the prediction model. The experimental results validate the effectiveness of the proposed algorithm.
Funding: Funded by the Public Welfare Technology Research Project of Zhejiang Province (Grant No. LGF21F020014) and the Opening Project of the Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security of Zhejiang Police College (Grant No. 2021DSJSYS002).
Abstract: Ceramic tiles are one of the most indispensable materials for interior decoration. Because of their natural textures, ceramic patterns cannot meet design requirements in terms of diversity and interactivity. In this paper, we propose a sketch-based method for generating diverse ceramic tile images from hand-drawn sketches using a Generative Adversarial Network (GAN). The generated tile images can be tailored to the specific needs of the user for the tile textures. The proposed method consists of four steps. First, a dataset of ceramic tile images with diverse distributions is created and used to pre-train a GAN. Second, for each ceramic tile image in the dataset, the corresponding sketch image is generated, and the mapping between the images is trained with a sketch extraction network that uses ResNet blocks and skip connections to improve the quality of the generated sketches. Third, the sketch style is redefined according to the characteristics of the ceramic tile images, and double cross-domain adversarial loss functions are employed to guide the ceramic tile generation network toward the sketch style and to improve the training speed. Finally, we apply latent-space perturbation and interpolation to further enrich the style of the output textures and satisfy the concept of "one style with multiple faces". We train the proposed generation network on a dataset of 2583 ceramic tile images. To measure generative diversity and quality, we use the Fréchet Inception Distance (FID) and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) metrics. The experimental results show that the proposed model greatly enhances the generated ceramic tile images, with an FID of 32.47 and a BRISQUE of 28.44.
Abstract: The classification of lung nodules is a challenging problem because visual analysis of nodules and non-nodules reveals homogeneous textural patterns. In this work, an Auxiliary Classifier (AC)-Generative Adversarial Network (GAN) based Lung Cancer Classification (LCC) system is developed. The proposed AC-GAN-LCC system consists of three modules: preprocessing, Lungs Region Detection (LRD), and AC-GAN classification. A Wiener filter is employed in the preprocessing module to remove Gaussian noise. In the LRD module, only the lung regions (left and right lungs) are detected using iterative thresholding and morphological operations. Flood filling and background subtraction are then applied to extract the lung region only. The detected lung regions are fed to the AC-GAN classifier to detect the nodules. It assigns each region to one of two classes, i.e., binary classification (nodule or non-nodule). The AC-GAN is an extended version of the conditional GAN that predicts the label of a given image. Three different optimization techniques, adaptive gradient optimization, root mean square propagation optimization, and Adam optimization, are employed for optimizing the AC-GAN architecture. The proposed AC-GAN-LCC system is evaluated on Computed Tomography (CT) scan images from the Lung Image Database Consortium (LIDC) database. The proposed AC-GAN-LCC system classifies approximately 15000 CT slices (7310 non-nodules and 7685 nodules). It provides an overall accuracy of 98.8% on the LIDC database using Adam optimization with a 10-fold cross-validation approach.
Funding: Foundation of Anhui Province Key Laboratory of Physical Geographic Environment (No. 2022PGE012).
Abstract: Accurate boundaries of smallholder farm fields are important and indispensable geo-information that helps farmers, managers, and policymakers manage and utilize agricultural resources better. Because of their small size, irregular shape, and the use of mixed-cropping techniques, smallholder farm fields can be difficult to delineate automatically. In recent years, numerous studies on field contour extraction using deep Convolutional Neural Networks (CNN) have been proposed. However, labeled data for field boundaries are relatively scarce, which limits how well a CNN can be trained. Traditional methods mostly use image flipping and random rotation for data augmentation. In this paper, we propose applying a Generative Adversarial Network (GAN) to augment farm field labels and increase the diversity of samples. Specifically, we propose an automated method featuring Fully Convolutional Neural networks (FCN) in combination with GAN to improve the delineation accuracy of smallholder farms from Very High Resolution (VHR) images. We first investigate four State-Of-The-Art (SOTA) FCN architectures, i.e., U-Net, PSPNet, SegNet and OCRNet, to find the optimal architecture for the contour detection task on smallholder farm fields. Second, we apply the identified optimal FCN architecture in combination with Contour GAN and pixel2pixel GAN to improve the accuracy of contour detection. We test our method on a study area in the Sudano-Sahelian savanna region of northern Nigeria. The best combination achieved F1 scores of 0.686 on Test Set 1 (TS1), 0.684 on Test Set 2 (TS2), and 0.691 on Test Set 3 (TS3). Results indicate that our architecture adapts to a variety of advanced networks and is effective in this task. The conceptual, theoretical, and experimental knowledge from this study is expected to seed many GAN-based farm delineation methods in the future.
Funding: This work is supported by the National Natural Science Foundation of China (Nos. 61771154, 61603239, 61772454, 6171101570).
Abstract: Deep Learning (DL) is a powerful tool that has achieved tremendous success in areas such as Computer Vision, Speech Recognition, and Natural Language Processing. Since Automated Modulation Classification (AMC) is an important part of Cognitive Radio Networks, we explore the potential of DL for solving the signal modulation recognition problem. DL models are complex, which makes them prone to over-fitting. They require a large amount of training data to combat over-fitting, but adding high-quality labels to training data manually is not always cheap or feasible, especially in real-time systems, which may encounter data not previously seen in the dataset. Semi-supervised learning is a way to exploit unlabeled data effectively to reduce over-fitting in DL. In this paper, we extend Generative Adversarial Networks (GANs) to semi-supervised learning and show that this approach can be used to create a more data-efficient classifier.
Funding: This research is funded by the Centre for Advanced Modeling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, the University of Technology Sydney, Australia.
Abstract: In recent years, landslide susceptibility mapping has substantially improved with advances in machine learning. However, challenges remain in landslide mapping because only limited inventory data are available. In this paper, a novel method that improves the performance of machine learning techniques is presented. The proposed method creates synthetic inventory data using Generative Adversarial Networks (GANs) to improve the prediction of landslides. In this research, inventory data for 156 landslide locations in Cameron Highlands, Malaysia, were taken from previous projects the authors worked on. Elevation, slope, aspect, plan curvature, profile curvature, total curvature, lithology, land use and land cover (LULC), distance to the road, distance to the river, stream power index (SPI), sediment transport index (STI), terrain roughness index (TRI), topographic wetness index (TWI), and vegetation density are the geo-environmental factors considered in this study, based on suggestions from previous works on Cameron Highlands. To show the capability of GANs in improving landslide prediction models, this study tests the proposed GAN model with benchmark models, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Bagging ensemble models with ANN and SVM models. These models were validated using the area under the receiver operating characteristic curve (AUROC). The DT, RF, SVM, ANN, and Bagging ensemble achieved AUROC values of 0.90, 0.94, 0.86, 0.69, and 0.82 for training and 0.76, 0.81, 0.85, 0.72, and 0.75 for testing, respectively. When using additional samples, the same models achieved AUROC values of 0.92, 0.94, 0.88, 0.75, and 0.84 for training and 0.78, 0.82, 0.82, 0.78, and 0.80 for testing, respectively. Using the additional samples improved the test AUROC of all the models except SVM. As a result, this research showed that in data-scarce environments, utilizing GANs to generate supplementary samples is promising because it can improve the predictive capability of common landslide prediction models.
Funding: This paper is partially supported by the British Heart Foundation Accelerator Award, UK (AA\18\3\34220); Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Hope Foundation for Cancer Research, UK (RM60G0680); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Sino-UK Industrial Fund, UK (RP202G0289); Global Challenges Research Fund (GCRF), UK (P202PF11); LIAS Pioneering Partnerships Award, UK (P202ED10); Data Science Enhancement Fund, UK (P202RE237); Fight for Sight, UK (24NN201); Sino-UK Education Fund, UK (OP202006); Biotechnology and Biological Sciences Research Council, UK (RM32G0178B8); LIAS Seed Corn, UK (P202RE969).
Abstract: The topological connectivity information derived from the brain functional network can bring new insights for diagnosing and analyzing dementia disorders. The brain functional network is well suited to linking abnormal connectivities with dementia disorders. However, it is challenging to obtain large amounts of brain functional network data, which hinders the widespread application of data-driven models in dementia diagnosis. In this study, a novel distribution-regularized adversarial graph auto-encoder (DAGAE) with a transformer is proposed to generate new synthetic brain functional networks that augment the brain functional network dataset, improving the dementia diagnosis accuracy of data-driven models. Specifically, the label distribution is estimated to regularize the latent space learned by the graph encoder, which makes the learning process stable and the learned representation robust. In addition, the transformer generator is devised to map the node representations into node-to-node connections by exploring the long-term dependence of highly correlated distant brain regions. The typical topological properties and discriminative features can be preserved entirely. Furthermore, the generated brain functional networks improve prediction performance with different classifiers, and the approach can be applied to analyze other cognitive diseases. Experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the proposed model can generate good brain functional networks. The classification results show that adding generated data achieves the best accuracy of 85.33%, sensitivity of 84.00%, and specificity of 86.67%. The proposed model also achieves superior performance compared with other related augmentation models. Overall, the proposed model effectively improves cognitive disease diagnosis by generating diverse brain functional networks.
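How accuracy, sensitivity, and specificity relate to one another can be shown with a small worked example. The confusion-matrix counts below are purely hypothetical: the abstract does not report the test-set size, and the counts were chosen only because they reproduce the three reported percentages under the assumption of equally sized classes. The function name is likewise illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of patients correctly flagged
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the paper): 75 patients and 75 controls.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=63, fn=12, tn=65, fp=10)
print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} "
      f"specificity={specificity:.2%}")
```

With balanced classes, accuracy is simply the mean of sensitivity and specificity, which is consistent with the reported 85.33% lying midway between 84.00% and 86.67%.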
Abstract: Lithium-ion batteries are key drivers of the renewable energy revolution, bolstered by progress in battery design, modelling, and management. Yet, achieving high-performance battery health prognostics is a significant challenge. With the availability of open data and software, coupled with automated simulations, deep learning has become an integral component of battery health prognostics. We offer a comprehensive overview of potential deep learning techniques specifically designed for modeling and forecasting the dynamics of multiphysics and multiscale battery systems. Following this, we provide a concise summary of publicly available lithium-ion battery test and cycle datasets. By providing illustrative examples, we emphasize the efficacy of five techniques capable of enhancing deep learning for accurate battery state prediction and health-focused management. Each of these techniques offers unique benefits. (1) Transformer models address challenges using self-attention mechanisms and positional encoding methods. (2) Transfer learning improves learning tasks within a target domain by leveraging knowledge from a source domain. (3) Physics-informed learning uses prior knowledge to enhance learning algorithms. (4) Generative adversarial networks (GANs) earn praise for their ability to generate diverse and high-quality outputs, exhibiting outstanding performance with complex datasets. (5) Deep reinforcement learning enables an agent to make optimal decisions through continuous interactions with its environment, thus maximizing cumulative rewards. In this Review, we highlight examples that employ these techniques for battery health prognostics, summarizing both their challenges and opportunities. These methodologies offer promising prospects for researchers and industry professionals, enabling the creation of specialized network architectures that autonomously extract features, especially for long-range spatial-temporal connections across extended timescales. The outcomes could include improved accuracy, faster training, and enhanced generalization.
Abstract: In the past, sketches were a standard technique for recognizing offenders, and they have remained a valuable tool for law enforcement and social security purposes. However, relying on eyewitness observations can lead to discrepancies in the sketch, depending on the experience and skills of the sketch artist. With the emergence of modern technologies such as Generative Adversarial Networks (GANs), generating images from verbal and textual cues is now possible, resulting in more accurate sketch depictions. In this study, we propose an adversarial network that generates human facial sketches using such cues provided by an observer. Additionally, we introduce an Inverse Gamma Correction Technique to improve the training and enhance the quality of the generated sketches. To evaluate the effectiveness of our proposed method, we conducted experiments and analyzed the results using the Inception Score and Fréchet Inception Distance metrics. Our proposed method achieved an overall Inception Score of 1.438±0.049 and a Fréchet Inception Distance of 65.29, outperforming other state-of-the-art techniques.
Funding: The authors are highly thankful to the Development Research Center of Guangxi Relatively Sparse-populated Minorities (ID: GXRKJSZ201901) and to the Natural Science Foundation of Guangxi Province (No. 2018GXNSFAA281164). This research was financially supported by the project of outstanding thousand young teachers' training in higher education institutions of Guangxi and by the Guangxi Colleges and Universities Key Laboratory Breeding Base of System Control and Information Processing.
Abstract: Single image super resolution (SISR) is an important research topic in the field of computer vision and image processing. With the rapid development of deep neural networks, different image super-resolution models have emerged. Compared to some traditional SISR methods, deep learning-based methods can complete the super-resolution task from a single image. In addition, compared with SISR methods using traditional convolutional neural networks, SISR based on generative adversarial networks (GAN) has achieved the most advanced visual performance. In this review, we first explore the challenges faced by SISR and introduce some common datasets and evaluation metrics. Then, we review the improved network structures and loss functions of GAN-based perceptual SISR. Subsequently, the advantages and disadvantages of different networks are analyzed through multiple comparative experiments. Finally, we summarize the paper and look ahead to future development trends of GAN-based perceptual SISR.
Funding: Supported by the National Natural Science Foundation of China under Grants 62072295, 61525203, U1636206, and U1936214, and by the Natural Science Foundation of Shanghai under Grant 19ZR1419000.
Abstract: Digital watermarking embeds information bits into a digital cover such as an image or video to prove the creator's ownership of the work. In this paper, we propose a robust image watermarking algorithm based on a generative adversarial network. The model includes two modules, a generator and an adversary. The generator is mainly used to generate images embedded with a watermark and to decode noise-damaged images to recover the watermark. The adversary is used to discriminate whether an image is embedded with a watermark and to damage the image with noise. Building on the HiDDeN model (Hiding Data with Deep Networks), we add a high-pass filter in front of the discriminator, so that the watermark tends to be embedded in the mid-frequency region of the image. Since the human visual system pays more attention to the central area of an image, we give a higher weight to the image center region and a lower weight to the edge region when calculating the loss between the cover and the embedded image. The watermarked image obtained by this scheme has better visual quality. Experimental results show that the proposed architecture is more robust against noise interference than state-of-the-art schemes.
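The center-weighted image loss described in this abstract can be sketched as a weighted mean squared error. Everything specific here is an assumption made for illustration: the abstract does not give the weight values, the shape of the central region, or the base distortion measure, so the mask below simply assigns a hypothetical weight of 2.0 to the central half of a grayscale image and 1.0 elsewhere.

```python
def center_weight_mask(height, width, center_weight=2.0, edge_weight=1.0):
    """Build a per-pixel weight mask that favors the central half of the image.

    The weights and the rectangular central region are hypothetical choices.
    """
    top, bottom = height // 4, height - height // 4
    left, right = width // 4, width - width // 4
    return [
        [
            center_weight if top <= y < bottom and left <= x < right else edge_weight
            for x in range(width)
        ]
        for y in range(height)
    ]


def weighted_mse(cover, embedded, mask):
    """Weighted mean squared error between cover and watermarked images."""
    numerator = 0.0
    denominator = 0.0
    for cover_row, embedded_row, mask_row in zip(cover, embedded, mask):
        for c, e, w in zip(cover_row, embedded_row, mask_row):
            numerator += w * (c - e) ** 2
            denominator += w
    return numerator / denominator
```

Under this mask, the same pixel error costs twice as much in the center as at the border, so an encoder trained against such a loss is nudged to place its changes toward the edges, where viewers are less likely to notice them.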
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFC0802904), the National Natural Science Foundation of China (No. 61671470), and the Natural Science Foundation of Jiangsu Province (BK20161470).
Abstract: Ground military target recognition plays a crucial role in unmanned equipment and in grasping battlefield dynamics for military applications, but it is hampered by low resolution and noisy representation. In this paper, a recognition method is proposed that involves a novel visual attention mechanism-based Gabor region proposal sub-network (Gabor RPN) and an improved refinement generative adversarial sub-network (GAN). Novel central-peripheral rivalry 3D color Gabor filters are proposed to simulate retinal structures and are used as feature extraction convolutional kernels in the low-level layers to improve the recognition accuracy and training efficiency of the Gabor RPN. The improved refinement GAN is used to solve the problem of blurry target classification; it involves a generator that directly produces large high-resolution images from small blurry ones and a discriminator that distinguishes not only real images from fake images but also the class of the targets. A dedicated recognition dataset for ground military targets, named the Ground Military Target Dataset (GMTD), is constructed. Experiments performed on the GMTD dataset demonstrate that our method achieves better energy-saving and recognition results when low-resolution and noisy-representation targets are involved, giving the algorithm good prospects for engineering application.