Funding: the National Natural Science Foundation of China (No. 61976080); the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y); the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008); the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from a source domain into many diverse styles in a target domain. However, most advanced approaches employ a multi-generator mechanism to model the different domain mappings, which results in inefficient network training and mode collapse, and thus in poor diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, first, a domain code is introduced to explicitly control the different generation tasks. Second, the framework brings in a squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
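As an illustration of the squeeze-and-excitation mechanism the abstract mentions, the following is a minimal PyTorch sketch of a standard SE block. It is not the authors' code; the reduction ratio and layer layout follow the common defaults of the original SE paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # rescale feature maps channel-wise
```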
Funding: supported in part by the Gusu Innovation and Entrepreneurship Leading Talents in Suzhou City (grant numbers ZXL2021425 and ZXL2022476); the Doctor of Innovation and Entrepreneurship Program in Jiangsu Province (grant number JSSCBS20211440); the Jiangsu Province Key R&D Program (grant number BE2019682); the Natural Science Foundation of Jiangsu Province (grant number BK20200214); the National Key R&D Program of China (grant number 2017YFB0403701); the National Natural Science Foundation of China (grant numbers 61605210, 61675226, and 62075235); the Youth Innovation Promotion Association of the Chinese Academy of Sciences (grant number 2019320); the Frontier Science Research Project of the Chinese Academy of Sciences (grant number QYZDB-SSW-JSC03); and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant number XDB02060000).
Abstract: The prediction of fundus fluorescein angiography (FFA) images from fundus structural images is a cutting-edge research topic in ophthalmological image processing. Prediction comprises estimating FFA from fundus camera imaging, single-phase FFA from scanning laser ophthalmoscopy (SLO), and three-phase FFA also from SLO. Although many deep learning models are available, a single model can only perform one or two of these prediction tasks. To accomplish all three prediction tasks with a unified method, we propose a unified deep learning model for predicting FFA images from fundus structure images using a supervised generative adversarial network. The three prediction tasks are processed as follows: data preparation, network training under FFA supervision, and FFA image prediction from fundus structure images on a test set. By comparing the FFA images predicted by our model, pix2pix, and CycleGAN, we demonstrate the clear improvements achieved by our approach. The high performance of our model is validated in terms of the peak signal-to-noise ratio, structural similarity index, and mean squared error.
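The abstract does not state the model's training objective; as a hedged reference, a standard supervised conditional-GAN objective of the pix2pix family it is compared against is

$$\mathcal{L}(G,D)=\mathbb{E}_{x,y}\big[\log D(x,y)\big]+\mathbb{E}_{x}\big[\log\big(1-D(x,G(x))\big)\big]+\lambda\,\mathbb{E}_{x,y}\big[\lVert y-G(x)\rVert_{1}\big],$$

where $x$ is the fundus structure image, $y$ the ground-truth FFA image, $G$ the generator, $D$ the discriminator, and $\lambda$ weights the L1 reconstruction term that keeps the prediction close to the supervising FFA image.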
文摘"Qi" (spirit) is originally an ancient Chinese philosophical term regarded as the fundamental material to constitute the universe. Later, it's used as a term in literary criticism referring to the authors' talent, qualities, and their work styles. Poetry, as a language form concentrating thoughts and feelings, is the best form reflecting the various changes of"Qi" (spirit). "Qi" (spirit) combines the style, imagery and charm of a poem in a union, translation of the poem should be close to such a combination. Translation of"Qi" (spirit) is a pursuit of translating poetry which is based on translation of sense and image, but it may focus more on the wording technique with a wide insight on the original poem including the meaning and the tone of the poem and the author's thoughts and feelings, focusing on an overall effect of appreciation.
Abstract: In this paper, we propose a deep-learning-based framework for medical image translation using paired and unpaired training data. Initially, a deep neural network with an encoder-decoder structure is proposed for image-to-image translation using paired training data. A multi-scale context aggregation approach is then used to extract various features from different levels of encoding, which are used during the corresponding network decoding stage. We further propose an edge-guided generative adversarial network for image-to-image translation based on unpaired training data. An edge constraint loss function is used to improve network performance at tissue boundaries. To analyze framework performance, we conducted five different medical image translation tasks. The assessment demonstrates that the proposed deep learning framework brings significant improvement over the state of the art.
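The abstract does not give the exact form of the edge constraint loss; one common way to realize such a loss, sketched below under that assumption, is to penalize the L1 distance between Sobel edge maps of the translated and target images (PyTorch).

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """Approximate edge maps with fixed Sobel kernels (img: [B, 1, H, W])."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                      # Sobel y is the transpose of Sobel x
    gx = F.conv2d(img, kx.to(img.device), padding=1)
    gy = F.conv2d(img, ky.to(img.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # gradient magnitude, eps for stability

def edge_constraint_loss(fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between edge maps of translated and target images."""
    return F.l1_loss(sobel_edges(fake), sobel_edges(real))
```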
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 12274092, 61871263, and 12034005; partially by the Explorer Program of Shanghai under Grant No. 21TS1400200; partially by the Natural Science Foundation of Shanghai under Grant No. 21ZR1405200; and partially by the Medical Engineering Fund of Fudan University under Grant No. YG2022-6. Mengyang Lu and Wei Shi contributed equally to this work.
Abstract: Automatic cell counting provides an effective tool for medical research and diagnosis. Currently, cell counting can be completed with a transmitted-light microscope; however, this requires expert knowledge, and the counting accuracy is unsatisfactory for overlapping cells. Image-translation-based detection has been proposed and has shown potential to accomplish cell counting from transmitted-light microscopy automatically and effectively. In this work, a new deep-learning (DL)-based two-stage detection method (cGAN-YOLO) is designed to further enhance the performance of cell counting by combining a DL-based fluorescent image translation model with a DL-based cell detection model. The results show that cGAN-YOLO can effectively detect and count different types of cells in acquired transmitted-light microscope images. Compared with the previously reported YOLO-based one-stage detection method, cGAN-YOLO achieves high recognition accuracy (RA), with an improvement of 29.80%. Furthermore, cGAN-YOLO obtains an improvement of 12.11% in RA compared with the previously reported image-translation-based detection method. In summary, cGAN-YOLO makes it possible to implement cell counting directly from experimentally acquired transmitted-light microscopy images with high flexibility and performance, which extends its applicability in clinical research.
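A minimal sketch of the two-stage translate-then-detect pipeline the abstract describes is given below; `translator`, `detector`, and their interfaces are placeholders standing in for the trained cGAN and YOLO models, not the authors' API.

```python
import torch

@torch.no_grad()
def count_cells(brightfield: torch.Tensor, translator, detector, conf_thresh: float = 0.5):
    """Stage 1: translate a transmitted-light image batch to virtual fluorescent images.
    Stage 2: run the detector on the translated images and count confident boxes."""
    virtual_fluo = translator(brightfield)  # [B, C, H, W] -> [B, C, H, W]
    detections = detector(virtual_fluo)     # assumed: per-image list of (box, score) pairs
    return [sum(1 for _, score in dets if score >= conf_thresh) for dets in detections]
```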
Funding: supported by the National Science Foundation for Young Scientists of China (Grant No. 61806060), 2019-2021; the Basic and Applied Basic Research Foundation of Guangdong Province (2021A1515220140); and the Youth Innovation Project of Sun Yat-sen University Cancer Center (QNYCPY32).
Abstract: In recent years, radiotherapy based only on Magnetic Resonance (MR) images has become a hotspot of radiotherapy planning research in the medical field. However, computed tomography (CT) is still needed for dose calculation in the clinic. Recent deep-learning approaches to synthesizing CT images from MR images have raised much research interest, making radiotherapy based only on MR images possible. In this paper, we propose a novel unsupervised image synthesis framework with registration networks. It enforces constraints between the reconstructed image and the input image by registering the reconstructed image with the input image and registering the cycle-consistent image with the input image. Furthermore, ConvNeXt blocks are added to the network, and large-kernel convolutional layers are used to improve the network's ability to extract features. Head-and-neck data collected from 180 patients with nasopharyngeal carcinoma were used to train and evaluate the model with four evaluation metrics, and a quantitative comparison with several commonly used model frameworks was performed. The model achieves a Mean Absolute Error (MAE) of 18.55±1.44, a Root Mean Square Error (RMSE) of 86.91±4.31, a Peak Signal-to-Noise Ratio (PSNR) of 33.45±0.74, and a Structural Similarity (SSIM) of 0.960±0.005. Compared with other methods, MAE decreased by 2.17, RMSE decreased by 7.82, PSNR increased by 0.76, and SSIM increased by 0.011. The results show that the proposed model outperforms the other methods in the quality of image synthesis. This work provides guidance for the study of MR-only radiotherapy planning.
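For reference, the four reported metrics can be computed with NumPy and scikit-image as sketched below; the CT intensity range passed as `data_range` is an assumption, not a value taken from the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(synth_ct: np.ndarray, real_ct: np.ndarray, data_range: float = 2000.0) -> dict:
    """MAE, RMSE, PSNR, and SSIM between a synthesized and a real CT slice/volume."""
    mae = float(np.mean(np.abs(synth_ct - real_ct)))
    rmse = float(np.sqrt(np.mean((synth_ct - real_ct) ** 2)))
    psnr = peak_signal_noise_ratio(real_ct, synth_ct, data_range=data_range)
    ssim = structural_similarity(real_ct, synth_ct, data_range=data_range)
    return {"MAE": mae, "RMSE": rmse, "PSNR": psnr, "SSIM": ssim}
```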
Funding: supported by the National Natural Science Foundation of China (No. 61772179); the Hunan Provincial Natural Science Foundation of China (No. 2020JJ4152, No. 2022JJ50016); the Science and Technology Innovation Program of Hunan Province (No. 2016TP1020); the Scientific Research Fund of Hunan Provincial Education Department (No. 21B0649); and the Double First-Class University Project of Hunan Province (Xiangjiaotong [2018] 469).
Abstract: The main challenges in face swapping are the preservation and adaptive superimposition of the attributes of two images. In this study, the Face Swapping Attention Network (FSA-Net) is proposed to generate photorealistic face swaps. Existing face-swapping methods ignore blending attributes or mismatch facial keypoints (cheek, mouth, eye, nose, etc.), which causes artifacts and makes the generated face silhouette unrealistic. To address this problem, a novel reinforced multi-aware attention module, referred to as RMAA, is proposed for handling facial fusion and expression occlusion flaws. The framework includes two stages. In the first stage, a novel attribute encoder is proposed to extract multiple levels of target face attributes and to integrate identities and attributes when synthesizing swapped faces. In the second stage, a novel Stochastic Error Refinement (SRE) module is designed to solve the problem of facial occlusion; it repairs occlusion regions in a semi-supervised way without any post-processing. The proposed method is compared with current state-of-the-art methods, and the obtained results demonstrate its qualitative and quantitative outperformance. More details are provided at https://sites.google.com/view/fsa-net-official.
Funding: supported in part by the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence" (2018AAA0102303); the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology (YESS20210289); the China Postdoctoral Science Foundation (2020TQ1057, 2020M682823); and the National Natural Science Foundation of China (U20B2071, U1913602, 91948204).
Abstract: Change detection (CD) is becoming indispensable for unmanned aerial vehicles (UAVs), especially in the domains of water landing, rescue, and search. However, even the most advanced models require large amounts of data for training and testing; therefore, sufficient labeled images with different imaging conditions are needed. Inspired by computer graphics, we present a cloning method to simulate inland-water scenes and collect an auto-labeled simulated dataset. The simulated dataset comprises six challenges to test the effects of dynamic background, weather, and noise on change detection models. We then propose an image translation framework that translates simulated images into synthetic images. This framework uses a shared-parameter encoder and generator and a discriminator with 22×22 receptive fields to generate realistic synthetic images as model training sets. The experimental results indicate that: 1) the different imaging challenges affect the performance of change detection models; and 2) compared with simulated images, synthetic images can effectively improve the accuracy of supervised models.
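A PatchGAN-style discriminator reproduces the 22×22 receptive field the abstract mentions when built from 4×4 convolutions with strides 2, 2, and 1 (receptive field 1 + 3·1 + 3·2 + 3·4 = 22). The sketch below assumes this layout and the channel widths, neither of which is stated in the abstract.

```python
import torch.nn as nn

def patch_discriminator(in_channels: int = 3) -> nn.Sequential:
    """Each output unit scores one 22x22 patch of the input as real or fake."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # per-patch logits
    )
```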
Funding: this work was supported in part by the National Natural Science Foundation of China (61871263, 12274092, and 12034005); in part by the Explorer Program of Shanghai (21TS1400200); in part by the Natural Science Foundation of Shanghai (21ZR1405200); and in part by the Medical Engineering Fund of Fudan University (YG2022-6).
Abstract: Fluorescence labeling and imaging provide an opportunity to observe the structure of biological tissues and play a crucial role in histopathology. However, labeling and imaging biological tissues still pose challenges, e.g., time-consuming tissue preparation steps, expensive reagents, and signal bias due to photobleaching. To overcome these limitations, we present a deep-learning-based method for fluorescence translation of tissue sections, achieved with a conditional generative adversarial network (cGAN). Experimental results on mouse kidney tissues demonstrate that the proposed method can predict other types of fluorescence images from one raw fluorescence image and implement virtual multi-label fluorescent staining by merging the different generated fluorescence images. Moreover, the proposed method can effectively reduce the time-consuming and laborious preparation required by the imaging process, further saving cost and time.
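A minimal sketch of the merging step for virtual multi-label staining: each generated single-label image is assigned a display color, and the colored channels are summed into one composite. The color assignment and normalization are assumptions for illustration.

```python
import numpy as np

def merge_virtual_stains(channels: list[np.ndarray], colors: list[tuple]) -> np.ndarray:
    """channels: grayscale images in [0, 1]; colors: matching RGB tuples in [0, 1]."""
    composite = np.zeros((*channels[0].shape, 3), dtype=np.float32)
    for img, rgb in zip(channels, colors):
        composite += img[..., None] * np.asarray(rgb, dtype=np.float32)  # colorize and add
    return np.clip(composite, 0.0, 1.0)  # keep the composite in displayable range
```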
Funding: this work was supported in part by the National Natural Science Foundation of China (Nos. 61871263, 12034005, and 11827808) and the Natural Science Foundation of Shanghai (Nos. 21ZR1405200 and 20S31901300).
Abstract: Fluorescence microscopy uses fluorescent dyes to provide highly specific visualization of cell components, which plays an important role in understanding subcellular structure. However, fluorescence microscopy has limitations, such as the risk of non-specific cross labeling in multi-label fluorescent staining and a limited number of fluorescence labels due to spectral overlap. This paper proposes a deep-learning-based fluorescence-to-fluorescence (Fluo-Fluo) translation method that uses a conditional generative adversarial network to predict a fluorescence image from another fluorescence image and further realizes multi-label fluorescent staining. The cell types used include human motor neurons, human breast cancer cells, rat cortical neurons, and rat cardiomyocytes. The effectiveness of the method is verified by successfully generating virtual fluorescence images highly similar to the true fluorescence images. This study shows that a deep neural network can implement Fluo-Fluo translation and describe the localization relationship between subcellular structures labeled with different fluorescent markers. The proposed Fluo-Fluo method can avoid non-specific cross labeling in multi-label fluorescence staining and is free from spectral overlap. In theory, an unlimited number of fluorescence images can be predicted from a single fluorescence image to characterize cells.
Funding: supported by the National Key Technology R&D Program (No. 2016YFB1001402); the National Natural Science Foundation of China (No. 61521002); the Joint NSFC-ISF Research Program (No. 61561146393); a Research Grant of the Beijing Higher Institution Engineering Research Center and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology; and the EPSRC CDE (No. EP/L016540/1).
Abstract: This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
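As a concrete illustration of the competitive training the survey refers to, a minimal (hedged) GAN training step alternates a discriminator update with a generator update; the models, optimizers, and latent size below are placeholders.

```python
import torch
import torch.nn as nn

def train_step(G, D, real, opt_g, opt_d, z_dim: int = 64):
    """One adversarial round: D separates real from fake, then G tries to fool D."""
    bce = nn.BCEWithLogitsLoss()
    ones = torch.ones(real.size(0), 1, device=real.device)
    zeros = torch.zeros_like(ones)
    z = torch.randn(real.size(0), z_dim, device=real.device)

    # Discriminator step: real samples -> 1, generated samples -> 0.
    fake = G(z).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: make the discriminator label fakes as real.
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```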
Funding: this work is supported by the National Key Research and Development Program of China (2018YFF0214700).
Abstract: Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality, and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework to generate diverse, realistic, and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose, and mouth. The framework consists of four components: a style encoder, a style decoder, a generator, and a discriminator. The style encoder generates a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
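A hedged sketch of how the four listed components could interact in one training step; the module interfaces, the non-saturating adversarial term, and the loss composition are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def forward_losses(mask, real_face, style_enc, style_dec, G, D):
    """One forward pass through the four components with a style-reconstruction loss."""
    s = style_enc(real_face)                # style code of a reference face
    fake = G(mask, s)                       # mask + style code -> synthesized face
    s_rec = style_dec(fake)                 # recover the style code from the result
    loss_style = F.l1_loss(s_rec, s)        # style-code reconstruction term
    loss_adv = F.softplus(-D(fake)).mean()  # non-saturating adversarial term on logits
    return loss_style + loss_adv
```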