A simple and effective image inpainting method is proposed in this paper, which is shown to be suitable for many kinds of target regions, with shapes ranging from small scraps to large unseemly objects, in a wide range of images. It is an important improvement upon traditional image inpainting techniques. By introducing a new bijective-mapping term into the matching cost function, the artificial repetition problem in the final inpainted image is practically solved. In addition, by adopting an inpainting error map, not only are the target pixels refined gradually during the inpainting process, but the overlapping target patches are also combined more seamlessly than in previous methods. Finally, the inpainting time is dramatically decreased by using a new acceleration method in the matching process.
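To make the bijective-mapping idea concrete, the toy sketch below adds a repetition penalty to a plain sum-of-squared-differences patch cost, so that a source patch that has already been copied many times loses out to an equally good, unused one. The function names and the penalty form are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def matching_cost(target_patch, source_patch, usage_count, repeat_weight=0.5):
    """SSD appearance cost plus a repetition penalty.

    usage_count: how many times source_patch has already been copied.
    The penalty term approximates the bijective-mapping idea
    (exact formulation assumed, not taken from the paper).
    """
    ssd = float(np.sum((target_patch - source_patch) ** 2))
    return ssd + repeat_weight * usage_count

def best_source(target_patch, source_patches, usage_counts):
    """Pick the source patch minimizing appearance + repetition cost."""
    costs = [matching_cost(target_patch, s, c)
             for s, c in zip(source_patches, usage_counts)]
    return int(np.argmin(costs))

# Two identical source patches: the unused one wins once the other is overused.
t = np.zeros((3, 3))
sources = [np.zeros((3, 3)), np.zeros((3, 3))]
print(best_source(t, sources, usage_counts=[5, 0]))  # → 1
```

Without the penalty, both candidates tie and the matcher may copy the same patch repeatedly, which is exactly the artificial-repetition artifact the paper targets.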
Image generation is currently a hot topic in academia and has been applied to AI drawing, which can produce vivid AI paintings without labor costs. In image generation, we represent the image as a random vector: assuming that images of natural scenes obey an unknown distribution, we hope to estimate that distribution from observed samples. In particular, with the development of the GAN (Generative Adversarial Network), in which the generator and discriminator improve the model's capability through adversarial training, the quality of generated images keeps increasing. The image quality produced by existing GAN-based image generation models is so convincing that the results can pass for genuine photographs. After a brief introduction to the concept of the GAN, this paper analyzes the main ideas of image synthesis and studies representative state-of-the-art GAN-based image synthesis methods.
This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images, and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object-status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset extracted from the Internet. Thirdly, to ensure the rationality of foreground objects' positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. Synthesis and comparison results on the Microsoft COCO dataset show that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of generated scene images.
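The constrained-MCMC layout step can be illustrated with a one-dimensional toy version: a cost function penalizes overlap with a fixed object and leaving the canvas, and a Metropolis-Hastings walk minimizes it. The cost weights, temperature, and step size below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def layout_cost(x, other=0.5, width=0.2):
    """Toy layout cost for one object's 1-D position: penalize overlap
    with a fixed object at `other` and leaving the unit canvas.
    Weights are illustrative assumptions, not the paper's."""
    overlap = max(0.0, width - abs(x - other))
    out_of_bounds = max(0.0, -x) + max(0.0, x - 1.0)
    return 10.0 * overlap + 100.0 * out_of_bounds

def mcmc_optimize(steps=5000, step_size=0.05, temp=0.1, seed=0):
    """Metropolis-Hastings minimization of the layout cost (a sketch of
    the constrained-MCMC idea; the paper optimizes positions and sizes
    of several objects jointly)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0)
    c = layout_cost(x)
    best_x, best_c = x, c
    for _ in range(steps):
        x_new = x + rng.normal(0.0, step_size)
        c_new = layout_cost(x_new)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(-(c_new - c) / temp).
        if c_new < c or rng.random() < np.exp((c - c_new) / temp):
            x, c = x_new, c_new
            if c < best_c:
                best_x, best_c = x, c
    return best_x, best_c

x, c = mcmc_optimize()
print(round(c, 3))  # near zero: no overlap, inside the canvas
```

The accept-worse-moves rule is what lets the sampler escape local minima of the layout cost, which simple hill climbing cannot.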
In recent years, radiotherapy based only on Magnetic Resonance (MR) images has become a hot spot for radiotherapy planning research in the medical field. However, computed tomography (CT) is still needed for dose calculation in the clinic. Recent deep-learning approaches to synthesizing CT images from MR images have attracted much research interest, making radiotherapy based only on MR images possible. In this paper, we propose a novel unsupervised image synthesis framework with registration networks. We enforce constraints between the reconstructed image and the input image by registering the reconstructed image with the input image, and by registering the cycle-consistent image with the input image. Furthermore, we add ConvNeXt blocks to the network and use large-kernel convolutional layers to improve the network's ability to extract features. We collected head-and-neck data from 180 patients with nasopharyngeal carcinoma to train and evaluate the model with four evaluation metrics, and made a quantitative comparison against several commonly used model frameworks. The model achieves a Mean Absolute Error (MAE) of 18.55±1.44, a Root Mean Square Error (RMSE) of 86.91±4.31, a Peak Signal-to-Noise Ratio (PSNR) of 33.45±0.74, and a Structural Similarity (SSIM) of 0.960±0.005. Compared with other methods, MAE decreased by 2.17, RMSE decreased by 7.82, PSNR increased by 0.76, and SSIM increased by 0.011. The results show that the proposed model outperforms other methods in the quality of image synthesis. This work provides guidance for the study of MR-only radiotherapy planning.
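Three of the four reported metrics are simple to state; the sketch below computes them on toy values rather than real CT data (SSIM is omitted for brevity, as it involves windowed statistics).

```python
import numpy as np

def mae(a, b):
    """Mean Absolute Error."""
    return float(np.abs(a - b).mean())

def rmse(a, b):
    """Root Mean Square Error."""
    return float(np.sqrt(((a - b) ** 2).mean()))

def psnr(a, b, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB for the given data range."""
    mse = ((a - b) ** 2).mean()
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy example: a synthetic image off from the reference by a constant 2.
ct = np.full((8, 8), 100.0)
synth = ct + 2.0
print(mae(ct, synth), rmse(ct, synth), round(psnr(ct, synth), 2))
```

With a constant offset, MAE and RMSE coincide; they diverge as soon as the error distribution has spread, which is why the paper reports both.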
Intelligent identification of sandstone slice images using deep learning technology is the development trend of mineral identification, and accurate mineral particle segmentation is the most critical step for intelligent identification. A typical identification model requires many training samples to learn as many distinguishable features as possible. However, limited by the difficulty of data acquisition, the high cost of labeling, and privacy protection, the number of available samples is small and cannot meet the training requirements of deep learning image identification models. To increase the number of samples and improve the training of deep learning models, this paper proposes a tight sandstone image data augmentation method that combines the advantages of data deformation and data oversampling, taking the Putaohua reservoir in the Sanzhao Sag of the Songliao Basin as the target area. First, the Style Generative Adversarial Network (StyleGAN) is improved to generate high-resolution tight sandstone images and improve data diversity. Second, we improve the Automatic Data Augmentation (AutoAugment) algorithm to search for the optimal augmentation strategy and expand the data scale. Finally, we design comparison experiments to demonstrate that this method has clear advantages in generated image quality and in improving the identification performance of deep learning models in real application scenarios.
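The AutoAugment-style strategy search can be caricatured as a search over (operation, magnitude) pairs scored by some validation signal. Everything below, including the operations and the scoring function, is an illustrative stand-in, not the paper's improved algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate90(img, k):
    """Rotate by k * 90 degrees (magnitude = number of turns)."""
    return np.rot90(img, int(k))

def adjust_brightness(img, delta):
    return np.clip(img + delta, 0.0, 1.0)

# Candidate operations and their allowed magnitudes (illustrative).
OPS = {"rotate90": (rotate90, [1, 2, 3]),
       "brightness": (adjust_brightness, [-0.2, -0.1, 0.1, 0.2])}

def search_policy(score_fn, trials=200):
    """Random search over (operation, magnitude) pairs -- a toy stand-in
    for an AutoAugment-style strategy search.  In practice score_fn
    would be the validation accuracy of a model trained with the policy."""
    best = None
    for _ in range(trials):
        name = str(rng.choice(list(OPS)))
        fn, mags = OPS[name]
        mag = mags[int(rng.integers(len(mags)))]
        s = score_fn(name, mag)
        if best is None or s > best[0]:
            best = (s, name, mag)
    return best

# Toy score: pretend a mild brightness increase helps most.
score = lambda name, mag: 1.0 - abs(mag - 0.1) if name == "brightness" else 0.5
best = search_policy(score)
img = np.full((4, 4), 0.5)
aug = OPS[best[1]][0](img, best[2])   # apply the found policy to an image
print(best[1], best[2])
```

The real algorithm scores whole sub-policies by training a child model; the random search above only shows the outer search structure.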
Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using an adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without the need for additional training, allowing users to control the output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN can reflect geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, thereby achieving superior results compared to state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer with the Patchified AdaIN layer.
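The contrast between global AdaIN and the patchified variant can be shown on a single-channel toy example. The real method operates per channel on deep feature maps; this sketch only assumes the patch-wise statistic matching described above, with the patch size chosen arbitrarily.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN: align the mean/std of content to those of style."""
    c_mu, c_std = content.mean(), content.std() + eps
    s_mu, s_std = style.mean(), style.std() + eps
    return (content - c_mu) / c_std * s_std + s_mu

def patchified_adain(content, style, patch=4, eps=1e-5):
    """Apply AdaIN per spatial patch, so local style statistics
    (the spatial color distribution) are transferred.  A sketch of
    the idea, not the authors' implementation."""
    out = np.empty_like(content, dtype=float)
    h, w = content.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i+patch, j:j+patch] = adain(
                content[i:i+patch, j:j+patch],
                style[i:i+patch, j:j+patch], eps)
    return out

# A style image whose left half is dark and right half is bright:
style = np.concatenate([np.zeros((8, 4)), np.ones((8, 4))], axis=1)
content = np.random.default_rng(0).normal(size=(8, 8))
out = patchified_adain(content, style)
# Patch means follow the local style statistics:
print(out[:, :4].mean() < out[:, 4:].mean())  # → True
```

Global AdaIN would map the whole output to the style image's overall mean of 0.5, erasing exactly the left/right structure the patchified version preserves.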
Images taken underwater mostly present color shift and hazy effects due to the special properties of water. Underwater image enhancement methods have been proposed to handle this issue. However, their enhancement results are only evaluated on a small number of underwater images. The lack of a sufficiently large and diverse dataset for efficient evaluation of underwater image enhancement methods motivates the present paper. This paper proposes a systematic method to synthesize diverse underwater images, which can serve as a benchmark dataset. The synthesis is based on the underwater image formation model, which describes the physical degradation process. An indoor RGB-D image dataset is used as the seed for underwater-style image generation. The ambient light is simulated based on the statistical mean value of real-world underwater images. Attenuation coefficients for diverse water types are carefully selected. Finally, in total 14490 underwater images of 10 water types are synthesized. Based on the synthesized database, state-of-the-art image enhancement methods are appropriately evaluated. Moreover, the large, diverse underwater image database is beneficial to the development of learning-based methods.
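The underwater image formation model that such synthesis relies on is commonly written as I(x) = J(x) t(x) + A (1 - t(x)), with transmission t(x) = exp(-beta d(x)). A sketch with illustrative attenuation coefficients and ambient light (not the paper's carefully selected per-water-type values):

```python
import numpy as np

def synthesize_underwater(J, depth, beta, ambient):
    """Underwater image formation model:
    I(x) = J(x) * t(x) + A * (1 - t(x)),  t(x) = exp(-beta * d(x)).
    J: clean RGB image in [0, 1]; depth: per-pixel depth in meters;
    beta: per-channel attenuation coefficients; ambient: ambient light A.
    Coefficient values used below are illustrative assumptions."""
    t = np.exp(-np.asarray(beta)[None, None, :] * depth[..., None])
    return J * t + np.asarray(ambient)[None, None, :] * (1.0 - t)

# Red attenuates fastest in water, so distant pixels shift blue-green.
J = np.ones((2, 2, 3)) * 0.8            # bright clean image
depth = np.array([[0.1, 0.1], [5.0, 5.0]])
beta = [1.2, 0.4, 0.3]                  # illustrative R, G, B coefficients
ambient = [0.1, 0.5, 0.6]               # greenish-blue ambient light
I = synthesize_underwater(J, depth, beta, ambient)
print(I[1, 0, 0] < I[1, 0, 2])  # → True: red degraded most at depth
```

Pairing such synthesized images with their clean RGB-D seeds is what makes the database usable as ground truth for evaluating enhancement methods.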
While recent years have witnessed a dramatic upsurge in exploiting deep neural networks for image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise, and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to solve this problem from the perspectives of network architecture design and training data synthesis. Specifically, for the network architecture design, we propose a swin-conv block to incorporate the local modeling ability of the residual convolutional layer and the non-local modeling ability of the swin transformer block, and then plug it as the main building block into the widely used image-to-image translation UNet architecture. For the training data synthesis, we design a practical noise degradation model which takes into consideration different kinds of noise (including Gaussian, Poisson, speckle, JPEG compression, and processed camera sensor noise) as well as resizing, and also involves a random shuffle strategy and a double degradation strategy. Extensive experiments on AWGN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance and the new degradation model helps to significantly improve practicability. We believe our work can provide useful insights into current denoising research. The source code is available at https://github.com/cszn/SCUNet.
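The random-shuffle part of the degradation model can be sketched as applying several noise types in random order. JPEG compression, resizing, and the double degradation strategy are omitted here, and the noise parameters are illustrative, not the paper's.

```python
import random
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(img, sigma=0.05):
    return img + rng.normal(0.0, sigma, img.shape)

def add_poisson(img, scale=255.0):
    # Poisson (shot) noise: sample counts at the pixel intensities.
    return rng.poisson(np.clip(img, 0.0, 1.0) * scale) / scale

def add_speckle(img, sigma=0.05):
    return img * (1.0 + rng.normal(0.0, sigma, img.shape))

def degrade(img):
    """Apply several noise types in random order, echoing the random
    shuffle strategy described above (JPEG compression, resizing, and
    the double degradation strategy are omitted in this sketch)."""
    ops = [add_gaussian, add_poisson, add_speckle]
    random.shuffle(ops)
    for op in ops:
        img = op(img)
    return np.clip(img, 0.0, 1.0)

clean = np.full((16, 16), 0.5)
noisy = degrade(clean)
print(noisy.shape, float(np.abs(noisy - clean).mean()) > 0.0)
```

Shuffling the order matters because the noise types do not commute (for instance, speckle noise applied after Gaussian noise scales the Gaussian perturbation too), so the training pairs cover a wider family of real degradations.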
Transformers have dominated the field of natural language processing and have recently made an impact in the area of computer vision. In the field of medical image analysis, transformers have also been successfully applied to full-stack clinical applications, including image synthesis/reconstruction, registration, segmentation, detection, and diagnosis. This paper aims to promote awareness of the applications of transformers in medical image analysis. Specifically, we first provide an overview of the core concepts of the attention mechanism built into transformers and of other basic components. Second, we review various transformer architectures tailored for medical image applications and discuss their limitations. Within this review, we investigate key challenges, including the use of transformers in different learning paradigms, improving model efficiency, and coupling with other techniques. We hope this review provides a comprehensive picture of transformers to readers with an interest in medical image analysis.
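The attention mechanism at the heart of these architectures is scaled dot-product attention; a minimal NumPy version of the standard formulation (token counts and dimensions below are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query tokens of dimension 8
K = rng.normal(size=(6, 8))   # 6 key tokens
V = rng.normal(size=(6, 8))   # values paired with the keys
out, w = attention(Q, K, V)
print(out.shape, bool(np.allclose(w.sum(axis=1), 1.0)))  # (4, 8) True
```

Each output token is a convex combination of the value vectors, which is the non-local modeling property that makes transformers attractive for tasks like registration and synthesis.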
Recently, the evolution of Generative Adversarial Networks (GANs) has embarked on a journey of revolutionizing the field of artificial and computational intelligence. To improve the generating ability of GANs, various loss functions have been introduced to measure the degree of similarity between the samples produced by the generator and the real data samples, and the effectiveness of these loss functions largely determines the generating ability of GANs. In this paper, we present a detailed survey of the loss functions used in GANs and provide a critical analysis of their pros and cons. First, the basic theory of GANs and their training mechanism are introduced. Second, the most commonly used loss functions in GANs are introduced and analyzed. Third, experimental analyses and comparisons of these loss functions are presented for different GAN architectures. Finally, several suggestions on choosing suitable loss functions for image synthesis tasks are given.
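A few of the commonly surveyed generator losses can be compared numerically on the same discriminator outputs. The sketch below assumes the standard formulations (non-saturating, LSGAN, WGAN) rather than anything specific to this survey's experiments.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_g_loss(fake_logits):
    """Non-saturating generator loss: -log D(G(z)) on logits."""
    return float(-np.log(sigmoid(fake_logits) + 1e-12).mean())

def lsgan_g_loss(fake_scores):
    """Least-squares GAN generator loss: (D(G(z)) - 1)^2."""
    return float(((fake_scores - 1.0) ** 2).mean())

def wgan_g_loss(fake_scores):
    """Wasserstein GAN generator loss: -D(G(z)) (critic scores)."""
    return float(-fake_scores.mean())

fakes = np.array([-2.0, 0.0, 2.0])  # discriminator outputs on fakes
for name, fn in [("vanilla", vanilla_g_loss),
                 ("lsgan", lsgan_g_loss),
                 ("wgan", wgan_g_loss)]:
    print(name, round(fn(fakes), 3))
```

Note how the three losses weight the same discriminator outputs differently: the squared LSGAN term punishes confidently rejected fakes far more than the Wasserstein loss does, which is one of the trade-offs such surveys analyze.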
One-shot face reenactment is a challenging task due to the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of the driving subject's identity information, which may lead to face shape distortion and undermine the realism of reenactment results. To solve this problem, we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of the 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information about human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of the warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art methods in both identity preservation and reenactment fulfillment.
This paper surveys the state of the art of research in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated in an area called "texture synthesis", which focused on creating regular or semi-regular textures from small exemplars. More recently, however, much research has focused on the synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with a particular emphasis on papers published since 2009, when the last survey in this area appeared. It can serve as a tutorial for readers who are not yet familiar with these topics, as well as provide comparisons between these papers and highlight some open problems in this area.
Generation of photo-realistic images of human hair is a challenging topic in computer graphics. The difficulty in solving this problem comes mainly from the extremely large number of hairs and the high complexity of hair shapes. Regarding the modeling and rendering of hair-type objects, Kajiya proposed a so-called texel model for producing furry surfaces. However, Kajiya's model could only be used for the generation of short hairs. In this paper, a concise and practical approach is presented to solve the problem of rendering long hairs; in particular, a method of rendering smooth segmental texels for the generation of long hairs is addressed.
This paper presents an interactive graphics processing unit (GPU)-based relighting system in which the local lighting condition, surface materials, and viewing direction can all be changed on the fly. To support these changes, we simulate the light transport process at run time, which is normally impractical for interactive use due to its huge computational burden. We greatly alleviate this burden with a hierarchical structure named a transportation tree, which clusters similar emitting samples together within a perceptually acceptable error bound. Furthermore, by exploiting coherence in time as well as in space, we incrementally adjust the clusters rather than computing them from scratch in each frame. With a pre-computed visibility map, we are able to efficiently estimate the indirect illumination in parallel on graphics hardware, by simply summing up the radiance shot from cluster representatives, plus a small number of merge and split operations on clusters. With relighting based on the time-varying clusters, interactive updates of global illumination effects with multi-bounce indirect lighting are demonstrated in applications to material animation and scene decoration.
Image-based virtual try-on systems have significant commercial value in online garment shopping. However, prior methods fail to appropriately handle details, so they are defective in maintaining the original appearance of items including arms, the neck, and in-shop garments. We propose a novel high-fidelity virtual try-on network to generate realistic results. Specifically, a distributed pipeline is used for the simultaneous generation of these items. First, the in-shop garment is warped using thin plate splines (TPS) to give a coarse shape reference, and then a corresponding target semantic map is generated, which can adaptively respond to the distribution of different items triggered by different garments. Second, the items are componentized separately using our novel semantic map-based image adjustment network (SMIAN) to avoid interference between body parts. Finally, all components are integrated by SMIAN to generate the overall result. A priori dual-modal information is incorporated in the tail layers of SMIAN to improve the convergence rate of the network. Experiments demonstrate that the proposed method retains the details of conditioning information better than current methods, and achieves convincing quantitative and qualitative results on existing benchmark datasets.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60403044 and 60373070) and partly funded by Microsoft Research Asia (Project 2004-Image-01).
Funding: Supported by the National Key Technology R&D Program (No. 2016YFB1001402), the National Natural Science Foundation of China (No. 61521002), the Joint NSFC-ISF Research Program (No. 61561146393), a Research Grant of the Beijing Higher Institution Engineering Research Center and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and the EPSRC CDE (No. EP/L016540/1).
Funding: Supported by the National Natural Science Foundation of China (Project Nos. 61521002 and 61772298).
Funding: Supported by the Key Technological Innovation Projects of Hubei Province of China under Grant No. 2018AAA062, the Wuhan Science and Technology Plan Project of Hubei Province of China under Grant No. 2017010201010109, the National Key Research and Development Program of China under Grant No. 2017YFB1002600, and the National Natural Science Foundation of China under Grant Nos. 61672390 and 61972298.
Funding: Supported by the National Science Foundation for Young Scientists of China (Grant No. 61806060, 2019-2021), the Basic and Applied Basic Research Foundation of Guangdong Province (2021A1515220140), and the Youth Innovation Project of Sun Yat-sen University Cancer Center (QNYCPY32).
Funding: This research was funded by the National Natural Science Foundation of China (Project No. 42172161), the Heilongjiang Provincial Natural Science Foundation of China (Project No. LH2020F003), the Heilongjiang Provincial Department of Education Project of China (Project No. UNPYSCT-2020144), and the Northeast Petroleum University Guided Innovation Fund (2021YDL-12).
基金 (Funding): Supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2022R1A2C1004657, Contribution Rate: 50%), and by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2024 (Project Name: Developing Professionals for R&D in Contents Production Based on Generative AI and Cloud, Project Number: RS-2024-00352578, Contribution Rate: 50%).
文摘 (Abstract): Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using an adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without additional training, allowing users to control the output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN can reflect geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, achieving superior results compared to state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer with the Patchified AdaIN layer.
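AdaIN renormalizes content features with the style's mean and standard deviation; applying it per local patch rather than globally is the core idea above. A minimal 1-D sketch (our own simplification, assuming equal-length, patch-aligned feature lists rather than real feature maps):

```python
from statistics import mean, pstdev

def adain(content, style, eps=1e-5):
    """Standard AdaIN: re-normalize content features with style statistics."""
    mc, sc = mean(content), pstdev(content)
    ms, ss = mean(style), pstdev(style)
    return [ss * (x - mc) / (sc + eps) + ms for x in content]

def patchified_adain(content, style, patch=4, eps=1e-5):
    """Apply AdaIN patch-by-patch so local style statistics are preserved."""
    out = []
    for i in range(0, len(content), patch):
        out.extend(adain(content[i:i + patch], style[i:i + patch], eps))
    return out
```

With a patch size equal to the whole feature length, `patchified_adain` degenerates to plain AdaIN, which is why it can drop into existing frameworks.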
文摘 (Abstract): Images taken underwater mostly present color shift and hazy effects due to the special optical properties of water. Underwater image enhancement methods have been proposed to handle this issue; however, their enhancement results are usually evaluated on only a small number of underwater images. The lack of a sufficiently large and diverse dataset for efficient evaluation of underwater image enhancement methods motivates the present paper, which proposes a systematic method to synthesize diverse underwater images that can serve as a benchmark dataset. The synthesis is based on the underwater image formation model, which describes the physical degradation process. An indoor RGB-D image dataset is used as the seed for underwater-style image generation. The ambient light is simulated based on the statistical mean of real-world underwater images, and attenuation coefficients for diverse water types are carefully selected. In total, 14490 underwater images of 10 water types are synthesized. Based on the synthesized database, state-of-the-art image enhancement methods are systematically evaluated. In addition, the large and diverse underwater image database is beneficial for the development of learning-based methods.
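The underwater image formation model referred to above is commonly written as I_c(x) = J_c(x)·t_c(x) + A_c·(1 − t_c(x)), with per-channel transmission t_c(x) = exp(−β_c·d(x)). A per-pixel sketch under that assumption (parameter names are ours, not the paper's):

```python
import math

def synthesize_underwater_pixel(j, depth, beta, ambient):
    """Degrade one clean RGB pixel with the underwater image formation model.

    j       : clean radiance per channel, values in [0, 1]
    depth   : scene depth in metres (e.g., from an RGB-D seed image)
    beta    : per-channel attenuation coefficients for the chosen water type
    ambient : per-channel ambient (veiling) light
    """
    out = []
    for jc, bc, ac in zip(j, beta, ambient):
        t = math.exp(-bc * depth)          # transmission along the water column
        out.append(jc * t + ac * (1 - t))  # direct signal + back-scattered light
    return out
```

At zero depth the pixel is unchanged; at large depth it converges to the ambient light, which reproduces the characteristic underwater color cast.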
基金 (Funding): This work was partly supported by the ETH Zürich Fund (OK), and by Huawei grants.
文摘 (Abstract): While recent years have witnessed a dramatic upsurge in exploiting deep neural networks for image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise, and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to solve this problem from the perspectives of network architecture design and training data synthesis. Specifically, for the network architecture design, we propose a swin-conv block to incorporate the local modeling ability of the residual convolutional layer and the non-local modeling ability of the Swin Transformer block, and plug it as the main building block into the widely used image-to-image translation UNet architecture. For the training data synthesis, we design a practical noise degradation model that takes into consideration different kinds of noise (including Gaussian, Poisson, speckle, JPEG compression, and processed camera sensor noises) as well as resizing, and also involves a random shuffle strategy and a double degradation strategy. Extensive experiments on AWGN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance and that the new degradation model significantly improves practicability. We believe our work can provide useful insights into current denoising research. The source code is available at https://github.com/cszn/SCUNet.
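The random-shuffle and double-degradation ideas can be illustrated with two stand-in per-pixel noise operators (this is our own simplification on flattened pixel lists, not the paper's full degradation model, which also covers Poisson, JPEG, sensor noise, and resizing):

```python
import random

def gaussian(img, sigma=0.05):
    """Additive Gaussian noise."""
    return [x + random.gauss(0.0, sigma) for x in img]

def speckle(img, sigma=0.05):
    """Multiplicative (speckle) noise."""
    return [x * (1.0 + random.gauss(0.0, sigma)) for x in img]

def clip(img):
    """Clamp pixel values back into [0, 1]."""
    return [min(1.0, max(0.0, x)) for x in img]

def degrade(img, ops=(gaussian, speckle), double=True):
    """Apply the noise operators in random order (the 'shuffle' strategy);
    with double=True the whole chain runs twice ('double degradation')."""
    for _ in range(2 if double else 1):
        order = list(ops)
        random.shuffle(order)
        for op in order:
            img = op(img)
    return clip(img)
```

Randomizing the operator order and repeating the chain yields a much wider variety of compound degradations than any fixed pipeline.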
基金 (Funding): Supported by the National Natural Science Foundation of China (Grant No. 62106101) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210180).
文摘 (Abstract): Transformers have dominated the field of natural language processing and have recently made an impact in computer vision. In medical image analysis, transformers have also been successfully applied to full-stack clinical applications, including image synthesis/reconstruction, registration, segmentation, detection, and diagnosis. This paper aims to promote awareness of the applications of transformers in medical image analysis. Specifically, we first provide an overview of the core concepts of the attention mechanism built into transformers and of other basic components. Second, we review various transformer architectures tailored for medical image applications and discuss their limitations. Within this review, we investigate key challenges, including the use of transformers in different learning paradigms, improving model efficiency, and coupling with other techniques. We hope this review provides a comprehensive picture of transformers to readers with an interest in medical image analysis.
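For readers unfamiliar with the attention mechanism mentioned above, scaled dot-product attention for a single query vector can be sketched as follows (a plain-Python toy; real implementations are batched tensor operations):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, keys, values):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query, scaled by sqrt(d)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    dim_v = len(values[0])
    return [sum(w[i] * values[i][j] for i in range(len(values))) for j in range(dim_v)]
```

When all keys match the query equally, the output is simply the mean of the values; a better-matching key pulls the output toward its value.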
文摘 (Abstract): Recently, the evolution of Generative Adversarial Networks (GANs) has been revolutionizing the field of artificial and computational intelligence. To improve the generative ability of GANs, various loss functions have been introduced to measure the degree of similarity between the samples produced by the generator and the real data samples. In this paper, we present a detailed survey of the loss functions used in GANs and provide a critical analysis of their pros and cons. First, the basic theory of GANs and their training mechanism are introduced. Then, the most commonly used loss functions in GANs are introduced and analyzed. Third, experimental analyses and comparisons of these loss functions are presented for different GAN architectures. Finally, several suggestions are given on choosing suitable loss functions for image synthesis tasks.
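For concreteness, the discriminator-side forms of three commonly used GAN losses (vanilla minimax, least-squares, and Wasserstein) can be sketched as scalar functions (a simplification; real implementations operate on batches of logits):

```python
import math

EPS = 1e-12  # guard against log(0)

def minimax_d_loss(d_real, d_fake):
    """Original (vanilla) discriminator loss: -[log D(x) + log(1 - D(G(z)))]."""
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))

def lsgan_d_loss(d_real, d_fake):
    """Least-squares GAN discriminator loss, targets 1 (real) and 0 (fake)."""
    return 0.5 * ((d_real - 1.0) ** 2 + d_fake ** 2)

def wgan_d_loss(critic_real, critic_fake):
    """Wasserstein critic loss: minimize critic(fake) - critic(real)
    (subject to a Lipschitz constraint, e.g., weight clipping or gradient penalty)."""
    return critic_fake - critic_real
```

Note that the first two expect discriminator probabilities in (0, 1), whereas the Wasserstein critic outputs an unbounded score.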
基金 (Funding): Supported in part by the Beijing Municipal Natural Science Foundation, China (No. 4222054), in part by the National Natural Science Foundation of China (Nos. 62276263 and 62076240), and by the Youth Innovation Promotion Association CAS, China (No. Y2023143).
文摘 (Abstract): One-shot face reenactment is a challenging task due to the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of the driving subject's identity information, which may lead to face shape distortion and undermine the realism of reenactment results. To solve this problem, in this paper we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of the 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information about human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of the warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art benchmarks in both identity preservation and reenactment fulfillment.
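Warping a source image by a dense motion field, as described above, amounts to backward sampling: each output pixel fetches the source pixel its displacement points at. A 1-D integer-displacement sketch (our own simplification; real motion fields are 2-D and sub-pixel):

```python
def warp(src, flow):
    """Backward-warp a 1-D 'image' by an integer motion field:
    out[i] = src[i + flow[i]], with out-of-range indices clamped."""
    n = len(src)
    return [src[min(n - 1, max(0, i + flow[i]))] for i in range(n)]
```

In the pipeline above, such a warp only approximates the reenactment result; a GAN then cleans up disocclusions and resampling artifacts.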
基金 (Funding): The National Science Foundation, for support under Grants CCF 0811493 and CCF 0747220, and the General Financial Grant from the China Postdoctoral Science Foundation (No. 2015M580100).
文摘 (Abstract): This paper surveys the state of the art of research in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated from an area called "texture synthesis", which focused on creating regular or semi-regular textures from small exemplars. More recently, however, much research has focused on synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with a particular emphasis on papers published since 2009, when the last survey in this area was published. This survey can serve as a tutorial for readers who are not yet familiar with these topics, provide comparisons between these papers, and highlight some open problems in this area.
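The basic patch-based synthesis loop, copying the best-matching exemplar patch with a small overlap, can be sketched in 1-D (a toy illustration with a one-pixel overlap and squared-difference matching, not any specific surveyed method):

```python
def best_patch(exemplar, k, target):
    """Return the k-length exemplar patch whose first pixel best matches
    the target value (squared-difference cost on a one-pixel overlap)."""
    best, best_d = None, float("inf")
    for i in range(len(exemplar) - k + 1):
        p = exemplar[i:i + k]
        d = (p[0] - target) ** 2
        if d < best_d:
            best, best_d = p, d
    return best

def synthesize(exemplar, k=3, n=4):
    """Grow an output sequence by repeatedly appending the exemplar patch
    that best continues the last synthesized pixel."""
    out = list(exemplar[:k])
    for _ in range(n - 1):
        p = best_patch(exemplar, k, out[-1])
        out.extend(p[1:])  # skip the overlapping pixel
    return out
```

Real 2-D methods use larger overlaps, seam optimization, and fast approximate nearest-neighbor search (e.g., PatchMatch) in place of this brute-force scan.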
基金 (Funding): The National Natural Science Foundation of China (Nos. 69873004, 60071002) and the National '863' High-Tech Programme of China.
文摘 (Abstract): Generation of photo-realistic images of human hair is a challenging topic in computer graphics. The difficulty of the problem comes mainly from the extremely large number of hairs and the high complexity of the hair shapes. Regarding the modeling and rendering of hair-type objects, Kajiya proposed a so-called texel model for producing furry surfaces; however, Kajiya's model could only be used for the generation of short hairs. In this paper, a concise and practical approach is presented to solve the problem of rendering long hairs; in particular, the method of rendering smooth segmental texels for the generation of long hairs is addressed.
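Hair shading in the spirit of Kajiya's model replaces the surface normal with the strand tangent; the widely cited Kajiya-Kay diffuse and specular terms can be sketched as follows (constants and vector layout are our own, and vectors are assumed to be unit length):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kajiya_kay(tangent, light, view, kd=0.8, ks=0.3, p=16):
    """Per-strand shading: diffuse ~ sin(T, L), specular ~ a cosine of the
    angle between reflected light and view, both defined via the tangent T."""
    tl = dot(tangent, light)
    tv = dot(tangent, view)
    sin_tl = math.sqrt(max(0.0, 1.0 - tl * tl))
    sin_tv = math.sqrt(max(0.0, 1.0 - tv * tv))
    diffuse = kd * sin_tl
    specular = ks * max(0.0, tl * tv + sin_tl * sin_tv) ** p
    return diffuse + specular
```

Because lighting depends only on the tangent, an entire segment of a long strand can share one shading evaluation, which is what makes segmental texels practical.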
基金 (Funding): Supported by the National Basic Research Program of China (Grant No. 2009CB320802), the National Natural Science Foundation of China (Grant No. 60833007), the National High-Tech Research & Development Program of China (Grant No. 2008AA01Z301), and the Research Grant of the University of Macao.
文摘 (Abstract): This paper presents an interactive graphics processing unit (GPU)-based relighting system in which the local lighting condition, surface materials, and viewing direction can all be changed on the fly. To support these changes, we simulate the lighting transportation process at run time, which is normally impractical for interactive use due to its huge computational burden. We greatly alleviate this burden with a hierarchical structure named a transportation tree, which clusters similar emitting samples together within a perceptually acceptable error bound. Furthermore, by exploiting coherence in time as well as in space, we incrementally adjust the clusters rather than computing them from scratch in each frame. With a pre-computed visibility map, we are able to efficiently estimate the indirect illumination in parallel on graphics hardware, by simply summing up the radiance shot from cluster representatives, plus a small number of merging and splitting operations on clusters. With relighting based on the time-varying clusters, interactive update of global illumination effects with multi-bounced indirect lighting is demonstrated in applications to material animation and scene decoration.
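Clustering similar emitting samples and shooting radiance only from one representative per cluster can be sketched in 1-D (a toy stand-in for the transportation tree, using greedy tolerance-based grouping on scalar radiance values):

```python
def cluster_emitters(samples, tol):
    """Greedy 1-D clustering: a sample joins the current cluster if its
    radiance is within tol of the cluster's first (representative) sample."""
    clusters = []
    for s in sorted(samples):
        if clusters and s - clusters[-1][0] <= tol:
            clusters[-1].append(s)
        else:
            clusters.append([s])
    return clusters

def indirect_estimate(samples, tol):
    """Approximate the total shot radiance by (representative) * (cluster size),
    instead of summing every individual sample."""
    total = 0.0
    for c in cluster_emitters(samples, tol):
        total += c[0] * len(c)  # one representative shoot per cluster
    return total
```

With tol = 0 every sample is its own cluster and the estimate is exact; a larger tolerance trades accuracy for fewer shoots, mirroring the perceptual error bound described above.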
基金 (Funding): Supported by the Young Talents Programme of the Scientific Research Program of the Hubei Education Department (Project No. Q20201709), the Hubei Key Research and Development Program project "Research on the Key Technology of Flexible Intelligent Manufacturing of Clothing Based on Digital Twin" (Project No. 2021BAA042), and an Open Topic of the Engineering Research Center of Hubei Province for Clothing Information (Project No. 900204).
文摘 (Abstract): Image-based virtual try-on systems have significant commercial value in online garment shopping. However, prior methods fail to handle details appropriately, and so are defective in maintaining the original appearance of constituent parts such as the arms, the neck, and the in-shop garment. We propose a novel high-fidelity virtual try-on network to generate realistic results. Specifically, a distributed pipeline is used for the simultaneous generation of these constituent parts. First, the in-shop garment is warped using thin plate splines (TPS) to give a coarse shape reference, and then a corresponding target semantic map is generated, which can adaptively respond to the distribution of different items triggered by different garments. Second, the constituent parts are generated separately using our novel semantic map-based image adjustment network (SMIAN) to avoid interference between body parts. Finally, all components are integrated by SMIAN to generate the overall result. A priori dual-modal information is incorporated in the tail layers of SMIAN to improve the convergence rate of the network. Experiments demonstrate that the proposed method retains the details of the conditioning information better than current methods, achieving convincing quantitative and qualitative results on existing benchmark datasets.