Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the middle of the blood vessel, little research has been done on characterizing the instantaneous blood flow velocity distribution. This is mainly due to the lack of measurement technology with high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, the adversarial game between generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging the differences between network architectures can provide more structural information to improve the generation effect. In this paper, we propose an enhanced GAN via an improved generator for image generation (EIGGAN). EIGGAN applies spatial attention in the generator to extract salient information and enhance the truthfulness of the generated images. Taking contextual relations into account, parallel residual operations are fused into the generation network to extract more structural information from different layers. Finally, a mixed loss function is exploited to trade off speed and accuracy and generate more realistic images. Experimental results show that the proposed method is superior to popular methods such as Wasserstein GAN with gradient penalty (WGAN-GP) on many indexes, i.e., Fréchet Inception Distance, Learned Perceptual Image Patch Similarity, Multi-Scale Structural Similarity Index Measure, Kernel Inception Distance, Number of Statistically-Different Bins, and Inception Score, as well as on visual comparisons.
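The spatial attention mentioned in the EIGGAN abstract can be sketched in a few lines. The sketch below is an illustrative simplification, not the authors' code: it builds a spatial mask from channel-wise average and max pooling with a fixed combination standing in for the learned convolution of a real attention module.

```python
import numpy as np

def spatial_attention(feats):
    """Reweight a (C, H, W) feature map by a spatial mask.

    The mask comes from channel-wise average and max pooling followed by
    a sigmoid; a trained module would replace the fixed 0.5 mix with a
    learned convolution over the two pooled maps.
    """
    avg_pool = feats.mean(axis=0)            # (H, W)
    max_pool = feats.max(axis=0)             # (H, W)
    logits = 0.5 * (avg_pool + max_pool)     # stand-in for a learned conv
    mask = 1.0 / (1.0 + np.exp(-logits))     # sigmoid, values in (0, 1)
    return feats * mask                      # broadcast mask over channels

feats = np.random.default_rng(0).standard_normal((8, 4, 4))
out = spatial_attention(feats)
```

Because the mask lies in (0, 1), attention can only attenuate features, letting salient spatial positions pass through relatively unchanged while suppressing the rest.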
In the context of high compression rates applied to Joint Photographic Experts Group (JPEG) images through lossy compression techniques, image-blocking artifacts may manifest. This necessitates the restoration of the image to its original quality. The challenge lies in regenerating significantly compressed images into a state in which they become identifiable. Therefore, this study focuses on the restoration of JPEG images subjected to substantial degradation caused by maximum lossy compression using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture and features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural and high-quality images: Low Frequency (LF) loss and High Frequency (HF) loss. HF loss uses a pretrained VGG-16 network and is configured using the specific layer that best represents features, which enhances performance in the high-frequency region. In contrast, LF loss is used to handle the low-frequency region. The two loss functions allow the generator to produce images that can mislead the discriminator while accurately reproducing high- and low-frequency regions. Consequently, by removing the blocking effects from maximum lossy compressed images, images in which identities can be recognized are generated. This study represents a significant improvement over previous research in terms of image resolution performance.
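The LF/HF split underlying the two losses can be illustrated with an FFT band split. Note the assumption: the paper's HF loss is computed on pretrained VGG-16 features, whereas this sketch uses a simple radial frequency mask for both bands, so it only demonstrates the low-/high-frequency decomposition idea, not the actual loss.

```python
import numpy as np

def band_split(img, cutoff=0.25):
    """Split a grayscale image into low- and high-frequency parts via an FFT mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)  # normalized frequency
    low = np.fft.ifft2(np.fft.ifftshift(F * (radius <= cutoff))).real
    return low, img - low                     # low band, residual high band

def lf_hf_loss(pred, target, w_lf=1.0, w_hf=1.0):
    """Weighted L1 loss computed separately on the two frequency bands."""
    pl, ph = band_split(pred)
    tl, th = band_split(target)
    return w_lf * np.abs(pl - tl).mean() + w_hf * np.abs(ph - th).mean()

img = np.random.default_rng(0).standard_normal((16, 16))
low, high = band_split(img)
```

Weighting the two bands separately lets training penalize blurred high-frequency detail (blocking-artifact regions) independently from low-frequency color and luminance errors.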
Chinese shadow puppetry has been recognized as a world intangible cultural heritage. However, it faces substantial challenges in its preservation and advancement due to the intricate and labor-intensive nature of crafting shadow puppets. To ensure the inheritance and development of this cultural heritage, it is imperative to enable traditional art to flourish in the digital era. This paper presents an Interactive Collaborative Creation System for shadow puppets, designed to facilitate the creation of high-quality shadow puppet images with greater ease. The system comprises four key functions: image contour extraction, intelligent reference recommendation, generation network, and color adjustment, all aimed at assisting users in various aspects of the creative process, including drawing, inspiration, and content generation. Additionally, we propose an enhanced algorithm called Smooth Generative Adversarial Networks (SmoothGAN), which exhibits more stable gradient training and a greater capacity for generating high-resolution shadow puppet images. Furthermore, we have built a new dataset comprising high-quality shadow puppet images to train the shadow puppet generation model. Both qualitative and quantitative experimental results demonstrate that SmoothGAN significantly improves the quality of image generation, while our system efficiently assists users in creating high-quality shadow puppet images, with a SUS scale score of 84.4. This study provides a valuable theoretical and practical reference for the digital creation of shadow puppet art.
Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. Its main challenges are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used as a training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for training the face recognition model, maximizing the inter-class distance and minimizing the intra-class distance of the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets and obtained relatively good results.
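The triplet loss referenced in this abstract has a standard form, sketched below with squared Euclidean distances. The margin value and the distance choice are illustrative defaults, not necessarily those used by RS-DVG.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    Pulls the anchor toward the positive (same identity) and pushes it
    away from the negative (different identity) by at least `margin`.
    """
    d_ap = float(np.sum((anchor - positive) ** 2))  # anchor-positive distance
    d_an = float(np.sum((anchor - negative) ** 2))  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

anchor = np.zeros(4)
positive = np.full(4, 0.1)   # close to the anchor -> loss should vanish
negative = np.ones(4)        # far from the anchor
loss = triplet_loss(anchor, positive, negative)
```

A "novel triplet selection method" as in the paper would decide which (anchor, positive, negative) combinations to feed this function, typically preferring hard triplets where the hinge is active.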
A new nonlinear optical third-harmonic imaging technology in reflected fashion in bio-tissues using the cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process, has been analyzed. The performance of reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with semi-classical theory. A microscopic understanding of the enhancement of cascaded optical third-harmonic imaging in reflected manner in bio-tissues is discussed, and some ideas for further enhancement are given.
Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while accounting for causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while considering the rationality and interpretability of the generated images, and also allows for the generation of counterfactual images. The key to our approach, CCIG, lies in the use of a causal structure learning module to learn the causal relationships between image attributes, jointly optimized with the encoder, generator, and joint discriminator of the image generation module. By doing so, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
Medical image generation has recently garnered significant interest among researchers. However, the primary generative models, such as Generative Adversarial Networks (GANs), often encounter challenges during training, including mode collapse. To address these issues, we propose the AE-COT-GAN model (Autoencoder-based Conditional Optimal Transport Generative Adversarial Network) for the generation of medical images belonging to specific categories. The training process of our model comprises three fundamental components. First, we employ an autoencoder to obtain a low-dimensional manifold representation of real images. Second, we apply extended semi-discrete optimal transport to map the Gaussian noise distribution to the latent space distribution and obtain the corresponding labels effectively. This procedure leads to the generation of new latent codes with known labels. Finally, we integrate a GAN to further train the decoder to generate medical images. To evaluate the performance of the AE-COT-GAN model, we conducted experiments on two medical image datasets, namely DermaMNIST and BloodMNIST, and compared the model's performance with state-of-the-art generative models. Results show that the AE-COT-GAN model had excellent performance in generating medical images and effectively addressed the common issues associated with traditional GANs.
Image paragraph generation aims to generate a long description composed of multiple sentences, which is different from traditional image captioning containing only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then utilized to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
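The graph-convolutional encoding of a scene graph described above can be sketched with the standard normalized-adjacency GCN update; the toy graph and random weights are assumptions for illustration, not the paper's trained parameters.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A is the (N, N) adjacency of the scene graph, H the (N, F) node
    features, W the (F, F') learnable weight matrix; self-loops are
    added so each node keeps its own features.
    """
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)

# Toy scene graph: 3 objects, edges 0-1 and 1-2 (e.g. "man-rides-bike")
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.default_rng(0).standard_normal((3, 4))   # object features
W = np.random.default_rng(1).standard_normal((4, 2))   # weights (random here)
H_out = gcn_layer(A, H, W)
```

Each output row mixes a node's features with those of its scene-graph neighbors, which is how relationship information gets folded into the object features before topic generation.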
With the development of computer graphics, realistic computer graphics (CG) have become more and more common in our field of vision. Such rendered images are indistinguishable from real photographs to the naked eye. How to effectively identify CG and natural images (NI) has become a new issue in the field of digital forensics. In recent years, a series of deep learning network frameworks have shown great advantages in the field of images, which provides a good choice for solving this problem. This paper aims to track the latest developments and applications of deep learning in the field of CG and NI forensics in a timely manner. First, it introduces the background of deep learning and the fundamentals of convolutional neural networks, with the aim of conveying the basic model structures of deep learning applications in the image field, and then outlines the mainstream frameworks. Second, it briefly introduces the application of deep learning in CG and NI forensics. Finally, it points out the open problems of deep learning in this field and the prospects for the future.
We present a new optical microscope in which the light transmitted by a sample-scanned transmission confocal microscope is frequency-tripled by SiOx nanocrystallites in lieu of being transmitted by a confocal pinhole. This imaging technique offers increased contrast and high scattered-light rejection. It is demonstrated that the contrast close to the Sparrow resolution limit is enhanced and the sectioning power is increased with respect to the linear confocal detection mode. An experimental implementation is presented and compared with the conventional linear confocal mode.
In this paper an evaluation of the influence of luminance L* in the L*a*b* color space during color segmentation is presented. A comparative study is made between the behavior of segmentation in color images using only the Euclidean metric of a* and b* and an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, as one might initially assume.
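The two distance measures being compared can be written out directly: the chromaticity-only Euclidean metric over (a*, b*) versus the full CIE76 distance over (L*, a*, b*). The sample colors are made up for illustration; they differ only in lightness, which the a*b*-only metric cannot see.

```python
import numpy as np

def delta_e_ab(c1, c2):
    """Chromaticity-only distance: Euclidean metric over (a*, b*), ignoring L*."""
    (_, a1, b1), (_, a2, b2) = c1, c2
    return float(np.hypot(a1 - a2, b1 - b2))

def delta_e_lab(c1, c2):
    """Full CIE76 color difference over (L*, a*, b*)."""
    return float(np.linalg.norm(np.subtract(c1, c2)))

dark = (20.0, 35.0, 10.0)    # same chromaticity as below, low lightness
light = (80.0, 35.0, 10.0)   # same (a*, b*), high lightness
```

The a*b*-only metric declares these two colors identical while the full metric reports a large difference, which is exactly the kind of discrepancy the paper's evaluation probes.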
As computer graphics technology supports the pursuit of a photorealistic style, replicated artworks with a photorealistic style overwhelmingly predominate in the computer-generated art circle. Along with the progression of generative technology, this trend may turn generative art into a virtual world of photorealistic fakes, in which a single criterion of expressive style reduces art to a single boring stereotype. This article focuses on the issue of style diversity and its technical feasibility through artistic experiments generating flower images with StyleGAN. The author insists that neither the technology nor the artistic style should be confined merely to realistic purposes. This proposition was validated in the GAN generation experiments by changing the training materials.
The difficulty of bumblebee data collection and the laborious nature of bumblebee data annotation sometimes result in a lack of training data, which impairs the effectiveness of deep learning based counting methods. Given that it is challenging to produce detailed background information in generated bumblebee images using current data augmentation methods, in this paper a joint multi-scale convolutional neural network and multi-channel attention based generative adversarial network (MMGAN) is proposed. MMGAN generates a bumblebee image in accordance with a corresponding density map marking the bumblebee positions. Specifically, the multi-scale convolutional neural network (CNN) module utilizes multiple convolution kernels to fully extract features of different scales from the input bumblebee image and density map. To generate the various targets in the generated image, the multi-channel attention module builds numerous intermediate generation layers and attention maps. These targets are then stacked to produce a bumblebee image with a specific number of bumblebees. The proposed model achieves the best performance on bumblebee image generation tasks, and the generated bumblebee images considerably improve the efficiency of deep learning based counting methods in bumblebee counting applications.
For traffic object detection in foggy environments based on convolutional neural networks (CNN), data sets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the object characteristics of the foggy environment from the training set, and the detection effect is poor. To improve traffic object detection in foggy environments, we propose a method of generating foggy images from fog-free images from the perspective of data set construction. First, taking the KITTI object detection data set as the original fog-free images, we generate the depth image of each original image using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed, and the image entropy, taken as a weight, is fused with the depth image. After that, a foggy image is produced from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, the two-stage Faster region-based convolutional neural network (Faster-RCNN) and the one-stage network YOLOv4, and train them on the original data set, the foggy data set, and the mixed data set, respectively. According to the test results on the RESIDE-RTTS data set, collected in an outdoor natural foggy environment, the model trained on the mixed data set shows the best effect: the mean average precision (mAP) values are increased by 5.6% and 5.0% under the YOLOv4 model and the Faster-RCNN network, respectively. This proves that the proposed method can effectively improve object identification in foggy environments.
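The atmospheric scattering model used to synthesize fog from a depth map is compact enough to write out. The scattering coefficient and airlight values below are illustrative defaults, not the paper's calibrated settings.

```python
import numpy as np

def add_fog(clear, depth, beta=0.8, airlight=0.9):
    """Atmospheric scattering model: I(x) = J(x)*t(x) + A*(1 - t(x)),
    with transmission t(x) = exp(-beta * d(x)).

    `clear` is the fog-free image J, `depth` the per-pixel depth d,
    `beta` the scattering coefficient, `airlight` the global airlight A.
    """
    t = np.exp(-beta * depth)
    return clear * t + airlight * (1.0 - t)

clear = np.full((4, 4), 0.2)                 # toy fog-free "image"
near = add_fog(clear, np.zeros((4, 4)))      # zero depth: image unchanged
far = add_fog(clear, np.full((4, 4), 50.0))  # large depth: approaches airlight
```

As depth grows, transmission decays exponentially and every pixel converges to the airlight value, reproducing the washed-out look of distant objects in fog.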
We introduce a phase-only hologram generation method based on integral imaging, and propose an enhancement of the representable depth interval. The computational integral imaging reconstruction method is modified based on optical flow to obtain depth-slice images of the focused objects only. A phase-only hologram for multiple plane images is generated using the iterative Fresnel transform algorithm. In addition, a division method in the hologram plane is proposed to enhance the representable minimum depth interval.
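The iterative algorithm behind phase-only hologram generation can be sketched in the Gerchberg-Saxton style. Two assumptions to flag: a plain FFT stands in for the Fresnel transform used in the paper, and the square target amplitude is a made-up toy, so this only shows the alternating-constraints idea, not the paper's multi-plane method.

```python
import numpy as np

def phase_only_hologram(target_amp, iters=50, seed=0):
    """Gerchberg-Saxton-style iteration for a phase-only hologram.

    Alternates between the hologram plane (force unit amplitude, keep
    phase) and the image plane (force target amplitude, keep phase).
    """
    rng = np.random.default_rng(seed)
    field = target_amp * np.exp(1j * rng.uniform(0, 2 * np.pi, target_amp.shape))
    for _ in range(iters):
        holo = np.exp(1j * np.angle(np.fft.ifft2(field)))              # phase-only
        field = target_amp * np.exp(1j * np.angle(np.fft.fft2(holo)))  # amplitude
    return holo

target = np.zeros((32, 32))
target[12:20, 12:20] = 1.0        # toy target amplitude: a bright square
holo = phase_only_hologram(target)
```

The returned field has unit amplitude everywhere, so only its phase needs to be displayed on a phase-only spatial light modulator.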
Prompt learning has attracted broad attention in computer vision since large pre-trained vision-language models (VLMs) exploded. Based on the close relationship between vision and language information built by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review vision prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
A prototype expert system for generating image processing programs using the subroutine package SPIDER is described in this paper. Based on an interactive dialog, the system can generate a complete application program using SPIDER routines.
We propose a systematic analysis of the neglected spectral bias in the frequency domain in this paper. Traditional generative adversarial networks (GANs) try to reproduce image details by designing specific network architectures or losses, focusing on generating visually convincing images. The convolution theorem shows that image processing in the frequency domain is parallelizable and performs better and faster than in the spatial domain. However, there is little work discussing the bias of frequency features between generated images and real ones. In this paper, we first empirically demonstrate the general distribution bias across datasets and GANs with different sampling methods. Then, we explain the causes of the spectral bias through a deduction that reconsiders the sampling process of the GAN generator. Based on these studies, we provide a low-spectral-bias hybrid generative model to reduce the spectral bias and improve the quality of the generated images.
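A common probe for the spectral bias discussed above is the radially averaged power spectrum: comparing the profile of generated images against real ones exposes missing high-frequency energy. The sketch below is a generic implementation of that diagnostic, not the paper's specific analysis pipeline.

```python
import numpy as np

def radial_spectrum(img, nbins=16):
    """Radially averaged log power spectrum of a grayscale image.

    Bins the 2D power spectrum by normalized radial frequency, so the
    last bins summarize the image's high-frequency content.
    """
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    r = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    edges = np.linspace(0.0, r.max() + 1e-9, nbins + 1)
    idx = np.digitize(r.ravel(), edges) - 1
    total = np.bincount(idx, weights=power.ravel(), minlength=nbins)
    count = np.bincount(idx, minlength=nbins)
    return np.log1p(total / np.maximum(count, 1))

real_like = np.random.default_rng(0).standard_normal((64, 64))
profile = radial_spectrum(real_like)
```

In practice one averages such profiles over batches of real and generated images; a generated-image profile that falls below the real one at high radii is the signature of spectral bias.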
Recent diffusion-based AI art platforms can create impressive images from simple text descriptions. This makes them powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true for the early stages of architectural design, with multiple stages of ideation, sketching and modelling. In this paper, we investigate how applicable diffusion-based models already are to these tasks. We research the applicability of the platforms Midjourney, DALL·E 2 and Stable Diffusion to a series of common use cases in architectural design to determine which are already solvable or might soon be. Our novel contributions are: (i) a comparison of the capabilities of public AI art platforms; (ii) a specification of the requirements for AI art platforms in supporting common use cases in civil engineering and architecture; (iii) an analysis of 85 million Midjourney queries with Natural Language Processing (NLP) methods to extract common usage patterns. From this we derived (iv) a workflow for creating images for interior designs and (v) a workflow for creating views for exterior design that combines the strengths of the individual platforms.
Funding: funded by the National Natural Science Foundation of China (Grant Nos. 62075135 and 61975126), the Science and Technology Innovation Commission of Shenzhen (Grant Nos. JCYJ20190808174819083 and JCYJ20190808175201640), and the Shenzhen Science and Technology Planning Project (ZDSYS 20210623092006020).
Funding: supported in part by the Science and Technology Development Fund, Macao S.A.R (FDCT) 0028/2023/RIA1; in part by the Leading Talents in Gusu Innovation and Entrepreneurship Grant ZXL2023170; in part by the TCL Science and Technology Innovation Fund under Grant D5140240118; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079.
Funding: supported by the Technology Development Program (S3344882) funded by the Ministry of SMEs and Startups (MSS, Korea).
Funding: supported by the Scientific Research Foundation of Hangzhou City University under Grant No. X-202203 and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LTGY24F030002.
Funding: National Natural Science Foundation of China (No. 62006039); National Key Research and Development Program of China (No. 2019YFE0190500).
文摘Near infrared-visible(NIR-VIS)face recognition is to match an NIR face image to a VIS image.The main challenges of NIR-VIS face recognition are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images to train models.This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt(RS-DVG).RS-DVG can generate a large number of paired NIR-VIS face images from noise,and these generated NIR-VIS face images can be used as the training set together with the real NIR-VIS face images.In addition,a triplet loss function is introduced and a novel triplet selection method is proposed specifically for the training of the current face recognition model,which maximizes the inter-class distance and minimizes the intra-class distance in the input face images.The method proposed in this paper was evaluated on the datasets CASIA NIR-VIS 2.0 and BUAA-VisNir,and relatively good results were obtained.
Abstract: A new nonlinear optical third-harmonic imaging technology operating in reflection in bio-tissues, based on a cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process, has been analyzed. The performance of reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with semi-classical theory. The microscopic origin of the enhancement of cascaded optical third-harmonic imaging in reflection in bio-tissues is discussed, and some ideas for further enhancement are given.
Funding: Project supported by the National Major Science and Technology Projects of China (No. 2022YFB3303302), the National Natural Science Foundation of China (Nos. 61977012 and 62207007), and the Central Universities Project in China at Chongqing University (Nos. 2021CDJYGRH011 and 2020CDJSK06PT14).
Abstract: Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while considering causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while considering the rationality and interpretability of the generated images, and also allows for the generation of counterfactual images. The key to our approach lies in the use of a causal structure learning module to learn the causal relationships between image attributes, jointly optimized with the encoder, generator, and joint discriminator in the image generation module. By doing so, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
Funding: the National Key R&D Program of China under Grant No. 2022ZD0117000; the National Institutes of Health, United States, under award numbers 3R01LM012434-05S1 and 1R21EB029733-01A1; the National Science Foundation, United States, under Grant Nos. FAIN-2115095 and CMMI-1762287.
Abstract: Medical image generation has recently garnered significant interest among researchers. However, the primary generative models, such as generative adversarial networks (GANs), often encounter challenges during training, including mode collapse. To address these issues, we propose the AE-COT-GAN model (autoencoder-based conditional optimal transport generative adversarial network) for generating medical images of specific categories. The training process of our model comprises three fundamental components. First, we employ an autoencoder to obtain a low-dimensional manifold representation of real images. Second, we apply extended semi-discrete optimal transport to map the Gaussian noise distribution to the latent-space distribution and obtain corresponding labels effectively, which yields new latent codes with known labels. Finally, we integrate a GAN to further train the decoder to generate medical images. To evaluate the performance of AE-COT-GAN, we conducted experiments on two medical image datasets, DermaMNIST and BloodMNIST, and compared the model with state-of-the-art generative models. The results show that AE-COT-GAN performs excellently in generating medical images and effectively addresses the common issues associated with traditional GANs.
Funding: supported in part by the National Natural Science Foundation of China (Nos. 61721004, 61976214, 62076078 and 62176246).
Abstract: Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, currently the only available dataset for image paragraph generation, and it achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
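The relation-encoding step can be sketched as a single graph-convolution layer over a toy scene graph. The layer form H' = ReLU(A_hat H W) with symmetric normalization is a common GCN formulation; the abstract does not give the exact variant used, so treat this as an illustrative assumption:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where
    A_hat is the symmetrically normalized adjacency with self-loops."""
    a = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # D^{-1/2}
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ feats @ weight, 0.0)

# Toy scene graph: node 0 ("person") connected to node 1 ("horse"),
# node 2 ("tree") isolated; rows of `feats` are 4-d object embeddings.
adj = np.array([[0., 1., 0.],
                [1., 0., 0.],
                [0., 0., 0.]])
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
weight = rng.normal(size=(4, 4))
out = gcn_layer(adj, feats, weight)
print(out.shape)  # (3, 4): each object feature now mixes in its neighbours
```

Stacking such layers (as the paper does with a second GCN over relation features) lets each node aggregate increasingly distant context from the scene graph.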
Funding: supported by the National Natural Science Foundation of China (62072250).
Abstract: With the development of computer graphics, realistic computer graphics (CG) have become more and more common in our field of vision. Such rendered images are often indistinguishable from natural images (NI) to the naked eye, so how to effectively identify CG versus NI has become a new issue in the field of digital forensics. In recent years, a series of deep learning network frameworks have shown great advantages in the image field, providing a good choice for solving this problem. This paper tracks the latest developments and applications of deep learning in CG and NI forensics in a timely manner. First, it introduces the background of deep learning and the fundamentals of convolutional neural networks, with the aim of understanding the basic model structures of deep learning applications in the image field, and then outlines the mainstream frameworks. Second, it briefly introduces the application of deep learning to CG and NI forensics. Finally, it points out the open problems of deep learning in this field and prospects for the future.
Funding: The SiOx nanocrystals and clusters were deposited by D. Scuderi, O. Albert, A. Dos Santos and J. Etchepare at the LOA. We thank Bertrand Reynier, Unité de Mécanique, ENSTA, France, for sample characterization by electron microscopy.
Abstract: We present a new optical microscope in which the light transmitted by a sample-scanned transmission confocal microscope is frequency-tripled by SiOx nanocrystallites in lieu of being transmitted by a confocal pinhole. This imaging technique offers increased contrast and high scattered-light rejection. It is demonstrated that the contrast close to the Sparrow resolution limit is enhanced and the sectioning power is increased with respect to the linear confocal detection mode. An experimental implementation is presented and compared with the conventional linear confocal mode.
Abstract: In this paper, an evaluation of the influence of the luminance L* in the L*a*b* color space during color segmentation is presented. A comparative study is made between the behavior of segmentation in color images using only the Euclidean metric of a* and b*, and an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces, or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, as one might initially assume.
Abstract: As computer graphics technology supports the pursuit of a photorealistic style, replicated artworks in a photorealistic style overwhelmingly predominate in the computer-generated art circle. Along with the progression of generative technology, this trend may turn generative art into a virtual world of photorealistic fakes, in which a single criterion of expressive style imperils art with a single boring stereotype. This article focuses on the issue of style diversity and its technical feasibility through artistic experiments generating flower images with StyleGAN. The author argues that neither the technology nor the artistic style should be confined merely to realistic purposes. This proposition was validated in the GAN generation experiments by changing the training materials.
Abstract: The difficulty of bumblebee data collection and the laborious nature of bumblebee data annotation sometimes result in a lack of training data, which impairs the effectiveness of deep-learning-based counting methods. Given that it is challenging to produce detailed background information in generated bumblebee images using current data augmentation methods, in this paper a joint multi-scale convolutional neural network and multi-channel attention based generative adversarial network (MMGAN) is proposed. MMGAN generates a bumblebee image in accordance with a corresponding density map marking the bumblebee positions. Specifically, the multi-scale convolutional neural network (CNN) module utilizes multiple convolution kernels to fully extract features of different scales from the input bumblebee image and density map. To generate the various targets in the output image, the multi-channel attention module builds numerous intermediate generation layers and attention maps. These targets are then stacked to produce a bumblebee image with a specific number of bumblebees. The proposed model achieves the best performance in bumblebee image generation tasks, and the generated bumblebee images considerably improve the effectiveness of deep-learning-based counting methods in bumblebee counting applications.
Abstract: For traffic object detection in foggy environments based on convolutional neural networks (CNNs), datasets collected in fog-free environments are generally used to train the network directly. As a result, the network cannot learn the characteristics of objects in foggy environments from the training set, and the detection effect is poor. To improve traffic object detection in foggy environments, we propose a method of generating foggy images from fog-free images from the perspective of dataset construction. First, taking the KITTI object detection dataset as the original fog-free images, we generate the depth image of each original image using an improved Monodepth unsupervised depth estimation method. Then, a geometric prior depth template is constructed, and the image entropy, taken as a weight, is fused with the depth image. After that, a foggy image is generated from the depth image based on the atmospheric scattering model. Finally, we take two typical object-detection frameworks, the two-stage Faster region-based convolutional neural network (Faster-RCNN) and the one-stage network YOLOv4, and train them on the original dataset, the foggy dataset, and the mixed dataset, respectively. According to the test results on the RESIDE-RTTS dataset, captured in outdoor natural foggy environments, the models trained on the mixed dataset show the best performance: the mean average precision (mAP) values are increased by 5.6% and 5.0% for the YOLOv4 model and the Faster-RCNN network, respectively. This proves that the proposed method can effectively improve object identification ability in foggy environments.
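The atmospheric scattering model behind the fog-synthesis step is I(x) = J(x)t(x) + A(1 - t(x)) with transmission t(x) = exp(-beta * d(x)). A minimal sketch follows; the parameter values (beta, airlight) and the toy scene are illustrative, not taken from the paper:

```python
import numpy as np

def add_fog(image, depth, beta=0.05, airlight=0.9):
    """Synthesize fog via the atmospheric scattering model.
    image:    H x W x 3 fog-free image, values in [0, 1]  (J)
    depth:    H x W depth map in metres                   (d)
    beta:     scattering coefficient; larger -> denser fog
    airlight: global atmospheric light                    (A)
    """
    t = np.exp(-beta * depth)[..., None]     # transmission map t(x)
    return image * t + airlight * (1.0 - t)  # I = J*t + A*(1 - t)

# Toy scene: uniform grey image whose depth grows from left to right.
img = np.full((4, 4, 3), 0.2)
depth = np.tile(np.linspace(0.0, 60.0, 4), (4, 1))
foggy = add_fog(img, depth)
# Near pixels keep their colour; far pixels fade towards the airlight.
print(foggy[0, 0, 0], foggy[0, -1, 0])
```

In the paper's pipeline, `depth` would come from the Monodepth estimate fused with the geometric prior template rather than a synthetic ramp.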
Funding: supported by the Brain Korea 21 Program (Information Technology of Seoul National University).
Abstract: We introduce a phase-only hologram generation method based on integral imaging, and propose a method for enhancing the representable depth interval. The computational integral imaging reconstruction method is modified based on optical flow to obtain depth-slice images containing only the focused objects. A phase-only hologram for multiple plane images is generated using the iterative Fresnel transform algorithm. In addition, a division method in the hologram plane is proposed to enhance the representable minimum depth interval.
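The iterative Fresnel transform algorithm is a Gerchberg-Saxton-style loop that alternates between the hologram plane (unit-amplitude, phase-only constraint) and the image plane (target-amplitude constraint). The sketch below approximates Fresnel propagation by a plain FFT for brevity, which is an assumption; a full implementation would use a distance-dependent Fresnel kernel per depth plane:

```python
import numpy as np

def phase_only_hologram(target_amp, n_iter=50, seed=0):
    """Iteratively retrieve a phase-only hologram whose propagated field
    (FFT used here as a stand-in for Fresnel propagation) reproduces
    the target amplitude."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_amp.shape)
    for _ in range(n_iter):
        holo = np.exp(1j * phase)                      # phase-only constraint
        field = np.fft.fft2(holo)                      # propagate forward
        field = target_amp * np.exp(1j * np.angle(field))  # impose target amplitude
        phase = np.angle(np.fft.ifft2(field))          # propagate back, keep phase
    return phase

# Target: a bright square to be reconstructed at the image plane.
target = np.zeros((32, 32))
target[12:20, 12:20] = 1.0
phase = phase_only_hologram(target)
recon = np.abs(np.fft.fft2(np.exp(1j * phase)))
corr = np.corrcoef(target.ravel(), recon.ravel() / recon.max())[0, 1]
print(round(corr, 2))  # reconstruction correlates strongly with the target
```

For multiple plane images, as in the paper, the loop would cycle through several depth slices, each with its own propagation distance.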
Funding: Project supported by the National Natural Science Foundation of China (Nos. 62306075 and 62101136), the China Postdoctoral Science Foundation (No. 2022TQ0069), the Natural Science Foundation of Shanghai, China (No. 21ZR1403600), the Shanghai Municipal Science and Technology Project, China (No. 20JC1419500), and the Shanghai Center for Brain Science and Brain-Inspired Technology, China.
Abstract: Prompt learning has attracted broad attention in computer vision since large pre-trained vision-language models (VLMs) exploded. Based on the close relationship between vision and language information built by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review vision prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
Abstract: A prototype expert system for generating image processing programs using the subroutine package SPIDER is described in this paper. Based on an interactive dialog, the system can generate a complete application program using SPIDER routines.
Funding: supported in part by the National Key Research and Development Program of China under Grant No. 2020YFB1806403.
Abstract: We propose a systematic analysis of the neglected spectral bias in the frequency domain. Traditional generative adversarial networks (GANs) try to fill in the details of images by designing specific network architectures or losses, focusing on generating visually convincing images. The convolution theorem shows that image processing in the frequency domain is parallelizable and performs better and faster than in the spatial domain. However, there is little work discussing the bias of frequency features between generated images and real ones. In this paper, we first empirically demonstrate the general distribution bias across datasets and GANs with different sampling methods. Then, we explain the causes of the spectral bias through a deduction that reconsiders the sampling process of the GAN generator. Based on these studies, we provide a low-spectral-bias hybrid generative model to reduce the spectral bias and improve the quality of the generated images.
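Spectral bias between real and generated images is commonly visualized with an azimuthally averaged power spectrum. In the minimal sketch below, low-pass-filtered noise stands in for a generator's output, which typically lacks high-frequency energy; this is an illustrative assumption, not the paper's measurement pipeline:

```python
import numpy as np

def radial_power_spectrum(img):
    """Azimuthally averaged power spectrum of a greyscale image:
    mean |FFT|^2 within each integer-frequency ring."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
noise = rng.normal(size=(64, 64))          # stand-in for a "real" image
spec_noise = radial_power_spectrum(noise)

# Low-pass the noise by zeroing high spatial frequencies
# (mimics a generator that under-represents fine detail).
f = np.fft.fftshift(np.fft.fft2(noise))
y, x = np.indices(noise.shape)
mask = np.hypot(y - 32, x - 32) < 8
lowpass = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
spec_low = radial_power_spectrum(lowpass)

# The "generated" spectrum collapses in the high-frequency rings.
print(spec_low[30] < 1e-6 * spec_noise[30])
```

Comparing such radial spectra across a dataset is one simple way to make the distribution bias the paper describes visible before and after applying a debiasing model.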
Abstract: Recent diffusion-based AI art platforms can create impressive images from simple text descriptions. This makes them powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true for the early stages of architectural design, with its multiple stages of ideation, sketching, and modelling. In this paper, we investigate how applicable diffusion-based models already are to these tasks. We research the applicability of the platforms Midjourney, DALL·E 2, and Stable Diffusion to a series of common use cases in architectural design to determine which are already solvable or might soon be. Our novel contributions are: (i) a comparison of the capabilities of public AI art platforms; (ii) a specification of the requirements for AI art platforms to support common use cases in civil engineering and architecture; (iii) an analysis of 85 million Midjourney queries with natural language processing (NLP) methods to extract common usage patterns. From these we derived (iv) a workflow for creating images for interior designs and (v) a workflow for creating views for exterior design that combines the strengths of the individual platforms.