Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while respecting the causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while preserving the rationality and interpretability of the generated images, and it also allows counterfactual images to be generated. The key to our approach, CCIG, lies in using a causal structure learning module to learn the causal relationships between image attributes, jointly optimized with the encoder, generator, and joint discriminator in the image generation module. By doing so, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
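The sketch below shows what a causal intervention (a do-operation) does to a set of latent attributes. It assumes a linear structural causal model z = Aᵀz + ε over latent attribute codes, similar in spirit to causal VAEs. This linear form, the attribute names, and the helper functions are hypothetical and are not taken from CCIG. Decoding z into an image with the generator is omitted.

```python
import numpy as np

def solve_scm(A, eps):
    """Solve the linear SCM z = A^T z + eps; A[i, j] != 0 means attribute i causes j."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A.T, eps)

def intervene(A, eps, index, value):
    """Apply do(z[index] = value): cut the node's incoming edges and fix its value."""
    A_do = A.copy()
    A_do[:, index] = 0.0      # the intervened attribute no longer depends on its causes
    eps_do = eps.copy()
    eps_do[index] = value     # its value is now set from outside the model
    return solve_scm(A_do, eps_do)

# Hypothetical attributes: 0 = "age", 1 = "gray hair", 2 = "smiling".
A = np.zeros((3, 3))
A[0, 1] = 0.8                 # age -> gray hair; smiling has no parents
eps = np.array([0.0, 0.1, 0.5])

z_obs = solve_scm(A, eps)               # approx. [0.0, 0.1, 0.5]
z_old = intervene(A, eps, 0, 1.0)       # approx. [1.0, 0.9, 0.5]: gray hair follows age
z_gray = intervene(A, eps, 1, 0.0)      # approx. [0.0, 0.0, 0.5]: age is unaffected
```

Intervening on a cause propagates to its effects. Intervening on an effect leaves its causes untouched, and this asymmetry is what makes counterfactual edits "rational".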
As computer graphics technology supports the pursuit of a photorealistic style, replicated artworks in a photorealistic style overwhelmingly predominate in computer-generated art. As generative technology progresses, this trend may turn generative art into a virtual world of photorealistic fakes, in which a single criterion of expressive style threatens to reduce art to a single, boring stereotype. This article examines style diversity and its technical feasibility through artistic experiments that generate flower images with StyleGAN. The author argues that neither photographic technology nor artistic style should be confined merely to realistic purposes. This proposition was validated in the GAN generation experiment by changing the training materials.
The difficulty of collecting bumblebee data and the laborious nature of annotating it sometimes result in a lack of training data, which impairs the effectiveness of deep learning based counting methods. Current data augmentation methods struggle to produce detailed background information in generated bumblebee images. This paper therefore proposes a generative adversarial network that combines a multi-scale convolutional neural network with multi-channel attention (MMGAN). MMGAN generates a bumblebee image from the corresponding density map that marks the bumblebee positions. Specifically, the multi-scale convolutional neural network (CNN) module uses multiple convolution kernels to fully extract features at different scales from the input bumblebee image and density map. To generate multiple targets in the generated image, the multi-channel attention module builds numerous intermediate generation layers and attention maps. These targets are then stacked to produce a bumblebee image containing a specific number of bumblebees. The proposed model achieves the best performance in bumblebee image generation tasks, and the generated bumblebee images considerably improve the efficiency of deep learning based counting methods in bumblebee counting applications.
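Density maps for counting are commonly built by placing a unit-mass Gaussian at each annotated position, so that the map sums to the number of objects. The sketch below uses that common construction with a fixed, hypothetical bandwidth `sigma`. MMGAN's exact density-map definition is not given in the abstract.

```python
import numpy as np

def density_map(shape, points, sigma=2.0):
    """Density map: one unit-mass Gaussian per annotated (row, col) bee position."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    dmap = np.zeros(shape, dtype=float)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()   # normalise so each bee contributes exactly 1 to the total
    return dmap

bees = [(10, 12), (30, 40), (50, 20)]   # hypothetical annotations
dmap = density_map((64, 64), bees)
count = dmap.sum()                       # the map integrates to the number of bees (3)
```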
When high compression rates are applied to Joint Photographic Experts Group (JPEG) images through lossy compression, image-blocking artifacts may appear. This necessitates restoring the image to its original quality. The challenge lies in regenerating heavily compressed images into a state in which they become identifiable. Therefore, this study focuses on restoring JPEG images subjected to substantial degradation caused by maximum lossy compression, using Generative Adversarial Networks (GAN). The generator in this network is based on the U-Net architecture. It features a new hourglass structure that preserves the characteristics of the deep layers. In addition, the network incorporates two loss functions to generate natural and high-quality images: Low Frequency (LF) loss and High Frequency (HF) loss. HF loss uses a pretrained VGG-16 network and is configured using the specific layer that best represents features. This can enhance performance in the high-frequency region. In contrast, LF loss handles the low-frequency region. Together, the two loss functions help the generator produce images that can mislead the discriminator while accurately generating both high- and low-frequency regions. Consequently, by removing blocking effects from maximally lossy-compressed images, the method generates images in which identities can be recognized. This study represents a significant improvement over previous research in terms of image resolution performance.
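The abstract states only that the LF loss handles the low-frequency region. The sketch below shows one plausible LF term, assuming block averaging as the low-pass filter and an L1 distance; both choices are assumptions, not the paper's definition. The VGG-16-based HF term needs pretrained weights and is omitted. The checkerboard demo shows why a separate HF term is needed: a purely high-frequency error is invisible to the LF term.

```python
import numpy as np

def low_pass(img, k=4):
    """Crude low-pass filter: average over non-overlapping k x k blocks (edges cropped)."""
    h, w = img.shape
    img = img[: h - h % k, : w - w % k]
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def lf_loss(generated, target, k=4):
    """L1 distance between the low-frequency content of two grayscale images."""
    return float(np.abs(low_pass(generated, k) - low_pass(target, k)).mean())

target = np.zeros((32, 32))
checker = 0.25 * np.where((np.indices((32, 32)).sum(axis=0) % 2) == 0, 1.0, -1.0)

pixel_l1 = float(np.abs(checker - target).mean())  # 0.25: every pixel differs
low_freq = lf_loss(checker, target)                # 0.0: the pattern is pure high frequency
```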
Measurement of blood flow velocity is key to understanding physiology and pathology in vivo. While most measurements are performed at the center of the blood vessel, little research has characterized the instantaneous distribution of blood flow velocity. This is mainly due to the lack of measurement technology with high spatial and temporal resolution. Here, we tackle this problem with our recently developed dual-wavelength line-scan third-harmonic generation (THG) imaging technology. Simultaneous acquisition of dual-wavelength THG line-scanning signals enables measurement of blood flow velocities at two radially symmetric positions in both venules and arterioles in mouse brain in vivo. Our results clearly show that the instantaneous blood flow velocity is not symmetric under general conditions.
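One common way to turn line-scan data into a velocity is to measure how far a cell signal moves along the scan line between successive lines. The sketch below assumes the scan line runs along the vessel axis and uses integer cross-correlation. This is an illustrative choice, not the authors' pipeline, and the pixel size and line period are hypothetical.

```python
import numpy as np

def shift_by_xcorr(line_a, line_b):
    """Integer shift d (pixels) such that line_b is approximately line_a moved by +d."""
    a = line_a - line_a.mean()
    b = line_b - line_b.mean()
    corr = np.correlate(b, a, mode="full")      # corr[i] corresponds to lag i - (len(a) - 1)
    return int(np.argmax(corr)) - (len(a) - 1)

def flow_velocity(line_a, line_b, pixel_um, line_period_ms):
    """Velocity along the scan line in mm/s (equivalently um/ms)."""
    return shift_by_xcorr(line_a, line_b) * pixel_um / line_period_ms

x = np.arange(64)
line_t0 = np.exp(-((x - 20) ** 2) / 8.0)   # hypothetical cell signal at pixel 20
line_t1 = np.exp(-((x - 25) ** 2) / 8.0)   # same signal one line period later
v = flow_velocity(line_t0, line_t1, pixel_um=0.5, line_period_ms=1.0)  # 5 px * 0.5 um / 1 ms
```

Running this estimate independently at two radially symmetric positions would give the two velocities whose asymmetry the abstract reports.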
We propose a systematic analysis of the neglected spectral bias in the frequency domain. Traditional generative adversarial networks (GANs) try to fill in image details by designing specific network architectures or losses, focusing on generating visually high-quality images. By the convolution theorem, convolution in the spatial domain becomes element-wise multiplication in the frequency domain, so image processing in the frequency domain is parallelizable and can perform better and faster than in the spatial domain. However, little work has discussed the bias in frequency features between generated images and real ones. In this paper, we first empirically demonstrate a general distribution bias across datasets and across GANs with different sampling methods. Then, we explain the causes of the spectral bias through a deduction that reconsiders the sampling process of the GAN generator. Based on these studies, we provide a low-spectral-bias hybrid generative model that reduces the spectral bias and improves the quality of the generated images.
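A common tool for exposing frequency differences between real and generated images is the azimuthally averaged power spectrum. The sketch below is illustrative and is not necessarily the paper's exact procedure. It shows how an image lacking fine detail has a much weaker high-frequency tail than white noise.

```python
import numpy as np

def radial_power_spectrum(img):
    """Azimuthally averaged log power spectrum; index = integer spatial-frequency radius."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return np.log1p(sums / np.maximum(counts, 1))

rng = np.random.default_rng(0)
y, x = np.indices((64, 64))
smooth = np.exp(-((y - 32) ** 2 + (x - 32) ** 2) / (2 * 8.0 ** 2))  # low-frequency content
noisy = rng.standard_normal((64, 64))                               # flat (white) spectrum
high_smooth = radial_power_spectrum(smooth)[-10:].mean()
high_noisy = radial_power_spectrum(noisy)[-10:].mean()              # much larger
```

Averaging such profiles over many real and many generated images, then comparing the curves, is one way to visualize a spectral bias.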
Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling visual relationships. In this paper, we propose a novel method that generates a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network that selects relevant features and produces a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, currently the only available dataset for image paragraph generation, and our method achieves competitive performance compared with other state-of-the-art (SOTA) methods.
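The sketch below shows how a GCN lets each object's feature absorb information from related objects in the scene graph. It uses the standard symmetric-normalized layer of Kipf and Welling, treats edges as undirected, and ignores relation labels. These are simplifying assumptions; the paper's GCN also encodes relation features. The scene graph is hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])               # self-loops keep each node's own feature
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

# Hypothetical scene graph: man -(riding)- horse -(on)- grass, edges treated as undirected.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)        # one-hot object features: man, horse, grass
weight = np.eye(3)       # identity weights so the output shows the mixing directly
out = gcn_layer(adj, feats, weight)
# out[0] is approx. [0.5, 1/sqrt(6), 0]: "man" now carries part of "horse"'s feature
```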
Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the cross-modality gap and the lack of sufficient paired NIR-VIS face images for training models. This paper focuses on generating paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used together with real NIR-VIS face images as the training set. In addition, a triplet loss function is introduced, and a novel triplet selection method is proposed specifically for training the current face recognition model; it maximizes the inter-class distance and minimizes the intra-class distance among the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, and relatively good results were obtained.
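The paper's own triplet selection method is not described in the abstract. The sketch below therefore swaps in the standard triplet loss with batch-hard mining, a well-known selection strategy. For each anchor it picks the farthest same-identity sample and the closest different-identity sample, which pushes inter-class distances up and intra-class distances down. The margin value is an assumption.

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """Mean over anchors of max(d(a, hardest p) - d(a, hardest n) + margin, 0)."""
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    idx = np.arange(len(emb))
    losses = []
    for i in idx:
        pos = dist[i][same[i] & (idx != i)]   # same identity, excluding the anchor itself
        neg = dist[i][~same[i]]               # different identity
        if pos.size and neg.size:
            losses.append(max(pos.max() - neg.min() + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

labels = np.array([0, 0, 1, 1])
separated = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
mixed = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [1.1, 0.0]])
loss_sep = batch_hard_triplet_loss(separated, labels)   # 0.0: classes already apart
loss_mix = batch_hard_triplet_loss(mixed, labels)       # about 1.1: each anchor violated
```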
With the development of computer graphics, realistic computer graphics (CG) images have become increasingly common. Such rendered images cannot be distinguished from natural images by the naked eye. Effectively distinguishing CG from natural images (NI) has therefore become a new issue in digital forensics. In recent years, a series of deep learning network frameworks have shown great advantages in the image domain, offering a good option for solving this problem. This paper aims to track the latest developments and applications of deep learning in CG and NI forensics. First, it introduces the background of deep learning and the fundamentals of convolutional neural networks, to explain the basic model structure of deep learning applications in the image domain, and then outlines the mainstream frameworks. Second, it briefly introduces the application of deep learning in CG and NI forensics. Finally, it points out the problems of deep learning in this field and its future prospects.
In this paper, an evaluation of the influence of luminance L* in the L*a*b* color space during color segmentation is presented. A comparative study is made between the behavior of segmentation in color images using only the Euclidean metric of a* and b*, and using an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, contrary to what one might initially assume.
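The two measures being compared can be sketched as follows. The a*b* metric is blind to luminance by construction: two colors differing only in L* get distance 0. The Gaussian-product similarity is sketched in a plain HSI space normalized to [0, 1] with circular hue. The sigma values and this normalization are hypothetical; the paper's modified HSI space and adaptive sigmas are not reproduced.

```python
import math

def ab_distance(lab1, lab2):
    """Euclidean distance on the chromatic a*, b* channels only (L* is ignored)."""
    return math.hypot(lab1[1] - lab2[1], lab1[2] - lab2[2])

def gaussian_similarity(hsi1, hsi2, sigmas=(0.1, 0.2, 0.2)):
    """Similarity in (0, 1]: product of Gaussians of hue, saturation and intensity differences.

    Channels are assumed normalised to [0, 1]; hue is treated as circular.
    """
    dh = abs(hsi1[0] - hsi2[0])
    dh = min(dh, 1.0 - dh)
    ds = hsi1[1] - hsi2[1]
    di = hsi1[2] - hsi2[2]
    sh, ss, si = sigmas
    return (math.exp(-dh ** 2 / (2 * sh ** 2))
            * math.exp(-ds ** 2 / (2 * ss ** 2))
            * math.exp(-di ** 2 / (2 * si ** 2)))

dark, light = (20.0, 10.0, 10.0), (80.0, 10.0, 10.0)    # same a*, b*; different L*
d = ab_distance(dark, light)                             # 0.0: treated as the same color
s = gaussian_similarity((0.95, 0.5, 0.5), (0.05, 0.5, 0.5))  # hue wraps: dh = 0.1
```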
A new nonlinear optical third-harmonic imaging technique for bio-tissues in reflection mode has been analyzed. It uses the cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process. The performance of reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with semi-classical theory. A microscopic understanding of the enhancement of cascaded optical third-harmonic imaging in reflection mode in bio-tissues is discussed. Some ideas for further enhancement are given.
A prototype expert system for generating image processing programs using the subroutine package SPIDER is described in this paper. Based on an interactive dialog, the system can generate a complete application program using SPIDER routines.
Prompt learning has attracted broad attention in computer vision since the explosion of large pre-trained vision-language models (VLMs). Building on the close relationship between visual and linguistic information established by VLMs, prompt learning has become a crucial technique in many important applications, such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework to generate diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code that represents the style of the resulting face. The generator translates the input face mask into a realistic face based on the style code. The style decoder learns to reconstruct the style code from the generated face image, and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
We present a new optical microscope in which the light transmitted by a sample-scanned transmission confocal microscope is frequency-tripled by SiOx nanocrystallites instead of passing through a confocal pinhole. This imaging technique offers increased contrast and high rejection of scattered light. It is demonstrated that, compared with the linear confocal detection mode, the contrast close to the Sparrow resolution limit is enhanced and the sectioning power is increased. An experimental implementation is presented and compared with the conventional linear confocal mode.
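Third-harmonic signal scales with the cube of the local intensity, so the effective detection response is the linear response raised to the third power. The sketch below quantifies this in a simplified 1-D Gaussian model. The model is an assumption and ignores the nanocrystallite response, coherence, and the real instrument's optics. Under it, cubing a Gaussian narrows its full width at half maximum by a factor of √3.

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a sampled single-peaked profile."""
    above = x[y >= y.max() / 2]
    return above.max() - above.min()

x = np.linspace(-3.0, 3.0, 60001)     # lateral position in units of the PSF's sigma
psf = np.exp(-x ** 2 / 2)             # linear (fundamental) intensity response
thg = psf ** 3                        # third-harmonic signal scales with intensity cubed
ratio = fwhm(x, psf) / fwhm(x, thg)   # close to sqrt(3), about 1.732
```

A narrower effective response is consistent with the enhanced contrast near the Sparrow limit reported above.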
A full-parallax light field is captured by a small-scale 3D image scanning system and applied to holographic display. A vertical camera array is scanned horizontally to capture full-parallax imagery, and the vertical views between cameras are interpolated by a depth image-based rendering technique. An improved depth estimation technique reduces the estimation error, and a high-density light field is obtained. The captured data are used to calculate a computer-generated hologram using a ray-sampling plane. This technique enables high-resolution display even for deep 3D scenes, although the hologram is calculated from ray information, and thus it retains an important advantage of holographic 3D display.
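Depth image-based rendering synthesizes an intermediate view by shifting each pixel in proportion to its disparity. The sketch below warps one scan line (for the vertical interpolation described above, a single image column). It assumes known per-pixel disparities in pixels, nearest-pixel rounding, a hypothetical sign convention, and no hole filling. The authors' improved depth estimation is not modelled.

```python
import numpy as np

def warp_row(row, disparity, alpha):
    """Forward-warp one scan line to a virtual view at fraction alpha of the baseline.

    Pixels move by alpha * disparity; when two land on the same target pixel, the one with
    larger disparity (closer to the camera) wins. Unfilled pixels (holes) stay NaN.
    """
    out = np.full(row.shape, np.nan)
    best = np.full(row.shape, -np.inf)
    for x in range(row.size):
        xt = int(round(x - alpha * disparity[x]))
        if 0 <= xt < row.size and disparity[x] > best[xt]:
            out[xt] = row[x]
            best[xt] = disparity[x]
    return out

row = np.arange(8.0)                       # hypothetical intensities along one column
mid_view = warp_row(row, np.full(8, 2.0), alpha=0.5)   # halfway view: shift by 1 pixel
```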
In this paper, we propose an approach for generating rich, fine-grained textual descriptions of images. In particular, we use an LSTM-in-LSTM (long short-term memory) architecture, which consists of an inner LSTM and an outer LSTM. The inner LSTM effectively encodes the long-range implicit contextual interaction between visual cues (i.e., the spatially concurrent visual objects). The outer LSTM captures the explicit multi-modal relationship between sentences and images (i.e., the correspondence of sentences and images). This architecture can produce a long description by predicting one word at each time step. Each prediction is conditioned on the previously generated word, a hidden vector (via the outer LSTM), and a context vector of fine-grained visual cues (via the inner LSTM). When used to generate long, rich, fine-grained descriptions of given images, our model outperforms state-of-the-art methods on several benchmark datasets (Flickr8k, Flickr30k, MSCOCO) in terms of four different metrics (BLEU, CIDEr, ROUGE-L, and METEOR).
We introduce a phase-only hologram generation method based on integral imaging and propose a method for enhancing the representable depth interval. The computational integral imaging reconstruction method is modified based on optical flow to obtain depth-slice images containing only the focused objects. A phase-only hologram for multiple plane images is generated using the iterative Fresnel transform algorithm. In addition, a hologram-plane division method is proposed to enhance the representable minimum depth interval.
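Iterative phase-only hologram algorithms alternate between two constraints: a phase-only constraint in the hologram plane and the target amplitude in the image plane. The sketch below is the classic single-plane Gerchberg–Saxton loop. It uses a plain FFT (far-field propagation) in place of the Fresnel transform and handles one plane rather than multiple depth slices. Both are simplifications, not the paper's algorithm, and the square target is hypothetical.

```python
import numpy as np

def gerchberg_saxton(target_amp, iterations=50, seed=0):
    """Phase-only hologram whose far-field (FFT) amplitude approximates target_amp."""
    rng = np.random.default_rng(seed)
    field = target_amp * np.exp(2j * np.pi * rng.random(target_amp.shape))
    phase = np.angle(np.fft.ifft2(field))
    for _ in range(iterations):
        recon = np.fft.fft2(np.exp(1j * phase))               # propagate phase-only hologram
        field = target_amp * np.exp(1j * np.angle(recon))     # keep phase, impose target amplitude
        phase = np.angle(np.fft.ifft2(field))                 # back-propagate, drop amplitude
    return phase

def relative_error(phase, target_amp):
    """Normalised amplitude error between the reconstruction and the target."""
    recon = np.abs(np.fft.fft2(np.exp(1j * phase)))
    recon *= np.linalg.norm(target_amp) / np.linalg.norm(recon)
    return np.linalg.norm(recon - target_amp) / np.linalg.norm(target_amp)

target = np.zeros((64, 64))
target[24:40, 24:40] = 1.0                                    # hypothetical square image
random_phase = 2 * np.pi * np.random.default_rng(1).random((64, 64))
err_random = relative_error(random_phase, target)
err_gs = relative_error(gerchberg_saxton(target), target)     # far smaller than err_random
```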
Improving the generative and representational capabilities of auto-encoders is a hot research topic. However, it is challenging to jointly and simultaneously optimize the bidirectional mapping between the encoder and the decoder/generator while ensuring convergence. Most existing auto-encoders cannot automatically trade off the bidirectional mapping. In this work, we propose Bi-GAE, an unsupervised bidirectional generative auto-encoder based on the bidirectional generative adversarial network (BiGAN). First, we introduce two terms: one enhances information expansion in decoding to follow human visual models, and the other improves semantically relevant feature representation in encoding. Furthermore, we embed a generative adversarial network (GAN) to improve representation while ensuring convergence. The experimental results show that Bi-GAE achieves competitive results in both generation and representation with stable convergence. Compared with its counterparts, the representational power of Bi-GAE improves the classification accuracy of high-resolution images by about 8.09%. In addition, Bi-GAE increases the structural similarity index measure (SSIM) by 0.045 and decreases the Fréchet inception distance (FID) by … in the reconstruction of 512×512 images.
Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models that use a self-attention mechanism rather than the sequential structure of RNNs. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Unlike previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. In addition to quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
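Why self-attention trains in parallel and sees global context can be seen from a minimal single-head scaled dot-product attention. All pairwise token interactions are computed in one matrix product, with no step-by-step recurrence as in an RNN. This is a generic NumPy sketch; the random weights and the "image patch" tokens are illustrative.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over n tokens x of shape (n, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))                 # e.g. 4 flattened image patches
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = self_attention(tokens, wq, wk, wv)       # out: (4, 8); each attn row sums to 1
```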
Funding (causal controllable image generation, CCIG): Project supported by the National Major Science and Technology Projects of China (No. 2022YFB3303302), the National Natural Science Foundation of China (Nos. 61977012 and 62207007), and the Central Universities Project in China at Chongqing University (Nos. 2021CDJYGRH011 and 2020CDJSK06PT14).
Funding (JPEG restoration with GANs): Supported by the Technology Development Program (S3344882) funded by the Ministry of SMEs and Startups (MSS, Korea).
Funding (dual-wavelength line-scan THG blood flow imaging): Funded by the National Natural Science Foundation of China (Grant/Award Numbers 62075135 and 61975126), the Science and Technology Innovation Commission of Shenzhen (Grant/Award Numbers JCYJ20190808174819083 and JCYJ20190808175201640), and the Shenzhen Science and Technology Planning Project (ZDSYS 20210623092006020).
Funding (spectral bias in GANs): Supported in part by the National Key Research and Development Program of China under Grant No. 2020YFB1806403.
Funding (image paragraph generation): Supported in part by the National Natural Science Foundation of China (Nos. 61721004, 61976214, 62076078 and 62176246).
Funding: National Natural Science Foundation of China (No. 62006039); National Key Research and Development Program of China (No. 2019YFE0190500).
Abstract: Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS face image. The main challenges of NIR-VIS face recognition are the gap caused by the cross-modality difference and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used as a training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced, and a novel triplet selection method is proposed specifically for training the face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance of the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, and relatively good results were obtained.
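A triplet loss pulls an anchor toward a positive sample of the same identity and pushes it away from a negative sample of a different identity by at least a margin. The paper's own triplet selection method is not described in the abstract, so the sketch substitutes the common batch-hard strategy (farthest positive, closest negative) as an assumption; the margin value and helper names are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin): zero once the negative is far enough away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def hardest_triplets(embeddings, labels):
    """Batch-hard selection (a stand-in, not the authors' method): for each anchor,
    choose its farthest same-identity sample and its closest different-identity sample."""
    triplets = []
    for i, (emb, lab) in enumerate(zip(embeddings, labels)):
        pos = [j for j, l in enumerate(labels) if l == lab and j != i]
        neg = [j for j, l in enumerate(labels) if l != lab]
        if not pos or not neg:
            continue  # anchor has no valid partner in this batch
        p = max(pos, key=lambda j: euclidean(emb, embeddings[j]))
        n = min(neg, key=lambda j: euclidean(emb, embeddings[j]))
        triplets.append((i, p, n))
    return triplets
```

For NIR-VIS training, one might additionally restrict anchors to one modality and positives to the other, so that the loss directly targets the cross-modality gap; that restriction is likewise an assumption.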
Funding: Supported by the National Natural Science Foundation of China (62072250).
Abstract: With the development of computer graphics, realistic computer-generated graphics (CG) have become increasingly common. Such rendered images are indistinguishable from photographs to the naked eye, so how to effectively distinguish CG from natural images (NI) has become a new issue in digital forensics. In recent years, a series of deep learning frameworks have shown great advantages in image tasks, which makes them a good choice for solving this problem. This paper aims to review the latest developments and applications of deep learning in CG and NI forensics. First, it introduces the background of deep learning and the fundamentals of convolutional neural networks, to explain the basic model structure of deep learning applications in the image field, and then outlines the mainstream frameworks. Second, it briefly introduces the application of deep learning to CG and NI forensics. Finally, it points out the open problems of deep learning in this field and its prospects for the future.
Abstract: This paper evaluates the influence of the luminance L* of the L*a*b* color space on color segmentation. A comparative study is made between the segmentation of color images using only the Euclidean metric on a* and b* and using an adaptive color similarity function defined as a product of Gaussian functions in a modified HSI color space. For the evaluation, synthetic images were specifically designed to accurately assess the performance of the color segmentation. The testing system can be used either to explore the behavior of a similarity function (or metric) in different color spaces or to explore different metrics (or similarity functions) in the same color space. The results show that the color parameters a* and b* are not independent of the luminance parameter L*, as one might initially assume.
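To make the comparison concrete, the sketch below contrasts the two measures under stated assumptions: a Euclidean distance on (a*, b*) that ignores L* entirely, and a product of Gaussians over hue, saturation and intensity differences, with hue treated as a circular quantity. The paper's modified HSI space and its adaptive choice of widths are not specified in the abstract, so fixed, hypothetical sigma values are used.

```python
import math

def ab_distance(lab1, lab2):
    """Euclidean distance using only a* and b*; the L* component is ignored."""
    return math.hypot(lab1[1] - lab2[1], lab1[2] - lab2[2])

def hue_difference(h1, h2):
    """Smallest angular difference between two hues in degrees (hue wraps at 360)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def gaussian_similarity(hsi1, hsi2, sigma_h=20.0, sigma_s=0.2, sigma_i=0.2):
    """Product of Gaussians over (hue, saturation, intensity) differences.

    Returns 1.0 for identical colors and decays toward 0 as they diverge.
    The sigma defaults are placeholders, not the paper's adaptive values.
    """
    dh = hue_difference(hsi1[0], hsi2[0])
    ds = hsi1[1] - hsi2[1]
    di = hsi1[2] - hsi2[2]
    return (math.exp(-dh ** 2 / (2 * sigma_h ** 2))
            * math.exp(-ds ** 2 / (2 * sigma_s ** 2))
            * math.exp(-di ** 2 / (2 * sigma_i ** 2)))
```

Because `ab_distance` discards L*, two colors that differ only in luminance are treated as identical, which is exactly the independence assumption the study puts to the test.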
Abstract: A new nonlinear optical third-harmonic imaging technique operating in reflection mode in bio-tissues has been analyzed. It exploits the cascading effect, a process whereby second-order effects combine to contribute to a third-order nonlinear process. The performance of reflected optical third-harmonic imaging enhanced by the cascading effect in bio-tissues is analyzed with semi-classical theory. A microscopic understanding of the enhancement of cascaded optical third-harmonic imaging in reflection mode in bio-tissues is discussed. Some ideas for further enhancement are given.
Abstract: This paper describes a prototype expert system for generating image processing programs using the subroutine package SPIDER. Based on an interactive dialog, the system can generate a complete application program using SPIDER routines.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 62306075 and 62101136); the China Postdoctoral Science Foundation (No. 2022TQ0069); the Natural Science Foundation of Shanghai, China (No. 21ZR1403600); the Shanghai Municipal Science and Technology Project, China (No. 20JC1419500); and the Shanghai Center for Brain Science and Brain-Inspired Technology, China.
Abstract: Prompt learning has attracted broad attention in computer vision since the rise of large pre-trained vision-language models (VLMs). Building on the close relationship between vision and language information established by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
Funding: This work is supported by the National Key Research and Development Program of China (2018YFF0214700).
Abstract: Recent studies have shown remarkable success in the face image generation task. However, existing approaches have limited diversity, quality and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework to generate diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
Funding: The SiOx nanocrystals and clusters were deposited by D. Scuderi, O. Albert, A. Dos Santos and J. Etchepare at the LOA. We thank Bertrand Reynier, Unité de Mécanique, ENSTA, France, for sample characterization by electron microscopy.
Abstract: We present a new optical microscope in which the light transmitted by a sample-scanned transmission confocal microscope is frequency-tripled by SiOx nanocrystallites instead of passing through a confocal pinhole. This imaging technique offers increased contrast and high rejection of scattered light. It is demonstrated that the contrast close to the Sparrow resolution limit is enhanced and the sectioning power is increased with respect to the linear confocal detection mode. An experimental implementation is presented and compared with the conventional linear confocal mode.
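A rough intuition for why the nonlinear detection sharpens the response: the third-harmonic signal scales with the cube of the local intensity, so the effective response is the cube of the focal intensity profile. Assuming a Gaussian lateral profile (a simplification, not a model of the authors' setup, which compares against linear confocal detection), cubing it narrows the full width at half maximum by a factor of sqrt(3), which the sketch checks numerically.

```python
import math

def gaussian_psf(x, sigma=1.0):
    """Lateral intensity profile of the focal spot, approximated as a Gaussian."""
    return math.exp(-x ** 2 / (2 * sigma ** 2))

def fwhm(profile, xs):
    """Full width at half maximum of a sampled, single-peaked profile."""
    peak = max(profile)
    above = [x for x, p in zip(xs, profile) if p >= peak / 2]
    return max(above) - min(above)

xs = [i * 0.001 for i in range(-5000, 5001)]    # sample grid from -5 to 5
linear = [gaussian_psf(x) for x in xs]          # linear response ~ I
third = [gaussian_psf(x) ** 3 for x in xs]      # third-harmonic response ~ I^3
ratio = fwhm(linear, xs) / fwhm(third, xs)      # close to sqrt(3) ~ 1.732
print(round(ratio, 3))
```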
Funding: Partly supported by the JSPS Grant-in-Aid for Scientific Research #17300032.
Abstract: A full-parallax light field is captured by a small-scale 3D image scanning system and applied to holographic display. A vertical camera array is scanned horizontally to capture full-parallax imagery, and the vertical views between cameras are interpolated by a depth-image-based rendering technique. An improved depth estimation technique reduces the estimation error, and a high-density light field is obtained. The captured data are used to calculate a computer-generated hologram using a ray-sampling plane. This technique enables high-resolution display even for deep 3D scenes although the hologram is calculated from ray information, and thus it makes use of an important advantage of holographic 3D display.
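The core of depth-image-based rendering is a forward warp: each pixel of a captured view is shifted in proportion to its disparity to synthesize an intermediate viewpoint. The sketch below assumes rectified cameras and a known per-pixel disparity, and works on a single image line (a column, for the vertical camera array described). Hole filling and the authors' improved depth estimation are not shown, and `interpolate_view` is a hypothetical name.

```python
def interpolate_view(line, disparity, alpha):
    """Forward-warp one image line from a captured camera toward its neighbor.

    alpha = 0 reproduces the captured view; alpha = 1 approximates the neighbor.
    Pixels with larger disparity are closer to the cameras and win conflicts.
    Disoccluded positions that receive no pixel are returned as None (holes).
    """
    length = len(line)
    out = [None] * length
    best = [float("-inf")] * length  # z-buffer, measured in disparity units
    for x, (value, d) in enumerate(zip(line, disparity)):
        target = int(round(x - alpha * d))
        if 0 <= target < length and d > best[target]:
            best[target] = d
            out[target] = value
    return out
```

Intermediate values such as alpha = 0.5 give the virtual views between cameras; the `None` entries mark disocclusions that a real pipeline would fill, for example from the other camera.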
Funding: Supported in part by the National Basic Research Program of China (No. 2012CB316400); the National Natural Science Foundation of China (Nos. 61472353 and 61572431); the China Knowledge Centre for Engineering Sciences and Technology; the Fundamental Research Funds for the Central Universities; the 2015 Qianjiang Talents Program of Zhejiang Province; and in part by the US NSF (No. CCF1017828).
Abstract: In this paper, we propose an approach for generating rich, fine-grained textual descriptions of images. In particular, we use an LSTM-in-LSTM (long short-term memory) architecture, which consists of an inner LSTM and an outer LSTM. The inner LSTM effectively encodes the long-range implicit contextual interaction between visual cues (i.e., the spatially concurrent visual objects), while the outer LSTM generally captures the explicit multi-modal relationship between sentences and images (i.e., the correspondence of sentences and images). This architecture is capable of producing a long description by predicting one word at every time step conditioned on the previously generated word, a hidden vector (via the outer LSTM), and a context vector of fine-grained visual cues (via the inner LSTM). Our model outperforms state-of-the-art methods on several benchmark datasets (Flickr8k, Flickr30k, MSCOCO) when used to generate long, rich, fine-grained descriptions of given images, in terms of four different metrics (BLEU, CIDEr, ROUGE-L, and METEOR).
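Both the inner and the outer networks are built from LSTM cells. The sketch below shows one step of a standard LSTM cell in plain Python; the parameter layout (a dict of per-gate `(W, U, b)` triples) is an assumption for readability, and how the paper wires the inner and outer LSTMs together is not specified in the abstract.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell.

    params maps each gate name ('i', 'f', 'o', 'g') to (W, U, b), so that
    gate = act(W x + U h + b). Returns the new hidden state and cell state.
    """
    def gate(name, act):
        w, u, b = params[name]
        return [act(p + q + r) for p, q, r in zip(matvec(w, x), matvec(u, h), b)]
    i = gate("i", sigmoid)     # input gate: how much new content to write
    f = gate("f", sigmoid)     # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)     # output gate: how much cell state to expose
    g = gate("g", math.tanh)   # candidate cell content
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new
```

In an LSTM-in-LSTM arrangement, one could, for example, concatenate the inner LSTM's hidden state (the visual context) with the previous word embedding to form `x` for the outer cell at each word step; that wiring is illustrative, not the authors' design.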
Funding: Supported by the Brain Korea 21 Program (Information Technology of Seoul National University).
Abstract: We introduce a phase-only hologram generation method based on integral imaging, and propose a method to enhance the representable depth interval. The computational integral imaging reconstruction method is modified based on optical flow to obtain depth-slice images containing only the focused objects. A phase-only hologram for multiple plane images is generated using the iterative Fresnel transform algorithm. In addition, a division method in the hologram plane is proposed to enhance the representable minimum depth interval.
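The iterative Fresnel transform algorithm alternates between the hologram plane and each image plane, imposing the target amplitude at the image plane and a unit-amplitude (phase-only) constraint at the hologram plane, in the style of Gerchberg-Saxton. The sketch below assumes paraxial transfer-function propagation, cycles through the depth slices sequentially, and uses placeholder wavelength and pixel-pitch values; the authors' hologram-plane division method is not modeled. A naive DFT keeps the code dependency-free, whereas a practical implementation would use an FFT such as `numpy.fft.fft2`.

```python
import cmath
import math
import random

def dft2(field, inverse=False):
    """Naive 2D DFT (forward or inverse) of an n x n complex field."""
    n = len(field)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0j
            for x in range(n):
                for y in range(n):
                    s += field[x][y] * cmath.exp(sign * 2j * math.pi * (u * x + v * y) / n)
            out[u][v] = s / (n * n) if inverse else s
    return out

def fresnel_propagate(field, z, wavelength, pitch):
    """Propagate a field by distance z with the paraxial (Fresnel) transfer function."""
    n = len(field)
    spec = dft2(field)
    for u in range(n):
        for v in range(n):
            fu = (u if u < n // 2 else u - n) / (n * pitch)  # spatial frequency in 1/m
            fv = (v if v < n // 2 else v - n) / (n * pitch)
            spec[u][v] *= cmath.exp(-1j * math.pi * wavelength * z * (fu ** 2 + fv ** 2))
    return dft2(spec, inverse=True)

def phase_only_hologram(targets, wavelength=532e-9, pitch=8e-6, iterations=5, seed=0):
    """targets: list of (z, amplitude) pairs, one per depth-slice image.
    Returns the hologram phase in radians for a unit-amplitude field."""
    n = len(targets[0][1])
    rng = random.Random(seed)
    phase = [[rng.uniform(0, 2 * math.pi) for _ in range(n)] for _ in range(n)]
    for _ in range(iterations):
        for z, amp in targets:
            holo = [[cmath.exp(1j * p) for p in row] for row in phase]
            img = fresnel_propagate(holo, z, wavelength, pitch)
            # Image-plane constraint: keep the phase, impose the slice amplitude.
            img = [[a * cmath.exp(1j * cmath.phase(e)) for a, e in zip(ar, er)]
                   for ar, er in zip(amp, img)]
            back = fresnel_propagate(img, -z, wavelength, pitch)
            # Hologram-plane constraint: discard amplitude, keep phase only.
            phase = [[cmath.phase(e) for e in row] for row in back]
    return phase
```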
Funding: Supported by the Program of Technology Innovation of the Science and Technology Commission of Shanghai Municipality under Grant No. 21511104700; the Artificial Intelligence Technology Support Project of the Science and Technology Commission of Shanghai Municipality under Grant No. 22DZ1100103; and the Shanghai Informatization Development Special Project under Grant No. 202001030.
Abstract: Improving the generative and representational capabilities of auto-encoders is a hot research topic. However, it is challenging to jointly and simultaneously optimize the bidirectional mapping between the encoder and the decoder/generator while ensuring convergence. Most existing auto-encoders cannot automatically trade off the bidirectional mapping. In this work, we propose Bi-GAE, an unsupervised bidirectional generative auto-encoder based on the bidirectional generative adversarial network (BiGAN). First, we introduce two terms that enhance information expansion in decoding to follow human visual models and improve the semantic-relevant feature representation capability in encoding. Furthermore, we embed a generative adversarial network (GAN) to improve representation while ensuring convergence. The experimental results show that Bi-GAE achieves competitive results in both generation and representation with stable convergence. Compared with its counterparts, the representational power of Bi-GAE improves the classification accuracy of high-resolution images by about 8.09%. In addition, Bi-GAE increases the structural similarity index measure (SSIM) by 0.045, and decreases the Fréchet inception distance (FID) by in the reconstruction of 512×512 images.
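Bi-GAE builds on BiGAN, whose discriminator scores joint pairs: real images with their encodings, (x, E(x)), and generated images with their latent codes, (G(z), z). The sketch below evaluates only the standard BiGAN value function from given discriminator outputs; the two additional terms introduced by the authors are not specified in the abstract and are not shown, and `bigan_value` is a hypothetical name.

```python
import math

def bigan_value(d_real, d_fake):
    """Empirical BiGAN value V(D, E, G).

    d_real: discriminator outputs D(x, E(x)) on (image, encoded latent) pairs
    d_fake: discriminator outputs D(G(z), z) on (generated image, prior latent) pairs
    The discriminator maximizes V; the encoder and generator minimize it.
    """
    eps = 1e-12  # guards against log(0)
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

When the discriminator cannot tell the pairs apart and outputs 0.5 everywhere, the value is -log 4, the equilibrium value from the original GAN analysis; a discriminator that separates the pairs pushes the value above that.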
Funding: Supported by the National Key R&D Program of China under Grant No. 2020AAA0106200 and the National Natural Science Foundation of China under Grant Nos. 61832016 and U20B2070.
Abstract: Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and their high performance. Transformers are sequence-to-sequence models that use a self-attention mechanism rather than the sequential structure of RNNs. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works. We categorize them according to task scenario: backbone design, high-level vision, low-level vision and generation, and multimodal learning. Their key ideas are also analyzed. Differing from previous surveys, we mainly focus on visual transformer methods in low-level vision and generation. The latest works on backbone design are also reviewed in detail. For ease of understanding, we precisely describe the main contributions of the latest works in the form of tables. As well as giving quantitative comparisons, we also present image results for low-level vision and generation tasks. Computational costs and source code links for various important works are also given in this survey to assist further development.
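The parallelism argument follows from how self-attention is computed: every token's query is scored against every token's key in one step, with no recurrence over positions. The sketch below is a single-head scaled dot-product self-attention in plain Python; the projection matrices are arbitrary inputs, and multi-head projections, masking and positional encodings are omitted.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*b)] for r in a]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # All pairwise query-key scores at once: no sequential dependency between positions.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)                   # each output mixes all value vectors
```

Each output row is a weighted average of all value vectors, which is how a single layer can represent global information across the whole sequence (or, for visual transformers, across all image patches).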