This paper proposes an artificial intelligence-based robust information hiding algorithm to address the issue of confidential information being susceptible to noise attacks during transmission. The algorithm we designed aims to mitigate the impact of various noise attacks on the integrity of secret information during transmission. The method encodes secret images into stylized encrypted images and applies adversarial transfer to both the style and content features of the original and embedded data. This process effectively enhances the concealment and imperceptibility of confidential information, thereby improving its security during transmission and reducing security risks. Furthermore, we have designed a specialized attack layer to simulate real-world attacks and common noise scenarios encountered in practical environments. Through adversarial training, the algorithm is strengthened to improve its resilience against attacks and overall robustness, ensuring better protection against potential threats. Experimental results demonstrate that the proposed algorithm successfully enhances the concealment and imperceptibility of secret information while maintaining embedding capacity. Additionally, it preserves the quality and fidelity of the stego image. The method not only improves the security and robustness of information hiding technology but also holds practical value for protecting sensitive data and ensuring the invisibility of confidential information.
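The attack layer above is described only at a high level. As an illustration, a minimal training-time noise-simulation layer of this kind, written as a PyTorch sketch with the attack types (Gaussian noise, pixel dropout, mean blur) and their parameters chosen purely as assumptions rather than taken from the paper, could look like:

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttackLayer(nn.Module):
    """Sketch of a training-time attack layer: randomly perturbs stego
    images so the pipeline learns robustness to channel noise."""
    def __init__(self, noise_std=0.05, drop_p=0.1):
        super().__init__()
        self.noise_std = noise_std   # strength of additive Gaussian noise
        self.drop_p = drop_p         # fraction of pixels zeroed by dropout

    def forward(self, x):            # x: (B, C, H, W) stego image batch
        attack = random.choice(["gaussian", "dropout", "blur", "none"])
        if attack == "gaussian":
            return x + self.noise_std * torch.randn_like(x)
        if attack == "dropout":
            mask = (torch.rand_like(x) > self.drop_p).float()
            return x * mask          # crude model of transmission loss
        if attack == "blur":
            c = x.shape[1]
            kernel = torch.ones(c, 1, 3, 3, device=x.device) / 9.0
            return F.conv2d(x, kernel, padding=1, groups=c)  # 3x3 mean blur
        return x
```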
Visual illustration transformation from real-world to cartoon images is a well-known and challenging task in computer vision. Image-to-image translation from the real-world to the cartoon domain poses issues such as a lack of paired training samples, poor translation quality, weak feature extraction from source-domain images, and low-quality translation from traditional generator algorithms. To solve these issues, a model that does not depend on paired samples, a high-quality dataset, a Bayesian feature extractor, and an improved generator are needed. In this study, we propose a high-quality dataset to reduce the effect of paired training samples on the model's performance. We use a Bayesian Very Deep Convolutional Network (VGG)-based feature extractor to improve on the standard feature extractor, because Bayesian inference regularizes weights well. The generator from the Cartoon Generative Adversarial Network (GAN) is modified by introducing a depthwise convolution layer and a channel attention mechanism to improve on the original generator. We use the Fréchet inception distance (FID) score and a user preference score to evaluate the model. The FID scores obtained for the generated cartoon and real-world images are 107 and 76 for the TCC style, and 137 and 57 for the Hayao style, respectively. A user preference score was also collected to evaluate the quality of the generated images, and our proposed model obtained a higher preference score than competing models. We achieve striking results in producing high-quality cartoon images, demonstrating the proposed model's effectiveness in transferring style between authentic images and cartoon images.
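For context, the FID metric reported here is the standard Fréchet distance between Gaussian fits to Inception features of the two image sets (lower is better):

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and covariance of the Inception embeddings of the real and generated images, respectively.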
The objective of style transfer is to maintain the content of an image while transferring the style of another image. However, conventional methods face challenges in preserving facial features, especially in Korean portraits where elements like the “Gat” (a traditional Korean hat) are prevalent. This paper proposes a deep learning network designed to perform style transfer that includes the “Gat” while preserving the identity of the face. Unlike traditional style transfer techniques, the proposed method aims to preserve the texture, attire, and the “Gat” in the style image by employing image sharpening and facial landmarks together with a GAN. The color, texture, and intensity were extracted differently based on the characteristics of each block and layer of the pre-trained VGG-16, and only the elements necessary during training were preserved using a facial landmark mask. The head area was represented by the eyebrow area in order to transfer the “Gat”. Furthermore, the identity of the face was retained, and style correlation was considered using the Gram matrix. To evaluate performance, we introduced a metric based on PSNR and SSIM, emphasizing median values through new weightings for style transfer in Korean portraits. Additionally, we conducted a survey evaluating the content, style, and naturalness of the transferred results; based on this assessment, we conclude that our method surpasses previous research in maintaining the integrity of content. Our approach, enriched by landmark preservation and diverse loss functions, including those related to the “Gat”, outperformed previous methods in facial identity preservation.
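The style correlation referred to above is the standard Gram-matrix statistic from neural style transfer: for the feature map F^l of layer l, with F^l_{ik} the activation of channel i at spatial position k,

G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}

so that matching G^l between the output and the style image matches the channel co-activation statistics that encode texture and color style.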
In recent years, deep generative models have been successfully applied to artistic painting style transfer (APST). The difficulties lie in the loss of spatial detail during reconstruction and the inefficient model convergence caused by the irreversible encoder-decoder methodology of existing models. To address this, this paper proposes a Flow-based architecture in which the encoder and decoder share a reversible network configuration. The proposed APST-Flow can efficiently reduce model uncertainty via a compact analysis-synthesis methodology, thereby improving generalization performance and convergence stability. For the generator, a Flow-based network using wavelet additive coupling (WAC) layers is implemented to extract multi-scale content features. A style checker is also used to enhance global style consistency by minimizing the error between the reconstructed and input images. To enhance the generated salient details, an adaptive stroke-edge loss is applied in both global and local model training. Experimental results show that the proposed method improves PSNR by 5% and SSIM by 6.2%, and decreases Style Error by 29.4%, over existing models on the ChipPhi set. These competitive results verify that APST-Flow achieves high-quality generation with less content deviation and enhanced generalization, and can therefore be applied to further APST scenes.
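The WAC layer itself is not detailed in the abstract, but it builds on additive coupling, the mechanism that makes flow-based networks reversible. Splitting the input into two parts (x_1, x_2),

y_1 = x_1, \quad y_2 = x_2 + m(x_1) \qquad\Longleftrightarrow\qquad x_1 = y_1, \quad x_2 = y_2 - m(y_1)

where m can be an arbitrary network; the transform is exactly invertible regardless of m, which is what allows the encoder and decoder to share one reversible configuration.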
In recent years, speech synthesis systems have come to produce very high-quality voices. Research in this domain is therefore now turning to the problem of integrating emotions into speech. However, building a separate speech synthesizer for each emotion has limitations. First, it often requires an emotional-speech dataset with many sentences; such datasets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capacity and much effort and time for model tuning. In addition, a separate model per emotion fails to take advantage of the datasets of other emotions. In this paper, we propose a new method for synthesizing emotional speech in which the latent expressions of emotions are learned from a small dataset of professional actors through a Flowtron model. We also provide a new method for building a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this dataset to train a Tacotron 2 model, and used it as a pre-trained model to train the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessments yield 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves effective for highly automated, fast emotional sentence generation from a small emotional-speech dataset.
With the advent of deep learning, self-driving schemes based on deep learning have become increasingly popular. Robust perception-action models should learn from data covering different scenarios and real behaviors, whereas current end-to-end model learning is generally limited to training on massive data, innovations in deep network architecture, and in-situ model learning in a simulation environment. We therefore introduce a new image style transfer method into data augmentation, improving the diversity of limited data by changing the texture, contrast ratio, and color of images, so that the model extends to previously unobserved scenarios. Inspired by fast style transfer and neural algorithms of artistic style, we propose an arbitrary-style generation network architecture comprising a style transfer network, a style learning network, a style loss network, and a multivariate Gaussian distribution. A style embedding vector is randomly sampled from the multivariate Gaussian distribution and linearly interpolated with the embedding vector predicted from the input image by the style learning network; this provides a set of normalization constants for the style transfer network and finally realizes diverse image styles. To verify the effectiveness of the method, image classification and simulation experiments were performed separately. Finally, we built a small smart-car experimental platform and, for the first time, applied style-transfer-based data augmentation to autonomous driving experiments. The experimental results show that: (1) the proposed scheme improves the prediction accuracy of the end-to-end model and reduces the model's error accumulation; (2) the style-transfer-based method provides a new scheme for data augmentation and offers a remedy for the high cost of the large amounts of labeled data on which many deep models rely.
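The sampling-and-interpolation step can be summarized in a few lines. The following NumPy sketch is illustrative only: the embedding dimension, the blend weight, and the stand-in for the style learning network's prediction are all assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha = 100, 0.5                  # embedding size and blend weight (assumed)

# Random style draw from the multivariate Gaussian.
z_sampled = rng.multivariate_normal(np.zeros(d), np.eye(d))

# Stand-in for the style learning network's prediction on the input image.
z_predicted = rng.standard_normal(d)

# Linear interpolation yields the style embedding actually used.
z_style = alpha * z_predicted + (1.0 - alpha) * z_sampled

# z_style would then supply the normalization constants of the style transfer network.
```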
Color transfer between images makes effective use of image statistics. We present a novel approach to local color transfer between images based on simple statistics and locally linear embedding. A sketching interface is proposed for quickly and easily specifying the color correspondences between target and source images. The user can specify correspondences between local regions using scribbles, which transfers the target color to the source image more accurately while smoothly preserving boundaries, and yields more natural output. Our algorithm is not restricted to one-to-one color transfer and can use more than one target image to transfer color to different regions of the source image. Moreover, it does not require the source and target images to share the same color style or size. We use sub-sampling to reduce the computational load. Compared with other approaches, our algorithm achieves much better color blending on the input data and preserves the other color details of the source image. Various experimental results show that our approach captures the correspondences of local color regions between source and target images, expresses user intent, and generates more realistic and natural visual results.
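Locally linear embedding, as used here, rests on the standard reconstruction-weight formulation: each sample x_i is approximated by an affine combination of its neighbors N(i),

\min_{W} \sum_i \Bigl\lVert x_i - \sum_{j \in N(i)} w_{ij} x_j \Bigr\rVert^2 \quad \text{s.t.} \quad \sum_{j} w_{ij} = 1

The weights w_{ij} capture the local color geometry; preserving them under the transfer is presumably what supports the smooth boundary preservation described above, though the paper's exact integration into the blending step is as stated in the abstract.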
The performance and accuracy of computer vision systems are affected by noise in different forms. Although numerous solutions and algorithms have been presented for dealing with each type of noise, a comprehensive technique that can cover all the diverse noises and mitigate their damaging effects on the performance and precision of various systems is still missing. In this paper, we focus on the stability and robustness of one computer vision branch, visual object tracking. We demonstrate that, without imposing a heavy computational load on a model or changing its algorithms, the drop in performance and accuracy when a system is exposed to an unseen noise-laden test dataset can be prevented by simply applying style transfer to the training dataset and training the model on a combination of the stylized and original data. To verify the proposed approach, we apply it to a generic object tracker based on regression networks. The method's validity is confirmed by testing it on an exclusive benchmark comprising 50 image sequences, with each sequence containing 15 types of noise at five intensity levels. The resulting OPE curves show a 40% increase in the robustness of the proposed object tracker against noise, compared with the other trackers considered.
The technology for image-to-image style transfer (a prevalent image processing task) has developed rapidly. The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network. However, existing methods typically carry a large computational cost. To achieve efficient style transfer, we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations, and we utilize an attention mechanism to transform images with various styles. We optimize the original generative adversarial network (GAN) with more efficient computation for image-to-illustration translation. Experimental results show that our proposed method produces output consistent with human vision while maintaining image quality, and it avoids the high computational cost and resource consumption of conventional style transfer. Comparisons on subjective and objective evaluation indicators show that our proposed method outperforms existing methods.
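The Ghost module, as introduced in GhostNet, computes a fraction of the output channels with an ordinary convolution and the rest with cheap depthwise operations on those intrinsic maps. A minimal PyTorch sketch follows; the channel sizes, kernel sizes, and the ratio of cheap to intrinsic maps are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module sketch: a few 'intrinsic' feature maps from a normal
    conv, plus cheap depthwise 'ghost' maps, concatenated."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio          # intrinsic channels
        ghost_ch = out_ch - init_ch        # cheap ghost channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),    # depthwise, cheap
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# Example: 64 -> 128 channels, half of them from cheap depthwise ops.
out = GhostModule(64, 128)(torch.randn(1, 64, 32, 32))
```

Swapping standard convolutions for such modules is the kind of change that yields the reduced computational cost reported above.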
To address the problems of image semantic content distortion and blurred foreground-background boundaries in convolutional neural image stylization, we propose a convolutional neural artistic stylization algorithm that suppresses image distortion. First, the VGG-19 network is used to extract feature maps from the input content and style images and to reconstruct content and style. The transfer from the input content and style images to the output image is then constrained to a local affine transformation in color space, and a Laplacian matting matrix is constructed from the local affinities of the input image's RGB channels. For each output block, an affine transformation maps the RGB values of the input image to the corresponding output position, realizing the constraint on semantic content and the control of spatial layout. Finally, the synthesized image is superimposed on a white-noise image and updated iteratively with backpropagation to minimize the loss function and complete the stylization. Experimental results show that the method generates images with distinct foreground and background edges and clear texture, suppresses semantic content distortion, realizes the spatial constraints and color mapping of the transferred images, and makes the stylized images visually satisfactory.
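This local affine constraint corresponds to the Matting-Laplacian photorealism term popularized by deep photo style transfer. Assuming that formulation, with M_I the matting Laplacian of input image I and V_c[O] the vectorized channel c of output O, the penalty is

\mathcal{L}_m = \sum_{c=1}^{3} V_c[O]^{\top} M_I \, V_c[O]

which is minimized exactly when each output channel is a locally affine function of the input RGB values; this is what suppresses semantic content distortion while still permitting color mapping.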
The complex geometric features of subsurface fractures at different scales make mesh generation challenging and/or expensive. In this paper, we make use of neural style transfer (NST), a machine learning technique, to generate meshes from rock fracture images. In this new approach, we use digital rock fractures at multiple scales to represent the ‘content’ and define uniformly shaped and sized triangles to represent the ‘style’. The 19-layer convolutional neural network (CNN) learns the content from the rock image, including lower-level features (such as edges and corners) and higher-level features (such as rock, fractures, or other mineral fillings), and learns the style from the triangular grids. By optimizing the cost function to approximate both the content and the style, numerical meshes can be generated and optimized. We utilize NST to generate meshes for rough fractures with asperities formed in rock, a network of fractures embedded in rock, and a sand aggregate with multiple grains. Based on these examples, we show that this new NST technique can make mesh generation and optimization much more efficient by achieving a good balance between mesh density and the representation of geometric features. Finally, we discuss future applications of this approach and perspectives on applying machine learning to bridge the gaps between numerical modeling and experiments.
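In the standard NST formulation that this approach builds on, the cost function being optimized is a weighted sum of a content term and a style term:

\mathcal{L}_{\text{total}} = \alpha \,\mathcal{L}_{\text{content}} + \beta \,\mathcal{L}_{\text{style}}, \qquad \mathcal{L}_{\text{content}} = \tfrac{1}{2} \sum_{i,j} \bigl(F^l_{ij} - P^l_{ij}\bigr)^2

where F^l and P^l are the layer-l feature maps of the generated image and the rock (‘content’) image, and \mathcal{L}_{\text{style}} compares Gram matrices of the generated image and the triangular-grid (‘style’) image across layers; minimizing the total loss yields an image that is simultaneously fracture-shaped and triangle-textured, i.e., a mesh.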
Most existing virtual scenarios built for the digital protection of Chinese classical private gardens are too modern in expression style to convey the aesthetic significance of their historical period. Considering the aesthetic commonality between traditional Chinese landscape paintings and classical private gardens, and drawing on image style transfer, we propose a deep neural network to transfer the aesthetic style from landscape paintings to virtual scenarios of classical private gardens. The network consists of two parts: style prediction and style transfer. The style prediction network obtains a style representation from style paintings, and the style transfer network transfers that style representation to the content scenario. The pre-trained network is then embedded into the scenario rendering pipeline and combined with screen post-processing to realise the stylised expression of the virtual scenario. To verify the feasibility of this methodology, a virtual scenario of the Humble Administrator's Garden was used as the content scenario, and five garden landscape paintings from different periods and painting styles were selected for the case study. The results demonstrate that this methodology can effectively achieve aesthetic style transfer for a virtual scenario.
Generating dance that temporally and aesthetically matches the music is a challenging problem in three respects. First, the generated motion should be beat-aligned to local musical features. Second, the global aesthetic style should be matched between motion and music. Third, the generated motion should be diverse and non-self-repeating. To address these challenges, we propose ReChoreoNet, which re-choreographs high-quality dance motion for a given piece of music. A data-driven learning strategy is proposed to efficiently correlate the temporal connections between music and motion in a progressively learned cross-modality embedding space. The beat-aligned content motion is subsequently used as autoregressive context and control signal for a normalizing-flow model, which transfers the style of a prototype motion to the final generated dance. In addition, we present an aesthetically labelled music-dance repertoire (MDR), both for efficient learning of the cross-modality embedding and for understanding the aesthetic connections between music and motion. We demonstrate that our repertoire-based framework is robustly extensible in both content and style. Both quantitative and qualitative experiments have been carried out to validate the effectiveness of our proposed model.
The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models on large-scale image datasets that are representative of the target task. In many scenarios, however, it is challenging to obtain sufficient image data for the target task. Data augmentation is a way to mitigate this challenge. A common practice is to explicitly transform existing images in desired ways to create the volume and variability of training data necessary for good generalization performance. In situations where data for the target domain are not accessible, a viable workaround is to synthesize training data from scratch, i.e., synthetic data augmentation. This paper presents an extensive review of synthetic data augmentation techniques. It covers data synthesis approaches based on realistic 3D graphics modelling, neural style transfer (NST), differential neural rendering, and generative modelling using generative adversarial networks (GANs) and variational autoencoders (VAEs). For each class of methods, we focus on the important data generation and augmentation techniques, the general scope of application and specific use-cases, as well as existing limitations and possible workarounds. Additionally, we provide a summary of common synthetic datasets for training computer vision models, highlighting their main features, application domains, and supported tasks. Finally, we discuss the effectiveness of synthetic data augmentation methods. Since this is the first paper to explore synthetic data augmentation methods in great detail, we hope to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.
Traditional information hiding techniques hide information by modifying carrier data, which can easily leave detectable traces that steganalysis tools may pick up. In image transmission especially, both geometric and non-geometric attacks can cause subtle changes in the image's pixels in transit. To overcome these challenges, we propose a constructive robust image steganography technique based on style transformation. Unlike traditional steganography, our algorithm does not directly modify the carrier data at all. In this study, we constructed a mapping dictionary that establishes a correspondence between binary codes and image categories, and then used it to map secret information to secret images. Through image semantic segmentation and style transfer, we combine the style of secret images with the content of public images to generate stego images. Such stego images can resist interference during public channel transmission, ensuring the secure transmission of information. At the receiving end, the stego image is fed into a trained secret image reconstruction network, which effectively reconstructs the original secret image; the secret information is then recovered through the mapping dictionary, ensuring the security, accuracy, and efficient decoding of the information. Experimental results show that this constructive, style-transfer-based information hiding method improves the security of information hiding, enhances the algorithm's robustness to various attacks, and ensures information security.
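A toy Python sketch of such a mapping dictionary follows; the 2-bit code width and the category names are purely illustrative, not the paper's actual choices.

```python
# Hypothetical 2-bit mapping between secret bit chunks and image categories.
categories = ["mountain", "ocean", "forest", "desert"]
code_to_cat = {format(i, "02b"): c for i, c in enumerate(categories)}
cat_to_code = {c: b for b, c in code_to_cat.items()}

def encode(bits: str) -> list[str]:
    """Map each 2-bit chunk of the secret bitstream to a secret-image category."""
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return [code_to_cat[c] for c in chunks]

def decode(cats: list[str]) -> str:
    """Invert the mapping at the receiving end."""
    return "".join(cat_to_code[c] for c in cats)

assert decode(encode("011011001001")) == "011011001001"
```

In the scheme described above, each selected category corresponds to a secret image whose style is then fused with a public image's content, so no bit of the secret ever modifies the carrier directly.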
Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using the adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation under geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without additional training, and users can control the output quality through adjustable blending parameters. Comprehensive experiments demonstrate that Patchified AdaIN reflects geometric transformations (e.g., translation, rotation, flipping) of images in style transfer, achieving superior results compared with state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer with the Patchified AdaIN layer.
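Standard AdaIN replaces the channel-wise mean and standard deviation of the content features with those of the style features. The PyTorch sketch below contrasts it with a naive patch-wise variant in the spirit of Patchified AdaIN; the fixed patch size, the aligned patch grid, same-sized inputs with dimensions divisible by the patch size, and the absence of the paper's blending parameters are all simplifying assumptions.

```python
import torch

def adain(content, style, eps=1e-5):
    """Classic AdaIN: renormalize content features with style statistics."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True)
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

def patchified_adain(content, style, patch=8):
    """Patch-wise sketch: apply AdaIN per local patch so the spatially
    local statistics of the reference image are used."""
    out = torch.empty_like(content)
    H, W = content.shape[2:]
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            out[:, :, i:i+patch, j:j+patch] = adain(
                content[:, :, i:i+patch, j:j+patch],
                style[:, :, i:i+patch, j:j+patch])
    return out
```

Because the per-patch statistics move with the image content, a translated or flipped reference shifts the transferred colors accordingly, which global AdaIN statistics cannot express.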
Regional facial image synthesis conditioned on a semantic mask has received great attention in the field of computational visual media. However, the appearances of different regions may be inconsistent with each other after regional editing. In this paper, we focus on harmonized regional style transfer for facial images. A multi-scale encoder is proposed for accurate style-code extraction. The key part of our work is a multi-region style attention module, which adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis, and further employ an invertible flow model that can serve as a mapping network to fine-tune the style code by inverting it to latent space. Experiments on three widely used face datasets evaluated our model by transferring regional facial appearance between datasets. The results show that our model reliably performs style transfer and multi-modal manipulation, generating output comparable to the state of the art.
This paper aims to conduct a comprehensive study of facial-sketch synthesis (FSS). However, due to the high cost of obtaining hand-drawn sketch datasets, there is no complete benchmark for assessing the development of FSS algorithms over the last decade. We first introduce a high-quality FSS dataset, named FS2K, which consists of 2104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability, and should thus facilitate the progress of FSS research. Second, we present the largest-scale FSS investigation to date by reviewing 89 classic methods, including 25 handcrafted feature-based facial-sketch synthesis approaches, 29 general translation methods, and 35 image-to-sketch approaches. In addition, we carry out comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple FSS baseline, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with lessons learned over the past years and point out several unsolved challenges. Our code is available at https://github.com/DengPingFan/FSGAN.
Vision Transformers have shown impressive performance on image classification tasks. Observing that most existing visual style transfer (VST) algorithms are based on texture-biased convolutional neural networks (CNNs), this raises the question of whether the shape-biased Vision Transformer can perform style transfer as CNNs do. In this work, we compare and analyze the shape bias of CNN- and transformer-based models from the viewpoint of VST tasks. For comprehensive comparisons, we propose three kinds of transformer-based visual style transfer (Tr-VST) methods (Tr-NST for optimization-based VST, Tr-WCT for reconstruction-based VST, and Tr-AdaIN for perceptual-based VST). By engaging three mainstream VST methods in the transformer pipeline, we show that transformer-based models pre-trained on ImageNet are not well suited to style transfer: owing to their strong shape bias, these Tr-VST methods cannot render style patterns. We further analyze the shape bias by considering the influence of the learned parameters and the structural design. The results prove that, with proper style supervision, the transformer can learn texture-biased features similar to those of a CNN. With the reduced shape bias in the transformer encoder, Tr-VST methods can generate higher-quality results than state-of-the-art VST methods.
Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared with the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We evaluate a wide variety of image stylisation methods (both portrait-specific and general-purpose, covering both traditional NPR approaches and NST) using the new benchmark dataset.
基金the National Natural Science Foundation of China(Nos.62272478,61872384)Natural Science Foundation of Shanxi Province(No.2023-JC-YB-584)+1 种基金National Natural Science Foundation of China(No.62172436)Engineering University of PAP’s Funding for Scientific Research Innovation Team,Engineering University of PAP’s Funding for Key Researcher(No.KYGG202011).
文摘This paper proposes an artificial intelligence-based robust information hiding algorithm to address the issue of confidential information being susceptible to noise attacks during transmission.The algorithm we designed aims to mitigate the impact of various noise attacks on the integrity of secret information during transmission.The method we propose involves encoding secret images into stylized encrypted images and applies adversarial transfer to both the style and content features of the original and embedded data.This process effectively enhances the concealment and imperceptibility of confidential information,thereby improving the security of such information during transmission and reducing security risks.Furthermore,we have designed a specialized attack layer to simulate real-world attacks and common noise scenarios encountered in practical environments.Through adversarial training,the algorithm is strengthened to enhance its resilience against attacks and overall robustness,ensuring better protection against potential threats.Experimental results demonstrate that our proposed algorithm successfully enhances the concealment and unknowability of secret information while maintaining embedding capacity.Additionally,it ensures the quality and fidelity of the stego image.The method we propose not only improves the security and robustness of information hiding technology but also holds practical application value in protecting sensitive data and ensuring the invisibility of confidential information.
文摘Visual illustration transformation from real-world to cartoon images is one of the famous and challenging tasks in computer vision.Image-to-image translation from real-world to cartoon domains poses issues such as a lack of paired training samples,lack of good image translation,low feature extraction from the previous domain images,and lack of high-quality image translation from the traditional generator algorithms.To solve the above-mentioned issues,paired independent model,high-quality dataset,Bayesian-based feature extractor,and an improved generator must be proposed.In this study,we propose a high-quality dataset to reduce the effect of paired training samples on the model’s performance.We use a Bayesian Very Deep Convolutional Network(VGG)-based feature extractor to improve the performance of the standard feature extractor because Bayesian inference regu-larizes weights well.The generator from the Cartoon Generative Adversarial Network(GAN)is modified by introducing a depthwise convolution layer and channel attention mechanism to improve the performance of the original generator.We have used the Fréchet inception distance(FID)score and user preference score to evaluate the performance of the model.The FID scores obtained for the generated cartoon and real-world images are 107 and 76 for the TCC style,and 137 and 57 for the Hayao style,respectively.User preference score is also calculated to evaluate the quality of generated images and our proposed model acquired a high preference score compared to other models.We achieved stunning results in producing high-quality cartoon images,demonstrating the proposed model’s effectiveness in transferring style between authentic images and cartoon images.
基金supported by Metaverse Lab Program funded by the Ministry of Science and ICT(MSIT),and the Korea Radio Promotion Association(RAPA).
文摘The objective of style transfer is to maintain the content of an image while transferring the style of another image.However,conventional methods face challenges in preserving facial features,especially in Korean portraits where elements like the“Gat”(a traditional Korean hat)are prevalent.This paper proposes a deep learning network designed to perform style transfer that includes the“Gat”while preserving the identity of the face.Unlike traditional style transfer techniques,the proposed method aims to preserve the texture,attire,and the“Gat”in the style image by employing image sharpening and face landmark,with the GAN.The color,texture,and intensity were extracted differently based on the characteristics of each block and layer of the pre-trained VGG-16,and only the necessary elements during training were preserved using a facial landmark mask.The head area was presented using the eyebrow area to transfer the“Gat”.Furthermore,the identity of the face was retained,and style correlation was considered based on the Gram matrix.To evaluate performance,we introduced a metric using PSNR and SSIM,with an emphasis on median values through new weightings for style transfer in Korean portraits.Additionally,we have conducted a survey that evaluated the content,style,and naturalness of the transferred results,and based on the assessment,we can confidently conclude that our method to maintain the integrity of content surpasses the previous research.Our approach,enriched by landmarks preservation and diverse loss functions,including those related to“Gat”,outperformed previous researches in facial identity preservation.
基金support from National Natural Science Foundation of China(62062048).
文摘In recent years,deep generative models have been successfully applied to perform artistic painting style transfer(APST).The difficulties might lie in the loss of reconstructing spatial details and the inefficiency of model convergence caused by the irreversible en-decoder methodology of the existing models.Aiming to this,this paper proposes a Flow-based architecture with both the en-decoder sharing a reversible network configuration.The proposed APST-Flow can efficiently reduce model uncertainty via a compact analysis-synthesis methodology,thereby the generalization performance and the convergence stability are improved.For the generator,a Flow-based network using Wavelet additive coupling(WAC)layers is implemented to extract multi-scale content features.Also,a style checker is used to enhance the global style consistency by minimizing the error between the reconstructed and the input images.To enhance the generated salient details,a loss of adaptive stroke edge is applied in both the global and local model training.The experimental results show that the proposed method improves PSNR by 5%,SSIM by 6.2%,and decreases Style Error by 29.4%over the existing models on the ChipPhi set.The competitive results verify that APST-Flow achieves high-quality generation with less content deviation and enhanced generalization,thereby can be further applied to more APST scenes.
基金funded by the Hanoi University of Science and Technology(HUST)under grant number T2018-PC-210.
文摘In recent years,speech synthesis systems have allowed for the produc-tion of very high-quality voices.Therefore,research in this domain is now turning to the problem of integrating emotions into speech.However,the method of con-structing a speech synthesizer for each emotion has some limitations.First,this method often requires an emotional-speech data set with many sentences.Such data sets are very time-intensive and labor-intensive to complete.Second,training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning.In addition,each model for each emotion failed to take advantage of data sets of other emotions.In this paper,we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flow-tron model.In addition,we provide a new method to build a speech corpus that is scalable and whose quality is easy to control.Next,to produce a high-quality speech synthesis model,we used this data set to train the Tacotron 2 model.We used it as a pre-trained model to train the Flowtron model.We applied this method to synthesize Vietnamese speech with sadness and happiness.Mean opi-nion score(MOS)assessment results show that MOS is 3.61 for sadness and 3.95 for happiness.In conclusion,the proposed method proves to be more effec-tive for a high degree of automation and fast emotional sentence generation,using a small emotional-speech data set.
基金the National Natural Science Foundation of China(51965008)Science and Technology projects of Guizhou[2018]2168Excellent Young Researcher Project of Guizhou[2017]5630.
文摘With the advent of deep learning,self-driving schemes based on deep learning are becoming more and more popular.Robust perception-action models should learn from data with different scenarios and real behaviors,while current end-to-end model learning is generally limited to training of massive data,innovation of deep network architecture,and learning in-situ model in a simulation environment.Therefore,we introduce a new image style transfer method into data augmentation,and improve the diversity of limited data by changing the texture,contrast ratio and color of the image,and then it is extended to the scenarios that the model has been unobserved before.Inspired by rapid style transfer and artistic style neural algorithms,we propose an arbitrary style generation network architecture,including style transfer network,style learning network,style loss network and multivariate Gaussian distribution function.The style embedding vector is randomly sampled from the multivariate Gaussian distribution and linearly interpolated with the embedded vector predicted by the input image on the style learning network,which provides a set of normalization constants for the style transfer network,and finally realizes the diversity of the image style.In order to verify the effectiveness of the method,image classification and simulation experiments were performed separately.Finally,we built a small-sized smart car experiment platform,and apply the data augmentation technology based on image style transfer drive to the experiment of automatic driving for the first time.The experimental results show that:(1)The proposed scheme can improve the prediction accuracy of the end-to-end model and reduce the model’s error accumulation;(2)the method based on image style transfer provides a new scheme for data augmentation technology,and also provides a solution for the high cost that many deep models rely heavily on a large number of label data.
基金supported by the National Natural Science Foundation of China(61672482,11626253)the One Hundred Talent Project of the Chinese Academy of Sciences
文摘Color transfer between images uses the statistics information of image effectively. We present a novel approach of local color transfer between images based on the simple statistics and locally linear embedding. A sketching interface is proposed for quickly and easily specifying the color correspondences between target and source image. The user can specify the corre- spondences of local region using scribes, which more accurately transfers the target color to the source image while smoothly preserving the boundaries, and exhibits more natural output results. Our algorithm is not restricted to one-to-one image color transfer and can make use of more than one target images to transfer the color in different regions in the source image. Moreover, our algorithm does not require to choose the same color style and image size between source and target images. We propose the sub-sampling to reduce the computational load. Comparing with other approaches, our algorithm is much better in color blending in the input data. Our approach preserves the other color details in the source image. Various experimental results show that our approach specifies the correspondences of local color region in source and target images. And it expresses the intention of users and generates more actual and natural results of visual effect.
文摘The performance and accuracy of computer vision systems are affected by noise in different forms.Although numerous solutions and algorithms have been presented for dealing with every type of noise,a comprehensive technique that can cover all the diverse noises and mitigate their damaging effects on the performance and precision of various systems is still missing.In this paper,we have focused on the stability and robustness of one computer vision branch(i.e.,visual object tracking).We have demonstrated that,without imposing a heavy computational load on a model or changing its algorithms,the drop in the performance and accuracy of a system when it is exposed to an unseen noise-laden test dataset can be prevented by simply applying the style transfer technique on the train dataset and training the model with a combination of these and the original untrained data.To verify our proposed approach,it is applied on a generic object tracker by using regression networks.This method’s validity is confirmed by testing it on an exclusive benchmark comprising 50 image sequences,with each sequence containing 15 types of noise at five different intensity levels.The OPE curves obtained show a 40%increase in the robustness of the proposed object tracker against noise,compared to the other trackers considered.
基金This work was funded by the China Postdoctoral Science Foundation(No.2019M661319)Heilongjiang Postdoctoral Scientific Research Developmental Foundation(No.LBH-Q17042)+1 种基金Fundamental Research Funds for the Central Universities(3072020CFQ0602,3072020CF0604,3072020CFP0601)2019 Industrial Internet Innovation and Development Engineering(KY1060020002,KY10600200008).
文摘The technology for image-to-image style transfer(a prevalent image processing task)has developed rapidly.The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network.However,the existing methods typically have a large computational cost.To achieve efficient style transfer,we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations.Then we utilize an attention mechanism to transform images with various styles.We optimize the original generative adversarial network(GAN)by using more efficient calculation methods for image-to-illustration translation.The experimental results show that our proposed method is similar to human vision and still maintains the quality of the image.Moreover,our proposed method overcomes the high computational cost and high computational resource consumption for style transfer.By comparing the results of subjective and objective evaluation indicators,our proposed method has shown superior performance over existing methods.
基金National Natural Science Foundation of China(No.61861025)。
文摘Aiming at the problems of image semantic content distortion and blurred foreground and background boundaries during the transfer process of convolutional neural image stylization,we propose a convolutional neural artistic stylization algorithm for suppressing image distortion.Firstly,the VGG-19 network model is used to extract the feature map from the input content image and style image and to reconstruct the content and style.Then the transfer of the input content image and style image to the output image is constrained in the local affine transformation of the color space.And the Laplacian matting matrix is constructed by combining the local affine of the input image RGB channel.For each output blocks,affine transformation maps the RGB value of the input image to the corresponding output and position,which realizes the constraint of semantic content and the control of spatial layout.Finally,the synthesized image is superimposed on the white noise image and updated iteratively with the back propagation algorithm to minimize the loss function to complete the image stylization.Experimental results show that the method can generate images with obvious foreground and background edges,clear texture,restrained semantic content-distortion,realized spatial constraint and color mapping of the transfer images,and made the stylized images visually satisfactory.
基金supported by Laboratory Directed Research and Development(LDRD)funding from Berkeley Laboratoryby the US Department of Energy(DOE),including the Office of Basic Energy Sciences,Chemical Sciences,Geosciences,and Biosciences Division and the Office of Nuclear Energy,Spent Fuel and Waste Disposition Campaign,both under Contract No.DEAC02-05CH11231 with Berkeley Laboratory。
文摘The complex geometric features of subsurface fractures at different scales makes mesh generation challenging and/or expensive.In this paper,we make use of neural style transfer(NST),a machine learning technique,to generate mesh from rock fracture images.In this new approach,we use digital rock fractures at multiple scales that represent’content’and define uniformly shaped and sized triangles to represent’style’.The 19-layer convolutional neural network(CNN)learns the content from the rock image,including lower-level features(such as edges and corners)and higher-level features(such as rock,fractures,or other mineral fillings),and learns the style from the triangular grids.By optimizing the cost function to achieve approximation to represent both the content and the style,numerical meshes can be generated and optimized.We utilize the NST to generate meshes for rough fractures with asperities formed in rock,a network of fractures embedded in rock,and a sand aggregate with multiple grains.Based on the examples,we show that this new NST technique can make mesh generation and optimization much more efficient by achieving a good balance between the density of the mesh and the presentation of the geometric features.Finally,we discuss future applications of this approach and perspectives of applying machine learning to bridge the gaps between numerical modeling and experiments.
基金This work was supported by the Key Project of the National Natural Science Foundation of China(NSFC)under Grant 41930104National Key R&D Program of China under Grant 2021 YFE0112300+1 种基金Postgraduate Research&Practice Innovation Program of Jiangsu Province under Grant KYCX21_1336China Scholarship Council under Grant 202206860019.
文摘Most of the existing virtual scenarios built for the digital protection of Chinese classical private gardens are too modern in expression style to show the aesthetic significance of their historical period.Considering the aesthetic commonality between traditional Chinese landscape paintings and classical private gardens and referring to image style transfer,here,a deep neural network was proposed to transfer the aesthetic style from landscape paintings to the virtual scenario of classical private gardens.The network consisted of two parts:style prediction and style transfer.The style prediction network was used to obtain style representation from style paintings,and the style transfer network was used to transfer style representation to the content scenario.The pre-trained network was then embedded into the scenario rendering pipeline and combined with the screen post-processing method to realise the stylised expression of the virtual scenario.To verify the feasibility of this methodology,a virtual scenario of the Humble Administrator’s Garden was used as the content scenario andfive garden landscape paintings from different time periods and painting styles were selected for the case study.The results demonstrated that this methodology could effectively achieve the aesthetic style transfer of a virtual scenario.
基金supported by the Theme-based Research Scheme,Research Grants Council of Hong Kong,China(T45-205/21-N).
文摘To generate dance that temporally and aesthetically matches the music is a challenging problem in three aspects.First,the generated motion should be beats-aligned to the local musical features.Second,the global aesthetic style should be matched between motion and music.And third,the generated motion should be diverse and non-self-repeating.To address these challenges,we propose ReChoreoNet,which re-choreographs high-quality dance motion for a given piece of music.A data-driven learning strategy is proposed to efficiently correlate the temporal connections between music and motion in a progressively learned cross-modality embedding space.The beats-aligned content motion will be subsequently used as autoregressive context and control signal to control a normalizing-flow model,which transfers the style of a prototype motion to the final generated dance.In addition,we present an aesthetically labelled music-dance repertoire(MDR)for both efficient learning of the cross-modality embedding,and understanding of the aesthetic connections between music and motion.We demonstrate that our repertoire-based framework is robustly extensible in both content and style.Both quantitative and qualitative experiments have been carried out to validate the efficiency of our proposed model.
文摘The standard approach to tackling computer vision problems is to train deep convolutional neural network(CNN)models using large-scale image datasets that are representative of the target task.However,in many scenarios,it is often challenging to obtain sufficient image data for the target task.Data augmentation is a way to mitigate this challenge.A common practice is to explicitly transform existing images in desired ways to create the required volume and variability of training data necessary to achieve good generalization performance.In situations where data for the target domain are not accessible,a viable workaround is to synthesize training data from scratch,i.e.,synthetic data augmentation.This paper presents an extensive review of synthetic data augmentation techniques.It covers data synthesis approaches based on realistic 3D graphics modelling,neural style transfer(NST),differential neural rendering,and generative modelling using generative adversarial networks(GANs)and variational autoencoders(VAEs).For each of these classes of methods,we focus on the important data generation and augmentation techniques,general scope of application and specific use-cases,as well as existing limitations and possible workarounds.Additionally,we provide a summary of common synthetic datasets for training computer vision models,highlighting the main features,application domains and supported tasks.Finally,we discuss the effectiveness of synthetic data augmentation methods.Since this is the first paper to explore synthetic data augmentation methods in great detail,we are hoping to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.
基金the National Natural Science Foundation of China(Nos.62272478,61872384,62172436,62102451)Natural Science Foundation of Shanxi Province(No.2023-JC-YB-584)Engineering University of PAP’s Funding for Key Researcher(No.KYGG202011).
文摘Traditional information hiding techniques achieve information hiding by modifying carrier data,which can easily leave detectable traces that may be detected by steganalysis tools.Especially in image transmission,both geometric and non-geometric attacks can cause subtle changes in the pixels of the image during transmission.To overcome these challenges,we propose a constructive robust image steganography technique based on style transformation.Unlike traditional steganography,our algorithm does not involve any direct modifications to the carrier data.In this study,we constructed a mapping dictionary by setting the correspondence between binary codes and image categories and then used the mapping dictionary to map secret information to secret images.Through image semantic segmentation and style transfer techniques,we combined the style of secret images with the content of public images to generate stego images.This type of stego image can resist interference during public channel transmission,ensuring the secure transmission of information.At the receiving end,we input the stego image into a trained secret image reconstruction network,which can effectively reconstruct the original secret image and further recover the secret information through a mapping dictionary to ensure the security,accuracy,and efficient decoding of the information.The experimental results show that this constructive information hiding method based on style transfer improves the security of information hiding,enhances the robustness of the algorithm to various attacks,and ensures information security.
Funding: supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2022R1A2C1004657, contribution rate: 50%), and by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2024 (project name: Developing Professionals for R&D in Contents Production Based on Generative AI and Cloud, project number: RS-2024-00352578, contribution rate: 50%).
Abstract: Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using the adaptive instance normalization (AdaIN) layer, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without additional training, allowing users to control the output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN can reflect geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, achieving superior results compared to state-of-the-art methods. Additional experiments show that Patchified AdaIN can be integrated into existing networks to enable spatially color-aware arbitrary style transfer by replacing the conventional AdaIN layer with the Patchified AdaIN layer.
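As a rough illustration of the core idea, here is a minimal PyTorch sketch that applies AdaIN independently to a grid of local patches; the patch grid size and blending parameter `alpha` are assumed knobs, and the actual method's patching scheme may differ.

```python
import torch
import torch.nn.functional as F

def adain(content, style, eps=1e-5):
    """Standard AdaIN: align channel-wise mean/std of content to style."""
    B, C = content.shape[:2]
    c = content.reshape(B, C, -1)
    s = style.reshape(B, C, -1)
    c_mu, c_std = c.mean(2, keepdim=True), c.std(2, keepdim=True) + eps
    s_mu, s_std = s.mean(2, keepdim=True), s.std(2, keepdim=True) + eps
    return ((c - c_mu) / c_std * s_std + s_mu).reshape(content.shape)

def patchified_adain(content, style, grid=4, alpha=1.0):
    """Apply AdaIN independently on a grid x grid set of local patches,
    so each content patch adopts the statistics of the spatially
    corresponding style patch."""
    B, C, H, W = content.shape
    # Resize style features so patches correspond spatially.
    style = F.interpolate(style, size=(H, W), mode="bilinear",
                          align_corners=False)
    out = content.clone()
    hs, ws = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * hs, (i + 1) * hs)
            cols = slice(j * ws, (j + 1) * ws)
            out[:, :, rows, cols] = adain(content[:, :, rows, cols],
                                          style[:, :, rows, cols])
    # Blend with the plain content for user-controllable strength.
    return alpha * out + (1 - alpha) * content

c, s = torch.randn(1, 512, 32, 32), torch.randn(1, 512, 40, 40)
print(patchified_adain(c, s).shape)  # torch.Size([1, 512, 32, 32])
```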
Funding: partly supported by the National Key R&D Program of China (No. 2020YFA0714100) and the National Natural Science Foundation of China (Nos. 61872162, 62102162, 61832016, U20B2070).
Abstract: Regional facial image synthesis conditioned on a semantic mask has attracted considerable attention in the field of computational visual media. However, the appearances of different regions may become inconsistent with each other after regional editing. In this paper, we focus on harmonized regional style transfer for facial images. A multi-scale encoder is proposed for accurate style-code extraction. The key part of our work is a multi-region style attention module, which adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis. We further employ an invertible flow model, which can serve as a mapping network to fine-tune the style code by inverting it into the latent space. We evaluated our model on three widely used face datasets by transferring regional facial appearance between datasets. The results show that our model reliably performs style transfer and multi-modal manipulation, generating output comparable to the state of the art.
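One simplified reading of the regional style extraction step is mask-guided average pooling: pool the reference image's features within each semantic region to obtain a per-region style code. The sketch below assumes integer region labels and stands in for the multi-scale encoder and attention module, which it does not reproduce.

```python
import torch

def regional_style_codes(style_feat, mask, num_regions):
    """Average-pool style features within each semantic region.

    style_feat: (B, C, H, W) features of the reference image.
    mask:       (B, H, W) integer semantic labels in [0, num_regions).
    Returns (B, num_regions, C) region style codes.
    """
    codes = []
    for r in range(num_regions):
        m = (mask == r).unsqueeze(1).float()            # (B, 1, H, W)
        area = m.sum((2, 3)).clamp(min=1.0)             # guard empty regions
        codes.append((style_feat * m).sum((2, 3)) / area)  # (B, C)
    return torch.stack(codes, dim=1)

feat = torch.randn(2, 256, 64, 64)
mask = torch.randint(0, 5, (2, 64, 64))
print(regional_style_codes(feat, mask, num_regions=5).shape)  # (2, 5, 256)
```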
Funding: supported by the Grant-in-Aid for Japan Society for the Promotion of Science Fellows, Japan (No. 21F50377).
Abstract: This paper presents a comprehensive study of facial-sketch synthesis (FSS). Due to the high cost of obtaining hand-drawn sketch datasets, the field has lacked a complete benchmark for assessing the development of FSS algorithms over the last decade. We first introduce a high-quality dataset for FSS, named FS2K, which consists of 2104 image-sketch pairs spanning three sketch styles and a variety of image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability, and should thus facilitate the progress of FSS research. Second, we present the largest-scale FSS investigation to date, reviewing 89 classic methods: 25 handcrafted feature-based facial-sketch synthesis approaches, 29 general translation methods, and 35 image-to-sketch approaches. In addition, we report comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with lessons learned over the past years and point out several unsolved challenges. Our code is available at https://github.com/DengPingFan/FSGAN.
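Of FSGAN's two components, style-vector expansion admits a simple plausible reading: tile a global style vector over the spatial grid and concatenate it with decoder features. The sketch below reflects that assumption only; the authors' actual implementation is in the linked repository.

```python
import torch

def style_vector_expansion(features, style_vec):
    """Tile a global style vector spatially and concatenate it with the
    decoder features (one plausible reading, not the official code).

    features:  (B, C, H, W)
    style_vec: (B, S)
    Returns (B, C + S, H, W).
    """
    B, C, H, W = features.shape
    expanded = style_vec[:, :, None, None].expand(-1, -1, H, W)
    return torch.cat([features, expanded], dim=1)

f, s = torch.randn(1, 64, 32, 32), torch.randn(1, 8)
print(style_vector_expansion(f, s).shape)  # torch.Size([1, 72, 32, 32])
```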
Funding: the National Key Research and Development Program of China under Grant No. 2020AAA0106200; the National Natural Science Foundation of China under Grant Nos. 62102162, 61832016, U20B2070, and 6210070958; the CASIA-Tencent Youtu Joint Research Project; and the Open Projects Program of the National Laboratory of Pattern Recognition.
Abstract: Vision Transformers have shown impressive performance on image classification tasks. Observing that most existing visual style transfer (VST) algorithms are based on texture-biased convolutional neural networks (CNNs), this raises the question of whether shape-biased Vision Transformers can perform style transfer as CNNs do. In this work, we compare and analyze the shape bias of CNN- and transformer-based models from the perspective of VST tasks. For comprehensive comparisons, we propose three kinds of transformer-based visual style transfer (Tr-VST) methods: Tr-NST for optimization-based VST, Tr-WCT for reconstruction-based VST, and Tr-AdaIN for perceptual-based VST. By engaging three mainstream VST methods in the transformer pipeline, we show that transformer-based models pre-trained on ImageNet are not well suited for style transfer: due to their strong shape bias, these Tr-VST methods cannot render style patterns. We further analyze the shape bias by considering the influence of the learned parameters and the structure design. The results prove that, with proper style supervision, the transformer can learn texture-biased features similar to those of a CNN. With the reduced shape bias in the transformer encoder, Tr-VST methods can generate higher-quality results than state-of-the-art VST methods.
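The Tr-AdaIN variant can be sketched by moving AdaIN's statistics from spatial positions to the token dimension of transformer features. The following is a minimal sketch under that assumption, not the paper's full pipeline.

```python
import torch

def token_adain(content_tokens, style_tokens, eps=1e-5):
    """AdaIN on transformer token features: per-channel mean/std are
    computed over the token dimension instead of spatial positions.

    content_tokens, style_tokens: (B, N, D) token embeddings.
    """
    c_mu = content_tokens.mean(dim=1, keepdim=True)
    c_std = content_tokens.std(dim=1, keepdim=True) + eps
    s_mu = style_tokens.mean(dim=1, keepdim=True)
    s_std = style_tokens.std(dim=1, keepdim=True) + eps
    return (content_tokens - c_mu) / c_std * s_std + s_mu

# Example with ViT-sized token grids (196 tokens, 768 channels).
c, s = torch.randn(1, 196, 768), torch.randn(1, 196, 768)
print(token_adain(c, s).shape)  # torch.Size([1, 196, 768])
```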
Abstract: Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. The task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We evaluate a wide variety of image stylisation methods (both portrait-specific and general-purpose, and both traditional NPR approaches and NST) using the new benchmark dataset.