Funding: Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).
Abstract: Generating realistic synthetic video from text is a highly challenging task due to the multitude of issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as a Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as a Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed approach involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) to enhance the effectiveness of the Generator (G). Next, the DDNN generates video frames from the enhanced text embedding and random noise, while the modified DCNN acts as the Discriminator (D), effectively distinguishing between generated and real videos. We evaluate the quality of the generated videos using standard metrics such as Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and the Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Through experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text, this research demonstrates significant improvements in both the metrics and the human study, confirming the effectiveness of DD-GAN. This research also takes on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieves successful results. Overall, the proposed approach demonstrates promising results for generating coherent videos from textual input.
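The abstract names the Conditioning Augmentation step but does not specify its implementation. The following is a minimal sketch of how CA is typically realized in text-conditioned GANs (PyTorch assumed; all module names, dimensions, and the single-layer LSTM setup are hypothetical, not taken from the paper): the LSTM encoder's sentence embedding is mapped to a mean and log-variance, and a smoothed conditioning code is resampled from that Gaussian before being concatenated with noise for the generator.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Smooths a text embedding by sampling from a learned Gaussian
    (hypothetical dimensions; the paper does not publish its code)."""
    def __init__(self, embed_dim=256, cond_dim=128):
        super().__init__()
        # One linear layer predicts both the mean and the log-variance.
        self.fc = nn.Linear(embed_dim, cond_dim * 2)

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)      # reparameterization trick
        c_hat = mu + eps * std           # smoothed conditioning code
        return c_hat, mu, logvar

# Usage: encode the caption with an LSTM, smooth the embedding,
# then feed it (plus random noise) to the deconvolutional generator.
encoder = nn.LSTM(input_size=300, hidden_size=256, batch_first=True)
ca = ConditioningAugmentation()
tokens = torch.randn(4, 12, 300)         # a batch of 12-token captions
_, (h_n, _) = encoder(tokens)
c_hat, mu, logvar = ca(h_n[-1])
z = torch.randn(4, 100)                  # random noise vector
generator_input = torch.cat([c_hat, z], dim=-1)  # shape (4, 228)
```

In CA-based GANs, a KL-divergence penalty on (mu, logvar) is typically added to the generator loss to keep the conditioning manifold smooth; whether DD-GAN uses such a term is not stated in the abstract.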
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61872231 and 61701297) and by the Major Program of the National Social Science Foundation of China (Grant No. 20&ZD130).
Abstract: Image denoising is often used as a preprocessing step in computer vision tasks and can help improve the accuracy of image processing models. Due to imperfections in imaging systems, transmission media, and recording equipment, digital images are often contaminated with various kinds of noise during their formation, which degrades visual quality and can even hinder recognition. Noise directly affects image edge detection, feature extraction, pattern recognition, and similar processing, making it difficult to break through performance bottlenecks by modifying the model alone. Many traditional filtering methods perform poorly because they lack an optimal representation of, and adaptation to, specific images. Meanwhile, deep learning opens up new possibilities for image denoising. In this paper, we propose a novel neural network based on generative adversarial networks for image denoising. Inspired by U-Net, our method employs a novel symmetric encoder-decoder generator network. The encoder uses convolutional layers to extract features, while the decoder outputs the noise in the image through deconvolutional layers. In particular, shortcuts are added between designated layers, which preserve image texture details and prevent exploding gradients. In addition, to improve the training stability of the model, we incorporate the Wasserstein distance into the loss function. We evaluate our model using the peak signal-to-noise ratio (PSNR), and experimental results demonstrate its effectiveness. Compared with state-of-the-art approaches, our method achieves competitive performance.
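The abstract states three concrete properties of the generator: a symmetric encoder-decoder, shortcuts between mirrored layers, and a decoder that outputs the noise rather than the clean image. Below is a minimal sketch consistent with those properties (PyTorch assumed; the two-level depth, channel widths, and the PSNR helper are illustrative assumptions, not the paper's actual architecture):

```python
import torch
import torch.nn as nn

class DenoisingGenerator(nn.Module):
    """Symmetric encoder-decoder that predicts the noise in an image;
    a shortcut passes encoder features to the mirrored decoder layer."""
    def __init__(self, channels=1, width=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(channels, width, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(width, width * 2, 3, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(width * 2, channels, 4, 2, 1)

    def forward(self, noisy):
        e1 = self.enc1(noisy)             # (B, 64, H/2, W/2)
        e2 = self.enc2(e1)                # (B, 128, H/4, W/4)
        d2 = self.dec2(e2)                # (B, 64, H/2, W/2)
        # Shortcut: concatenate mirrored encoder features
        # (helps preserve texture details, as the abstract describes).
        noise = self.dec1(torch.cat([d2, e1], dim=1))
        return noisy - noise              # denoised = input - predicted noise

def psnr(clean, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB, the paper's evaluation metric."""
    mse = torch.mean((clean - restored) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

x = torch.rand(2, 1, 64, 64)              # toy clean images in [0, 1]
noisy = (x + 0.1 * torch.randn_like(x)).clamp(0, 1)
restored = DenoisingGenerator()(noisy)
print(psnr(x, restored).item())
```

In the full adversarial setup, the Wasserstein formulation mentioned in the abstract would replace the standard discriminator objective with a critic trained on the difference between its mean scores on real and generated images, which is the usual way this distance stabilizes GAN training.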