Automated Video Generation of Moving Digits from Text Using Deep Deconvolutional Generative Adversarial Network


Abstract: Generating realistic synthetic video from text is highly challenging because of several interacting issues, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed approach involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder, and the encoding is then smoothed using Conditioning Augmentation (CA) to make it more effective as input to the Generator (G). Next, the DDNN generates video frames from the enhanced text and random noise, while the modified DCNN acts as the Discriminator (D), distinguishing between generated and real videos. The quality of the generated videos is evaluated using standard metrics, namely Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both the metrics and the human study results, confirming the effectiveness of DD-GAN. This research also takes on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieves successful results. Overall, the proposed approach shows promising results for generating coherent videos from textual input.
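The abstract does not spell out how Conditioning Augmentation smooths the text encoding. As a rough illustration only, the sketch below assumes the usual formulation introduced for StackGAN: the text embedding is mapped to a mean and a log-variance, a conditioning vector is sampled as c = mu + sigma * eps with eps drawn from a standard normal, and a KL term pulls the distribution toward N(0, I). The function names, vector sizes, and values are hypothetical and are not taken from the paper.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps, with eps ~ N(0, I) (reparameterization).

    mu and log_var are assumed to come from a learned layer applied to the
    LSTM text embedding; here they are plain lists of floats.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), used as a regularizer."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# Hypothetical 3-dimensional example; real embeddings are much larger.
rng = random.Random(0)
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -2.0, 0.5]

c = conditioning_augmentation(mu, log_var, rng)  # conditioning vector for G
kl = kl_to_standard_normal(mu, log_var)          # added to the generator loss
```

In a setup like this, the sampled vector `c` would be concatenated with the random noise before being passed to the generator, so that the same sentence yields slightly different conditioning vectors across training steps.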

Authors: Anwar Ullah, Xinguo Yu, Muhammad Numan

Affiliations: National Engineering Research Center for E-Learning; Wollongong Joint Institute

Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 11, pp. 2359-2383 (25 pages)

Funding: Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).

Keywords: Generative Adversarial Network (GAN); deconvolutional neural network; convolutional neural network; Inception Score (IS); temporal coherence; Fréchet Inception Distance (FID); Generative Adversarial Metric (GAM)

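Classification codes: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.41 [Automation and Computer Technology: Computer Application Technology]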

