
Text-to-image synthesis based on modified deep convolutional generative adversarial network
Abstract: The deep convolutional generative adversarial network (DCGAN) model, when conditioned on high-dimensional text input, tends to generate images with missing structure and poor realism because the text representation is sparse. To address this issue, an improved model, CA-DCGAN, is proposed. A deep convolutional network and a recurrent text encoder are employed to encode the input text into a feature-vector representation. A conditional augmentation (CA) module is then introduced: it samples an additional conditioning variable from the mean and covariance matrix of the text feature vector, replacing the original high-dimensional text feature. The conditioning variable is combined with random noise as the generator's input, and a KL regularization term is added to the generator's loss to avoid over-fitting and promote convergence. A spectral normalization (SN) layer is used in the discriminator to prevent the mode collapse caused by unbalanced training between generator and discriminator when the discriminator's gradient descends too quickly. Experimental results show that the images generated by the proposed model on the Oxford-102-flowers and CUB-200 datasets are of higher quality and closer to real samples than those of alignDRAW, GAN-CLS, GAN-INT-CLS, StackGAN (64×64), and StackGAN-v1 (64×64): the inception score improves by at least 10.9% and 5.6% and at most 41.4% and 37.5% on the two datasets respectively, while the FID decreases by at least 11.4% and 8.4% and at most 43.9% and 42.5%, further validating the effectiveness of the proposed method.
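The two mechanisms described in the abstract can be illustrated as follows. This is a minimal NumPy sketch under illustrative assumptions (the 1024-dimensional text embedding, 128-dimensional conditioning variable, and linear projections are placeholders, not the authors' implementation): conditional augmentation samples the conditioning variable via the reparameterization trick, the closed-form KL term penalizes divergence from a standard normal prior, and spectral normalization bounds a weight matrix's largest singular value with power iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_augmentation(text_embedding, w_mu, w_logvar):
    """Map a text embedding to a low-dimensional conditioning variable.

    The mean and log-variance are (here, linear) projections of the
    embedding; c is sampled with the reparameterization trick:
    c = mu + sigma * eps, eps ~ N(0, I).
    """
    mu = text_embedding @ w_mu          # mean of the Gaussian
    logvar = text_embedding @ w_logvar  # log of the diagonal covariance
    eps = rng.standard_normal(mu.shape)
    c = mu + np.exp(0.5 * logvar) * eps
    return c, mu, logvar

def kl_regularizer(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), added to
    the generator loss to keep the conditioning distribution smooth
    and discourage over-fitting. Always non-negative."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def spectral_normalize(w, n_iter=20):
    """Divide a weight matrix by an estimate of its largest singular
    value, obtained by power iteration -- the SN layer used in the
    discriminator to bound its Lipschitz constant."""
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v  # estimated largest singular value
    return w / sigma

# Toy sizes: 1024-d text embedding -> 128-d conditioning variable.
embed_dim, c_dim, z_dim = 1024, 128, 100
phi_t = rng.standard_normal(embed_dim)                # encoded text
w_mu = rng.standard_normal((embed_dim, c_dim)) * 0.01
w_logvar = rng.standard_normal((embed_dim, c_dim)) * 0.01

c, mu, logvar = conditional_augmentation(phi_t, w_mu, w_logvar)
z = rng.standard_normal(z_dim)                        # random noise
generator_input = np.concatenate([c, z])              # 228-d input
kl = kl_regularizer(mu, logvar)
print(generator_input.shape, kl >= 0)
```

The KL term pushes the sampled conditioning distribution toward N(0, I), which smooths the conditioning manifold when text embeddings are sparse; spectral normalization caps the discriminator's per-layer gain so its loss cannot drop far faster than the generator can follow.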
Authors: LI Yunhong, ZHU Mianyun, REN Jie, SU Xueping, ZHOU Xiaoji, YU Huikang (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China)
Source: Journal of Beijing University of Aeronautics and Astronautics, 2023, Issue 8, pp. 1875-1883 (9 pages); indexed in EI, CAS, CSCD, Peking University Core.
Funding: National Natural Science Foundation of China (61902301); Key Project of the Natural Science Basic Research Program of Shaanxi Province (2022JZ-35).
Keywords: deep convolutional generative adversarial network; text-to-image synthesis; text feature representation; conditional augmentation; KL regularization