Abstract
To address the unreasonable target structures and unclear textures in images produced by text-to-image synthesis, a Multi-level Progressive Resolution Generative Adversarial Network (MPRGAN) model is proposed on the basis of the Attentional Generative Adversarial Network (AttnGAN). First, a semantic separation-fusion generation module is used in the low-resolution layer: guided by a self-attention mechanism, the text feature is separated into three feature vectors, each of which is used to generate a feature map. Then, the feature maps are fused into a low-resolution map, and mask images are used as semantic constraints to improve the stability of the low-resolution generator. Finally, a progressive resolution residual structure is adopted in the high-resolution layers, combined with a word attention mechanism and pixel shuffle to further improve the quality of the generated images. Experimental results show that the Inception Score (IS) of the proposed model reaches 4.70 on Caltech-UCSD Birds-200-2011 (CUB-200-2011) and 3.53 on the 102 Category Flower Dataset (Oxford-102), which are 7.80% and 3.82% higher than those of AttnGAN, respectively. The MPRGAN model alleviates the instability of structure generation to a certain extent, and its generated images are closer to real images.
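The pixel shuffle step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the upscale factor or the channel layout used in MPRGAN, so the code below assumes the common sub-pixel layout, in which a (C*r*r, H, W) tensor becomes (C, H*r, W*r). The function name `pixel_shuffle` and the nested-list representation are hypothetical choices for illustration, not the authors' implementation.

```python
def pixel_shuffle(x, r):
    """Rearrange a nested list of shape (C*r*r, H, W) into (C, H*r, W*r).

    Assumed layout (not taken from the paper):
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    # Allocate the enlarged output: fewer channels, r times the height and width.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output block
            for j in range(r):      # column offset inside each r x r output block
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
demo = pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2)
print(demo)  # [[[0, 1], [2, 3]]]
```

Because the rearrangement only moves existing values from the channel dimension into spatial positions, the upsampling itself introduces no interpolation; any learned behavior comes from the convolution that precedes it.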
Authors
XU Yining (许一宁); HE Xiaohai (何小海); ZHANG Jin (张津); QING Linbo (卿粼波)
College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2020, Issue 12, pp. 3612-3617 (6 pages)
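Funding
National Natural Science Foundation of China (61871278)
Sichuan Science and Technology Program (2018HH0143)
Project of the Sichuan Provincial Department of Education (18ZB0355)
Chengdu Industrial Cluster Collaborative Innovation Project (2016-XT00-00015-GX)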
Keywords
text-to-image synthesis
Generative Adversarial Network (GAN)
self-attention mechanism
residual structure
pixel shuffle