Abstract
Semantic image synthesis is an important research and application direction in the field of image-to-image translation. It aims to generate realistic images consistent with an image description from input semantic images such as semantic segmentation maps, maps, and sketches. To address the problem that GAN-based semantic image synthesis produces blurry image features and texture details lacking correlation, owing to the absence of global information, this paper proposes a global-information-enhanced semantic image synthesis method that builds on the pix2pix network and incorporates an external attention mechanism. First, an external attention mechanism is introduced in the upsampling stage of the U-Net generator to strengthen the spatial correlation between pixels of the generated image. Second, deep residual modules are used in the generator's upsampling layers to improve the quality of the generated images while enhancing their diversity. Finally, global information is incorporated into the discriminator to strengthen its discrimination ability. Experiments on the Cityscape, Landscape, and Edges2shoes datasets show that, compared with the baseline model, the improved method achieves improvements of 57.37, 26.74, and 1.78 in the FID (Fréchet Inception Distance) metric, respectively. The results demonstrate that the model can effectively exploit global information to enhance the correlation of texture details in generated images and improve image quality.
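The abstract names an external attention mechanism in the generator's upsampling stage but gives no implementation details. Below is a minimal PyTorch sketch of a standard external attention module (two learnable linear memory units M_k and M_v with double normalization) as it might be inserted after a U-Net upsampling layer; the memory size of 64, the residual connection, and the use of PyTorch are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """External attention: attention between input features and two small
    learnable memory units (linear layers) with double normalization.
    Memory size S=64 and the residual connection are assumptions."""
    def __init__(self, channels: int, mem_size: int = 64):
        super().__init__()
        self.mk = nn.Linear(channels, mem_size, bias=False)  # key memory M_k
        self.mv = nn.Linear(mem_size, channels, bias=False)  # value memory M_v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a U-Net upsampling layer
        b, c, h, w = x.shape
        feat = x.flatten(2).transpose(1, 2)                    # (B, N, C), N = H*W
        attn = self.mk(feat)                                   # (B, N, S)
        attn = torch.softmax(attn, dim=1)                      # softmax over the N pixels
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)   # L1-normalize over S
        out = self.mv(attn)                                    # (B, N, C)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return out + x                                         # residual connection (assumed)
```

Because the memory units are shared across all pixels of a sample, the module captures long-range (global) correlations at linear cost in the number of pixels, which is consistent with the paper's stated goal of enhancing spatial correlation in the generated images.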
Authors
LIU Yong, LI Junqi, CHEN Yongqiang (School of Computer & Artificial Intelligence, Wuhan Textile University; Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, China)
Source
《软件导刊》 (Software Guide), 2024, No. 10, pp. 214-220 (7 pages)
Keywords
image-to-image translation
semantic image synthesis
generative adversarial network
deep learning
computer vision