Abstract
Text-to-image synthesis refers to translating a textual description in sentence form into an image whose semantics match the text. In early research, image generation relied mainly on keyword- or sentence-based retrieval to align the text with matching visual content. With the advent of generative adversarial networks (GANs), text-to-image synthesis methods have made substantial progress in visual realism, diversity and semantic similarity. A GAN produces plausible, realistic images through the adversarial game between a generator and a discriminator, and has shown strong capability in fields such as image restoration and super-resolution generation. Based on a review and summary of the latest research results in text-to-image synthesis, this study proposes a new classification with four categories: attention enhancement, multi-stage enhancement, scene layout enhancement and universality enhancement. The challenges and future development directions of text-to-image synthesis are also discussed.
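For reference, the adversarial game between generator and discriminator is commonly written as the minimax objective below, here in a text-conditional form. This is a sketch of the standard conditional GAN formulation, not necessarily the exact objective used by any method covered in the survey. The symbols are introduced here for illustration:

* $x$ is a real image.
* $t$ is the embedding of its text description.
* $z$ is a random noise vector.
* $G$ is the generator and $D$ is the discriminator.

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{(x,t)\sim p_{\mathrm{data}}}\big[\log D(x,t)\big]
  + \mathbb{E}_{z\sim p_{z},\, t\sim p_{\mathrm{data}}}\big[\log\big(1 - D(G(z,t),t)\big)\big]
```

The discriminator is trained to assign high probability to real image-text pairs and low probability to generated ones. The generator is trained to make $D(G(z,t),t)$ large, which pushes its outputs toward images that are both realistic and consistent with $t$.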
Authors
LI Yueyang (李乐阳)
TONG Guoxiang (佟国香)
ZHAO Yingzhi (赵迎志)
LUO Qi (罗琦)
(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
Electronic Science and Technology (《电子科技》)
2023, No. 10, pp. 39-55 (17 pages)
Funding
National Key Research and Development Program of China (2018YFB1700902).
Keywords
image generation
visual content alignment
text matching
generator
discriminator
semantic similarity
generative adversarial network
scene layout