Abstract
Cross-modal learning is a long-standing research topic in artificial intelligence. Text-to-image generation, whose main task is to generate images that closely match a given text description, has become an active research area in recent years. This paper surveys the state of research and recent progress in text-to-image generation. By generation framework, existing models are divided into methods based on generative adversarial networks (GANs) and non-GAN methods. By training strategy, the GAN-based methods are further subdivided into single-stage, multi-stage, and additional-supervision categories, and several classic non-GAN methods are also introduced. Finally, the paper presents the datasets and evaluation metrics used for the text-to-image generation task, identifies the shortcomings and unsolved problems of current methods, and points out approaches for future research.
Author
王鹏
WANG Peng (School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, China)
Source
Information Technology (《信息技术》)
2024, No. 7, pp. 148-159 (12 pages)
Keywords
text-to-image generation
generative adversarial networks
diffusion models
single-stage generation
multi-stage generation