Abstract
Text-to-image generation maps natural language onto the features of an image set, generating images that correspond to natural language descriptions and using linguistic attributes to intelligently express visual content in a general way. Deep learning based on convolutional neural networks is currently the mainstream approach to text-to-image generation. To give a systematic view of the research status and development trends of the field, this paper divides existing methods, according to how their models are constructed and implemented, into six categories: direct text-to-image methods, stacked architecture methods, attention mechanism methods, cycle consistency methods, adapting unconditional model methods, and additional supervision methods. Each category is summarized and discussed in terms of its construction ideas, model characteristics, advantages, and limitations, and the main evaluation metrics are analyzed and compared. Finally, the challenges and future prospects of the technology are discussed with respect to model methods, evaluation methods, and technical improvements.
Authors
王宇昊
何彧
王铸
WANG Yuhao; HE Yu; WANG Zhu (Guizhou Tianyan Juheng Technology Co., Ltd., Guiyang, Guizhou 550081, China; College of Earth and Space Sciences, Peking University, Beijing 100871, China; College of Geography & Environmental Science, Guizhou Normal University, Guiyang 550025, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal list (北大核心)
2022, No. 10, pp. 50-67 (18 pages)
Computer Engineering and Applications
Keywords
text-to-image generation method
deep learning
convolutional neural network
evaluation indicator