Abstract
Research on food image generation primarily focuses on generating meal images from a specific set of ingredients, a task that falls under the category of text-to-image generation. However, owing to the complex factors associated with meal images, prior efforts to generate realistic food images have yet to fully succeed. Existing methods use generative adversarial networks (GANs) conditioned on ingredient and cooking information to progressively produce high-quality samples, but they may fail to cover the entire data distribution, making it difficult to conditionally generate high-quality images. Diffusion models, a class of likelihood-based models, have recently been shown to produce high-quality images while offering desirable properties such as distribution coverage, a fixed training objective, and ease of scaling. This paper explores associating cross-modal information and guiding a diffusion model with category information to generate high-quality food images. Results on the Recipe1M dataset show a significant improvement in model performance over baseline methods.
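The abstract describes steering a diffusion model's generation with category (cross-modal) information. A common way to realize such conditioning, sketched below as an illustrative assumption rather than the paper's actual implementation, is classifier-free guidance: the model's unconditional and condition-aware noise predictions are blended at each denoising step, with a scale factor controlling how strongly the conditioning signal is enforced. The function name and values here are hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: blend the unconditional and
    condition-aware noise predictions at one denoising step.
    guidance_scale = 1.0 reproduces plain conditional sampling;
    larger values push samples toward the conditioning signal
    (e.g., a food-category embedding)."""
    return [eu + guidance_scale * (ec - eu)
            for eu, ec in zip(eps_uncond, eps_cond)]

# Hypothetical example: where the two predictions disagree, the
# guided prediction overshoots the conditional one for scale > 1.
guided = cfg_combine([0.0, 0.5], [1.0, 0.5], guidance_scale=2.0)
# → [2.0, 0.5]
```

In practice the two predictions come from the same network, evaluated once with the category embedding and once with a null (dropped) condition, so guidance adds no extra trained components.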
Authors
Xu Huancheng (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China)
Source
Modern Computer, 2024, No. 16, pp. 69-73 (5 pages)
Keywords
diffusion models
recipe
image generation