Abstract
To improve the quality of generated images, a novel text-to-image generation method is proposed, built on a single-stage text-to-image generation backbone. Whereas the original model used only sentence-level information to generate images, an attention mechanism is employed to fuse word-level information into the image features, so that richer textual information is incorporated in a principled way to improve image quality. A contrastive loss is introduced to pull images with the same semantics closer together and push images with different semantics further apart, which better preserves the semantic consistency between the text and the generated image. Dynamic convolution is adopted in the generator to strengthen its expressive capacity. Experimental results show that the proposed method achieves notable performance gains on both the CUB dataset (Fréchet inception distance (FID) improved from 12.10 to 10.36) and the COCO dataset (FID improved from 15.41 to 12.74).
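The abstract describes a contrastive loss that draws same-semantics images together and pushes different-semantics images apart. The paper's exact formulation is not given here; as a minimal sketch under assumptions, a symmetric InfoNCE-style objective over matched image-text feature pairs can be written as follows (the function name, temperature value, and diagonal-pairing convention are illustrative, not the authors' implementation):

```python
import numpy as np

def contrastive_loss(image_feats, text_feats, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    image_feats, text_feats: (B, D) arrays where row i of each is a
    matched image-text pair; mismatched rows serve as negatives.
    """
    # L2-normalize so similarity is cosine similarity
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B); diagonal = positive pairs

    def xent_diag(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average image-to-text and text-to-image directions
    return (xent_diag(logits) + xent_diag(logits.T)) / 2
```

With perfectly aligned pairs the loss is near zero; shuffling the text features (breaking the pairing) drives it up, which is the behavior the abstract attributes to the contrastive term.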
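The abstract also credits dynamic convolution with increasing the generator's expressive capacity. One common formulation mixes K candidate kernels with sample-dependent attention weights before applying the result; the sketch below shows that idea for a 1x1 convolution (all names, shapes, and the softmax-attention choice are assumptions, not the paper's stated design):

```python
import numpy as np

def dynamic_conv1x1(x, kernels, attn_logits):
    """Dynamic 1x1 convolution sketch: each sample gets its own kernel,
    formed as an attention-weighted mixture of K candidate kernels.

    x:           (B, C_in)       per-sample input features
    kernels:     (K, C_out, C_in) candidate kernels shared across samples
    attn_logits: (B, K)          per-sample scores over the K kernels
    """
    # Softmax over the K candidate kernels
    attn = np.exp(attn_logits - attn_logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    # Aggregate one sample-specific kernel per input: (B, C_out, C_in)
    mixed = np.einsum('bk,koi->boi', attn, kernels)
    # Apply each sample's kernel to its own features: (B, C_out)
    return np.einsum('boi,bi->bo', mixed, x)
```

Because the mixing weights depend on the input, the effective kernel adapts per sample, which is the extra expressiveness a static convolution lacks.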
Authors
YANG Bing, NA Wei, XIANG Xue-qin
(School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou Dianzi University, Hangzhou 310018, China; Hangzhou Lingban Technology Co., Ltd., Hangzhou 311121, China)
Source
《浙江大学学报(工学版)》 (Journal of Zhejiang University: Engineering Science), 2023, No. 12, pp. 2412-2420 (9 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
Zhejiang Provincial Basic Public Welfare Research Program (LGG22F020027); National Natural Science Foundation of China (61633010, U1909202)
Keywords
text-to-image generation
attention mechanism
contrastive loss
semantic consistency
dynamic convolution