Location control method for generated objects by diffusion model with exciting and pooling attention
Abstract Due to the ambiguity of text and the lack of location information in training data, current state-of-the-art diffusion models cannot accurately control where generated objects appear in an image when conditioned on text prompts alone. To address this issue, a spatial condition specifying each object's location range was introduced, and an attention-guidance method was proposed that exploits the strong correlation between the cross-attention maps in U-Net and the spatial layout of the image: by controlling how the attention maps are generated, it controls where the objects are generated. Specifically, based on the Stable Diffusion (SD) model, in the early stage of cross-attention map generation in the U-Net layers, a loss was introduced to excite high attention values inside the corresponding location range and to reduce the average attention value outside it. The noise vector in the latent space was then optimized step by step at each denoising step, thereby controlling the generation of the attention maps. Experimental results show that the proposed method can significantly control the locations of one or more objects in the generated image and, when generating multiple objects, reduces object omission, redundant object generation, and object fusion.
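The loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below (one minus the peak attention inside the range, plus the mean attention outside it) is an assumption, and the names `location_loss`, `attn`, and `box` are hypothetical. The sketch covers only the loss on a single attention map; the step-by-step update of the latent noise vector, which would need gradients through the U-Net, is not shown.

```python
import numpy as np

def location_loss(attn, box):
    """Toy loss on one cross-attention map (assumed form, not the paper's exact loss).

    attn: 2-D array of attention values in [0, 1] for one object token.
    box:  (top, left, bottom, right), half-open ranges in map coordinates.
    """
    top, left, bottom, right = box
    mask = np.zeros(attn.shape, dtype=bool)
    mask[top:bottom, left:right] = True

    # "Exciting" term: small when some position inside the box has high attention.
    excite = 1.0 - attn[mask].max()
    # "Pooling" term: average attention outside the box (0 if the box covers the map).
    pool = attn[~mask].mean() if (~mask).any() else 0.0
    return float(excite + pool)

# Two toy 4x4 maps: one peaks inside the target box, the other outside it.
box = (0, 0, 2, 2)
inside = np.zeros((4, 4))
inside[1, 1] = 0.9
outside = np.zeros((4, 4))
outside[3, 3] = 0.9

print(location_loss(inside, box))   # lower loss: attention is where it should be
print(location_loss(outside, box))  # higher loss: attention lies outside the box
```

Under these assumptions, a lower loss corresponds to an attention map concentrated inside the requested range, which is the direction in which the latent would be nudged at each early denoising step.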
Authors XU Jinsong (徐劲松); ZHU Ming (朱明); LI Zhiqiang (李智强); GUO Shijie (郭世杰) (College of Computer and Information Engineering, Hubei University, Wuhan, Hubei 430062, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 4, pp. 1093-1098 (6 pages)
Funding National Natural Science Foundation of China (62106069).
Keywords attention map; diffusion model; location control; text guidance; image generation