Abstract
To address the problems of current mainstream image editing methods, such as being limited to a single task, being unfriendly to operate, and having low fidelity, a diffusion model-based method for high-fidelity image editing was proposed. The method uses the mainstream Stable Diffusion model as its backbone network. First, the model was fine-tuned with the Low-Rank Adaptation (LoRA) method so that it could better reconstruct the original image. Then, the fine-tuned model took the image and a simple prompt as input and performed inference through the designed framework to generate the edited image. In addition, the method was extended with a dual-layer U-Net structure for image editing tasks with specific requirements and for video synthesis. Comparative experiments against the leading methods Imagic, DiffEdit, and InstructPix2Pix on the TEdBench dataset show that the proposed method can perform a variety of editing tasks, including non-rigid editing, with strong editability. Its Learned Perceptual Image Patch Similarity (LPIPS) score is 30.38% lower than that of Imagic, indicating higher fidelity.
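The abstract names LoRA as the fine-tuning step but does not spell out the mechanism. As a rough illustration only, the sketch below shows the generic LoRA idea of leaving a pretrained weight matrix frozen and learning a low-rank update that is added to it. It is not the authors' implementation: the function names (`matmul`, `make_lora_factors`, `lora_weight`), the matrix sizes, and the `alpha` value are all hypothetical, and the zero initialization of `B` and the `alpha / rank` scaling follow the common LoRA convention rather than anything stated in this record.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def make_lora_factors(d_out, d_in, rank, seed=0):
    """Create the two trainable low-rank factors (hypothetical helper).

    A (rank x d_in) gets small random values; B (d_out x rank) starts at zero,
    so the adapted weight equals the pretrained weight before any training.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def lora_weight(w0, a, b, alpha, rank):
    """Return W0 + (alpha / rank) * (B @ A); W0 itself is never modified."""
    delta = matmul(b, a)  # d_out x d_in, but of rank at most `rank`
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Illustrative sizes only: a 4x4 frozen weight adapted with rank 2.
d_out, d_in, rank = 4, 4, 2
w0 = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
a, b = make_lora_factors(d_out, d_in, rank)
adapted = lora_weight(w0, a, b, alpha=2.0, rank=rank)
```

With these shapes, only `rank * (d_in + d_out)` values are trained instead of `d_in * d_out`; the saving becomes substantial at the layer sizes used in large diffusion backbones.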
Authors
LIU Yusheng (刘雨生)
XIAO Xuezhong (肖学中)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal
2024, Issue 11, pp. 3574-3580 (7 pages)
Keywords
diffusion model
image editing
Low-Rank Adaptation (LoRA)
model fine-tuning
U-Net