High-Fidelity Image Editing Based on Fine-Tuning of Diffusion Models


Abstract: Current mainstream image editing methods tend to handle only a single task, are not user-friendly, and produce low-fidelity results. To address these issues, a diffusion model-based method for high-fidelity image editing is proposed. The method uses the mainstream Stable Diffusion model as its backbone network. First, the model is fine-tuned with Low-Rank Adaptation (LoRA) so that it can better reconstruct the original image. Second, the fine-tuned model takes the image together with a simple text prompt and runs inference through a designed framework to generate the edited image. As an extension of this method, a dual-layer U-Net structure is also proposed for image editing tasks with specific requirements and for video synthesis. Comparative experiments against the leading methods Imagic, DiffEdit, and InstructPix2Pix on the TEdBench dataset show that the proposed method can perform a variety of editing tasks, including non-rigid editing, and has strong editability. Its Learned Perceptual Image Patch Similarity (LPIPS) score is 30.38% lower than that of Imagic, indicating that the proposed method has higher fidelity.
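The LoRA fine-tuning step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It shows only the general LoRA idea: a frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/r. The abstract does not say which Stable Diffusion layers are adapted or what rank is used, so the class name `LoRALinear`, the rank, the scaling factor, and the initialisation below are all illustrative assumptions, not the authors' configuration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical helper).

    Computes y = x W^T + (alpha / r) * x A^T B^T, where W stays frozen and
    only A and B would be updated during fine-tuning.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out, in)
        # Down-projection A: small random values (assumed initialisation).
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        # Up-projection B: zeros, so the adapted layer starts identical to the base layer.
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def merged_weight(self):
        # After training, the update can be folded into a single matrix.
        return self.weight + self.scale * (self.B @ self.A)

    def num_trainable(self):
        return self.A.size + self.B.size


if __name__ == "__main__":
    W = np.arange(12.0).reshape(3, 4)  # stand-in for a pretrained weight
    layer = LoRALinear(W, rank=2, alpha=2.0)
    x = np.ones((5, 4))
    print(layer.forward(x).shape)
    print("trainable:", layer.num_trainable(), "frozen:", W.size)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly. For realistically sized layers, where the rank is much smaller than the layer dimensions, A and B hold far fewer parameters than W, which is what makes this kind of fine-tuning cheap.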

Authors: LIU Yusheng; XIAO Xuezhong (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China)

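Affiliation: School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications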

Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2024, No. 11, pp. 3574-3580 (7 pages)

Keywords: diffusion model; image editing; Low-Rank Adaptation (LoRA); model fine-tuning; U-Net

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

