Image-to-Image Translation Based on CLIP and Dual-Spatially Adaptive Normalization
Abstract: Most existing image-to-image translation methods rely on dataset domain labels, which often limits their range of application. Methods aimed at truly unsupervised image-to-image translation remove this dependence on domain labels, but they commonly suffer from loss of source-domain information. To address both problems, an unsupervised image-to-image translation model based on Contrastive Language-Image Pre-training (CLIP) is proposed. First, a CLIP similarity loss is introduced to constrain image style features, enhancing the model's ability to transfer style information accurately without using dataset domain labels. Second, Adaptive Instance Normalization (AdaIN) is improved into a new Dual-Spatially Adaptive Instance Normalization (DSAdaIN) module, which adds learned, adaptive interaction to the feature stylization stage to strengthen the preservation of source-domain content information. Finally, a discriminator contrastive loss is designed to balance the training and optimization of the adversarial losses. Experimental results on multiple public datasets show that, compared with models such as StarGANv2 and StyleDIS, the proposed model transfers image style information accurately while retaining a degree of source-domain information, improving the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID, ×10²) scores by approximately 3.35 and 0.57, respectively, and achieving good image-to-image translation performance.
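The abstract only names the CLIP similarity loss; its exact formulation is in the full paper. As an illustration, the sketch below shows one common way such a loss is wired up with OpenAI's open-source `clip` package: embed the translated image and the style reference with a frozen CLIP image encoder and penalize low cosine similarity. The function name and the choice of the ViT-B/32 backbone are assumptions, not the authors' configuration.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package: https://github.com/openai/CLIP

# Load a pretrained CLIP image encoder (kept frozen during translation training).
device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()

def clip_similarity_loss(generated: torch.Tensor, style_ref: torch.Tensor) -> torch.Tensor:
    """Pull the CLIP embedding of the translated image toward that of the style
    reference, so style is constrained without any domain labels. Both inputs
    are assumed to be already resized/normalized to what CLIP expects."""
    with torch.no_grad():
        ref_feat = clip_model.encode_image(style_ref)   # no gradient to the reference
    gen_feat = clip_model.encode_image(generated)       # gradient flows to the generator
    # Maximizing cosine similarity == minimizing (1 - cosine similarity).
    return 1.0 - F.cosine_similarity(gen_feat, ref_feat, dim=-1).mean()
```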
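DSAdaIN is described here only at a high level: AdaIN plus a learned, adaptive interaction during stylization. The sketch below first reproduces standard AdaIN (Huang & Belongie, 2017), which the module builds on, then adds a SPADE-style spatially varying modulation as one plausible reading of "spatially adaptive"; the class and its layout are illustrative assumptions, not the paper's DSAdaIN.

```python
import torch
import torch.nn as nn

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Standard AdaIN: re-normalize content features to the channel-wise
    mean/std of the style features. Inputs are (N, C, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

class SpatiallyAdaptiveStylization(nn.Module):
    """Hypothetical sketch: augment AdaIN's global statistic swap with a
    learned, per-pixel scale/shift predicted from the content features, so
    stylization can adapt locally and retain source-domain content cues.
    Purely illustrative; not the authors' DSAdaIN design."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        stylized = adain(content, style)      # global style statistics
        gamma = self.to_gamma(content)        # learned, spatially varying scale
        beta = self.to_beta(content)          # learned, spatially varying shift
        return stylized * (1 + gamma) + beta
```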
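The discriminator contrastive loss is likewise only named in the abstract. Below is a minimal InfoNCE-style sketch over discriminator feature embeddings, assuming an anchor, one positive, and a batch of negatives; the paper's actual pairing scheme, and how it balances the adversarial losses, may differ.

```python
import torch
import torch.nn.functional as F

def discriminator_contrastive_loss(feat_anchor: torch.Tensor,
                                   feat_pos: torch.Tensor,
                                   feat_negs: torch.Tensor,
                                   temperature: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE on discriminator features (hypothetical pairing):
    feat_anchor, feat_pos are [N, D]; feat_negs is [N, K, D]. The anchor is
    pulled toward its positive and pushed from the K negatives."""
    anchor = F.normalize(feat_anchor, dim=-1)
    pos = F.normalize(feat_pos, dim=-1)
    negs = F.normalize(feat_negs, dim=-1)
    l_pos = (anchor * pos).sum(dim=-1, keepdim=True)      # [N, 1]
    l_neg = torch.einsum("nd,nkd->nk", anchor, negs)      # [N, K]
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive sits at index 0 of each row of logits.
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```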
Authors: LI Tianfang, PU Yuanyuan, ZHAO Zhengpeng, XU Dan, QIAN Wenhua (School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, China; University Key Laboratory of Internet of Things Technology and Application, Yunnan Province, Kunming 650500, Yunnan, China)
Source: Computer Engineering (计算机工程), CAS / CSCD / Peking University Core Journal, 2024, Issue 5, pp. 229-240 (12 pages).
Funding: National Natural Science Foundation of China (61163019, 61271361, 61761046, U1802271, 61662087, 62061049); Yunnan Provincial Department of Science and Technology Projects (2014FA021, 2018FB100); Key Projects of the Applied Basic Research Program of Yunnan Province (202001BB050043, 2019FA044); Yunnan Provincial Major Science and Technology Special Program (202002AD080001); Yunnan Provincial Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2019HB121).
Keywords: image-to-image translation; Generative Adversarial Network (GAN); Contrastive Language-Image Pre-training (CLIP) model; Adaptive Instance Normalization (AdaIN); contrastive learning