Abstract
To address the problem that existing text-guided image inpainting models fuse cross-modal information inefficiently when combining text and image features, which leads to unrealistic inpainting results and poor semantic consistency, this paper proposes BATF, a text-guided image inpainting model that fuses image and text features through conditional batch normalization. First, a spatial region normalization encoder normalizes the damaged and undamaged regions separately, reducing the mean and variance shift caused by normalizing the features directly. Second, the extracted image features and the text feature vectors are fused through a deep affine transformation, which strengthens the visual-semantic embedding of the generator's feature maps so that image and text features are fused more effectively. Finally, to improve the texture realism and semantic consistency of the inpainted images, an efficient discriminator is designed and a target-aware discriminator is introduced. Quantitative and qualitative experiments on CUB bird, a dataset with text labels, show that the proposed model achieves 20.86, 0.836, and 23.832 on PSNR (peak signal-to-noise ratio), SSIM (structural similarity), and MAE (mean absolute error), respectively. The results show that BATF outperforms the existing MMFL and ALMR models, and that the inpainted images both match the given text attributes and are highly semantically consistent.
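The abstract does not give the layer definitions, so the following is only a generic sketch of text-conditioned normalization, not BATF's actual implementation. It assumes that each feature channel is normalized to zero mean and unit variance and then rescaled and shifted by a per-channel `gamma` and `beta` predicted from the text vector by two linear maps. The function names, the weight matrices, and the flattened per-channel layout are all hypothetical choices made for illustration.

```python
import math


def linear(vec, weights, bias):
    """Dense layer: one output per row of `weights` (hypothetical helper)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def conditional_norm(features, text_vec, w_gamma, b_gamma, w_beta, b_beta,
                     eps=1e-5):
    """Normalize each channel, then apply a text-predicted affine transform.

    `features` is a list of channels; each channel is a flat list holding
    every activation over which the statistics should be computed.
    """
    gammas = linear(text_vec, w_gamma, b_gamma)  # one scale per channel
    betas = linear(text_vec, w_beta, b_beta)     # one shift per channel
    out = []
    for channel, gamma, beta in zip(features, gammas, betas):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([gamma * (x - mean) * inv_std + beta for x in channel])
    return out


# Toy inputs (assumed values): two channels and a 2-dimensional text vector.
features = [[1.0, 3.0], [0.0, 4.0]]
text_vec = [1.0, 2.0]
fused = conditional_norm(
    features, text_vec,
    w_gamma=[[1.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0],
)
```

After this transform each output channel has a mean equal to its text-predicted `beta` and a spread set by `gamma`, which is how the text can steer the image features in a scheme of this kind.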
Authors
Lan Hong; Guo Fucheng (College of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2023, Issue 7, pp. 2223-2228 (6 pages)
Funding
Supported by the 2021 Jiangxi Provincial Postgraduate Innovation Special Fund Project (YC2021-S582).
Keywords
text guidance
image inpainting
text image fusion
batch normalization
semantic consistency