
Unified framework with iterative prediction for image inpainting and outpainting
Abstract

Objective Image inpainting and outpainting are significant challenges in the field of computer vision. They involve filling the unknown regions of an image on the basis of information available in its known regions. With its advancements, deep learning has become the mainstream approach for dealing with these tasks. However, existing solutions frequently regard inpainting and outpainting as separate problems, and thus they lack the ability to adapt seamlessly between the two. Furthermore, convolutional neural networks (CNNs) are commonly used in these methods, but their limited ability to capture long-range content, due to the locality of convolution, poses challenges. To address these issues, this study proposes a unified framework that combines CNN and Transformer models on the basis of a divide-and-conquer strategy, aiming to deal with image inpainting and outpainting effectively.

Method Our proposed approach consists of three stages: representation, prediction, and synthesis. In the representation stage, CNNs are employed to map the input images to a set of meaningful features. This step leverages the local information processing capability of CNNs and enables the extraction of relevant features from the known regions of an image. We use a CNN encoder that incorporates partial convolutions and pixel normalization to reduce the introduction of irrelevant information from unknown regions. The extracted features are then passed to the prediction stage. In the prediction stage, we utilize the Transformer architecture, which excels at modeling global context, to generate predictions for the unknown regions of an image. The Transformer has proven highly effective at capturing long-range dependencies and contextual information in various domains, such as natural language processing. By incorporating a Transformer, we aim to enhance the model's ability to predict accurate and coherent content for inpainting and outpainting tasks. To address the difficulty of predicting features for a large unknown region simultaneously, we introduce a mask growth strategy: the mask of the predicted region grows progressively across iterations rather than covering the entire unknown region at once. This iterative feature prediction helps the model refine its predictions and capture more related contextual information, leading to improved results. Finally, we reconstruct the complete image in the synthesis stage by combining the predicted features with the known features from the representation stage. The synthesis stage leverages a CNN decoder consisting of multiple convolutional residual blocks with upsampling applied at intervals, which reduces the difficulty of model optimization, and adversarial learning is introduced to improve the realism of the generated images.
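The paper itself does not include code; the following PyTorch sketch illustrates one way the three-stage pipeline and the mask growth loop could fit together. Every name here is an illustrative assumption, not the authors' implementation: plain strided convolutions stand in for the partial-convolution encoder, a small upsampling decoder stands in for the residual-block decoder, and the ring-dilation logic is one plausible reading of the mask growth strategy.

```python
# Hypothetical sketch of the representation-prediction-synthesis pipeline
# with mask-growth iterative prediction. Names, layers, and hyperparameters
# are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedInpaintOutpaint(nn.Module):
    def __init__(self, feat_dim=128, num_heads=4, num_layers=4):
        super().__init__()
        # Representation: CNN encoder, (image + mask) -> 1/4-resolution features.
        # (The paper uses partial convolutions and pixel normalization here.)
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Prediction: Transformer models global context over feature tokens.
        # (Positional embeddings are omitted for brevity.)
        layer = nn.TransformerEncoderLayer(feat_dim, num_heads, batch_first=True)
        self.predictor = nn.TransformerEncoder(layer, num_layers)
        # Synthesis: CNN decoder upsamples features back to an image.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, image, mask, num_iters=4):
        # image: (B, 3, H, W); mask: (B, 1, H, W), 1 = known, 0 = unknown.
        # H and W are assumed divisible by 4.
        feats = self.encoder(torch.cat([image * mask, mask], dim=1))
        B, C, h, w = feats.shape
        known = F.max_pool2d(mask, 4).bool()       # known map at feature scale
        tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, C)
        # Mask growth: adopt predictions only on the ring of unknown tokens
        # bordering the known region, mark that ring as known, and repeat.
        for _ in range(num_iters):
            pred = self.predictor(tokens)
            grown = F.max_pool2d(known.float(), 3, stride=1, padding=1).bool()
            ring = (grown & ~known).flatten(1)     # (B, h*w)
            tokens = torch.where(ring.unsqueeze(-1), pred, tokens)
            known = grown
        return self.decoder(tokens.transpose(1, 2).reshape(B, C, h, w))
```

With enough iterations, the ring sweep covers the whole unknown region. The design point the sketch tries to capture is that the Transformer only ever commits to predictions on the band adjacent to already-known content, mirroring the paper's motivation of reducing the difficulty of predicting a large unknown region in one shot.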
Result To evaluate the effectiveness of our proposed method, we conduct comprehensive experiments on diverse datasets that encompass objects and scenes for image inpainting and outpainting tasks. We compare our approach with state-of-the-art methods using various evaluation metrics, including the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and perceptual quality metrics. The experimental results show that our unified framework surpasses the compared methods across all evaluation metrics. The combination of CNNs and a Transformer allows our model to capture local details and long-range dependencies, resulting in more accurate and visually appealing inpainting and outpainting results. In addition, ablation studies confirm the effectiveness of each component of our method, including the framework structure and the mask growth strategy: all three stages contribute to the performance improvement, highlighting the merit of the divide-and-conquer design, and the iterative prediction enabled by mask growth measurably improves generation quality. Furthermore, we empirically investigate how the numbers of Transformer heads and layers affect overall performance, revealing that appropriate numbers of iterations, Transformer heads, and Transformer layers can further enhance the framework's performance.

Conclusion This study introduces a unified iterative prediction framework for addressing image inpainting and outpainting challenges. Our proposed method outperforms the compared approaches, with each aspect of the design contributing to the overall improvement. The combination of CNNs and a Transformer enables our model to capture local and global contexts, leading to more accurate and visually coherent inpainting and outpainting results. These findings underscore the practical value and potential of a unified iterative prediction framework in the field of image inpainting and outpainting. Future research directions include applying the framework to other related tasks and further optimizing the model architecture for efficiency and scalability. Moreover, integrating self-supervised learning techniques with large-scale datasets is a promising extension that could improve the robustness and generalization capability of the model.
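For reference, the two full-reference metrics named in the evaluation above are available off the shelf. The snippet below is a generic sketch using scikit-image (>= 0.19); the paper's exact evaluation protocol (data range, color space, and the specific perceptual metric) is not specified here and remains an assumption.

```python
# Generic SSIM/PSNR computation with scikit-image; the paper's exact
# evaluation protocol is not reproduced here.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred: np.ndarray, target: np.ndarray) -> dict:
    """pred/target: float RGB images in [0, 1] with shape (H, W, 3)."""
    return {
        "PSNR": peak_signal_noise_ratio(target, pred, data_range=1.0),
        "SSIM": structural_similarity(target, pred, data_range=1.0,
                                      channel_axis=-1),
    }
```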
Authors: Guo Dongsheng, Gu Zhaorui, Zheng Bing, Dong Junyu, Zheng Haiyong (Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; Inspur Academy of Science and Technology, Jinan 250101, China)
Source: Journal of Image and Graphics (CSCD; PKU Core), 2024, No. 2, pp. 491-505 (15 pages)
Funding: National Natural Science Foundation of China (62171421); Taishan Scholars Young Expert Program of Shandong Province (tsqn202306096)
Keywords: image inpainting; image outpainting; divide-and-conquer; iterative prediction; Transformer; convolutional neural network (CNN)