Structure prior guided text image inpainting model (结构先验指导的文本图像修复模型)

Abstract (translated from the Chinese)

Objective: Image inpainting automatically restores missing content from the known content of an image. Deep-learning-based inpainting models have achieved reasonable results on natural and face images, but text image inpainting has received little study, and methods that enforce structural coherence and texture consistency do not attend to restoring the text itself. To address this problem, a structure prior guided text image inpainting model is proposed. Method: First, a Transformer-based structure prior reconstruction network is built, which captures global dependencies to reconstruct the text skeleton and edge structure prior images. Then a new static-to-dynamic residual block (StDRB) is proposed to convert static features into dynamic text image sequence features, and it is integrated into an inpainting network with an encoder-decoder structure. Guided by the structure priors and supervised by a joint loss that includes a gradient prior loss, the network restores coherent text strokes and realistic, natural content, which benefits downstream text detection and recognition. Result: Experiments on synthetic datasets in two languages, Tibetan and English, compare the model with four image inpainting models. The model achieves good subjective visual quality. Its peak signal-to-noise ratio and structural similarity reach 42.31 dB and 98.10% on the Tibetan dataset and 39.23 dB and 98.55% on the English dataset. The accuracy of Tesseract OCR (optical character recognition) on the restored Tibetan images reaches 62.83%, and the accuracies of Tesseract OCR, CRNN (convolutional recurrent neural network), and ASTER (attentional scene text recognizer) on the restored English images reach 85.13%, 86.04%, and 76.71%, respectively, all better than the compared models. Conclusion: The proposed model borrows ideas from image inpainting methods and exploits the characteristics of the text itself in text images, obtaining more accurate text image inpainting results.

Abstract (English)

Objective: Image inpainting reconstructs the missing regions of corrupted images so that they are visually complete and semantically plausible. It is widely used in applications such as object removal, old photo restoration, and image editing. Deep-learning-based inpainting methods have achieved good performance on natural and human face images. Nevertheless, the methods used to ensure consistency in image texture and structure have limitations in text image inpainting because they do not focus on the text itself. Meanwhile, studies on text images have mainly concentrated on text image super-resolution, text detection, and text recognition. However, many ancient documents contain broken text regions, which present an obstacle for downstream detection or recognition tasks and for the digital protection of ancient literature. Therefore, reconstructing broken text in images is worthy of further study. This paper proposes a novel text image inpainting model guided by text structure priors to solve this problem.

Method: First, the model introduces a structure prior reconstruction network. Given that the text skeleton contains important text stroke information and that the text edge contains texture and structure information, the network uses both of these priors to guide the inpainting. Because the receptive fields of convolutional neural networks (CNNs) are limited, the network applies a Transformer to capture the long-range dependencies of the text image and reconstructs robust and readable text skeleton and edge images from the features extracted from the masked RGB image, the masked text skeleton, and the masked edge image. To reduce the computational cost of self-attention in the Transformer, the network first downsamples the input image and then sends the compressed features to sequential Transformer layers. The network then upsamples these features to recover the prior images. To construct an accurate text skeleton, the network is trained with a combination of the binary cross-entropy loss and the Dice loss.

Second, to exploit the sequence information of the text itself, this paper designs a static-to-dynamic residual block (StDRB). The text image inpainting network adopts an encoder-decoder as the main architecture and integrates sequential StDRBs to enhance the inpainting performance. The text skeleton image and edge image contain significant text stroke and structure information about the whole image, and the StDRB module uses this prior information to help the inpainting. The input image is first sent to the CNN encoder to obtain static fused features. The StDRB then converts the static fused features into dynamic text sequence features. Assuming that the text follows a pseudo-dynamic process from left to right and from top to bottom, the StDRB applies bidirectional gated recurrent units along the horizontal and vertical directions in parallel to extract useful text semantic information. The residual block also deepens the network and facilitates convergence. Finally, the CNN decoder recovers the missing regions from the features to obtain the inpainting results.

To make the restored text images visually realistic and semantically explicit, the network combines several loss functions with preset weights, including adversarial, pixel reconstruction, perceptual, and style losses. Given that the aim of text image inpainting is to reconstruct the text strokes, the network also introduces a gradient prior loss as one of the joint losses. The gradient prior loss compares the gradient fields of the inpainted and ground truth images, which constrains the network to generate text strokes that contrast sharply with the background. The training set consists of Tibetan and English text images that are randomly synthesized from a corpus and noisy background images. All input images are resized to 256×256 pixels for training. The model is implemented in PyTorch and accelerated using an NVIDIA GeForce GTX 1080Ti GPU. The structure prior reconstruction network and the text image inpainting network are trained in two stages to obtain the inpainting results.

Result: Because few studies address text image inpainting, we compare our model with four natural image and face image inpainting models, qualitatively and quantitatively. The code for all compared models comes from the official releases. From the perspective of human vision, the proposed model obtains better holistic inpainting results than the other methods and achieves more detailed and accurate text reconstruction in large missing regions. As quantitative evaluation metrics, this paper uses not only the image quality measures that are widely used in previous inpainting work but also optical character recognition (OCR) results, which directly reflect how well broken text has been restored. Our model has an average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of 42.31 dB and 98.10% on the Tibetan dataset, and 39.23 dB and 98.55% on the English dataset. The character accuracy of Tesseract OCR on the Tibetan dataset is 62.83%, and the character accuracies of Tesseract OCR, convolutional recurrent neural network (CRNN), and attentional scene text recognizer (ASTER) on the English dataset are 85.13%, 86.04%, and 76.71%, respectively. Our model outperforms the compared models on both datasets.

Conclusion: This paper proposes a structure prior guided text image inpainting model that reconstructs structure priors and uses them to guide text image inpainting. To obtain accurate priors, the model uses a Transformer. In the inpainting process, the StDRBs integrated into the network extract useful text sequence information and boost the text inpainting performance. The model is also trained with a joint loss function to improve its results. The results on the Tibetan and English datasets demonstrate the effectiveness of the proposed model.
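
The abstract names two of the training losses without giving formulas: a combined binary cross-entropy and Dice loss for the text skeleton, and a gradient prior loss that compares gradient fields of the inpainted and ground truth images. The following is a minimal sketch of both in plain Python, not the authors' implementation. It assumes flattened probability lists for the skeleton, forward finite differences for the gradient field, and a mean absolute difference between the fields; the function names and the `dice_weight` parameter are hypothetical.

```python
import math


def bce_dice_loss(pred, target, dice_weight=1.0, eps=1e-7):
    """Binary cross-entropy plus Dice loss over flattened skeleton pixels.

    pred: probabilities in [0, 1]; target: 0/1 labels of the same length.
    dice_weight is a hypothetical balancing factor (an assumption).
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")
    bce = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(pred)
    overlap = sum(p * t for p, t in zip(pred, target))
    dice = 1.0 - (2.0 * overlap + eps) / (sum(pred) + sum(target) + eps)
    return bce + dice_weight * dice


def gradient_prior_loss(pred, target):
    """Mean absolute difference between forward-difference gradient fields.

    pred, target: 2D lists (rows of grayscale values) with the same shape.
    """
    height, width = len(pred), len(pred[0])
    total, count = 0.0, 0
    for i in range(height):
        for j in range(width):
            if j + 1 < width:  # horizontal gradient component
                total += abs((pred[i][j + 1] - pred[i][j])
                             - (target[i][j + 1] - target[i][j]))
                count += 1
            if i + 1 < height:  # vertical gradient component
                total += abs((pred[i + 1][j] - pred[i][j])
                             - (target[i + 1][j] - target[i][j]))
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    skeleton = [1, 0, 1, 0]
    print(bce_dice_loss([0.9, 0.1, 0.8, 0.2], skeleton))
    sharp = [[0.0, 1.0], [0.0, 1.0]]
    blurred = [[0.4, 0.6], [0.4, 0.6]]
    print(gradient_prior_loss(blurred, sharp))
```

Because the gradient term depends only on differences between neighbouring pixels, a uniform brightness shift leaves it unchanged while a blurred stroke edge is penalised, which matches the stated goal of sharp stroke contrast. In a PyTorch training loop the same quantities would normally be computed on tensors so that they remain differentiable.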

Authors: Liu Yuxuan (刘雨轩); Zhao Qijun (赵启军); Pan Fan (潘帆); Gao Dingguo (高定国); Pubu Danzeng (普布旦增) (College of Computer Science, Sichuan University, Chengdu 610065, China; School of Information Science and Technology, Tibet University, Lhasa 850011, China; Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Lhasa 850011, China; College of Electronic Information, Sichuan University, Chengdu 610065, China)

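Affiliations: College of Computer Science, Sichuan University; School of Information Science and Technology, Tibet University; Tibetan Information Technology Innovative Talent Cultivation Demonstration Base; College of Electronic Information, Sichuan University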

Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD, Peking University Core Journal, 2023, No. 12, pp. 3699-3712 (14 pages)

Funding: National Natural Science Foundation of China (62066042, 61971005-01, 62166038).

Keywords: image inpainting; text image inpainting; structure prior; static-to-dynamic residual block (StDRB); joint loss

Classification: TP391.4 [Automation and computer technology: computer application technology]
