Abstract
Text detection plays an important role in image understanding. Deep-learning-based text detection is the current mainstream approach and falls into single-stage and two-stage methods, with the latter usually achieving higher accuracy than the former. Two-stage detection methods typically contain a region-of-interest (RoI) feature pooling operation, which provides fixed-dimensional local region features for subsequent detection and recognition tasks. However, for complex text regions such as curved text, the existing pooling methods based on rectangular RoIs are no longer applicable, while methods that replace region features with point features lose spatial information. To address this problem, we propose a complex text region detection method based on polygon feature pooling and a Transformer. First, we extend the RoI pooling shape from a rectangle to a polygon without fitting the region with any other shape: the features of a polygonal region are pooled directly into a feature sequence of fixed dimensions, which avoids the errors introduced by a fitting process. Furthermore, the pooled features are regarded as a sequence with spatial relations and are fed into a Transformer to fuse the context among visual features, which reduces the training difficulty and improves the detection accuracy. Experiments on the ICDAR2015, MLT, Total Text and CTW1500 datasets, which contain curved and other complex text, show that the proposed two-stage detection algorithm extracts RoI features better and achieves better detection results than existing methods.
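The abstract describes two steps: pooling a polygonal RoI into a fixed-length feature sequence, then fusing that sequence with a Transformer encoder. The paper's actual implementation is not given here, so the following is only a minimal sketch of the general idea, assuming PyTorch and a simple strategy of bilinearly sampling a fixed number of points along the polygon boundary; all function names, shapes and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (NOT the authors' implementation): pool a polygon RoI
# into a fixed-length feature sequence, then fuse context with a Transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F

def polygon_pool(feat, polygon, num_points=16):
    """Sample a fixed number of features along a polygon boundary.

    feat:    (1, C, H, W) feature map.
    polygon: (K, 2) vertex coordinates scaled to [-1, 1] (grid_sample coords).
    Returns a (num_points, C) feature sequence of fixed length.
    """
    # Resample the closed polygon boundary at num_points equally spaced
    # arc-length positions, so any polygon yields the same sequence length.
    closed = torch.cat([polygon, polygon[:1]], dim=0)       # close the loop
    seg = closed[1:] - closed[:-1]                          # edge vectors
    seg_len = seg.norm(dim=1)
    cum = torch.cat([torch.zeros(1), seg_len.cumsum(0)])    # cumulative length
    targets = torch.linspace(0, cum[-1].item(), num_points)
    pts = []
    for t in targets:
        i = torch.searchsorted(cum[1:], t).clamp(max=len(seg) - 1)
        r = (t - cum[i]) / seg_len[i].clamp(min=1e-6)       # position on edge i
        pts.append(closed[i] + r * seg[i])
    grid = torch.stack(pts).view(1, 1, num_points, 2)       # (1, 1, N, 2)
    # Bilinear sampling turns each boundary point into a C-dim feature.
    sampled = F.grid_sample(feat, grid, align_corners=True)  # (1, C, 1, N)
    return sampled[0, :, 0].t()                              # (N, C)

# Treat the pooled features as a sequence and fuse context with a Transformer.
C, N = 32, 16
feat = torch.randn(1, C, 64, 64)
poly = torch.tensor([[-0.5, -0.5], [0.5, -0.6], [0.6, 0.4], [-0.4, 0.5]])
seq = polygon_pool(feat, poly, N)                            # (N, C)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=C, nhead=4, batch_first=True),
    num_layers=2)
fused = encoder(seq.unsqueeze(0))                            # (1, N, C)
```

Note that the sequence length is fixed by `num_points`, not by the polygon's vertex count, which mirrors the abstract's claim that polygonal RoIs of arbitrary shape are pooled to features of fixed dimensions without fitting the region with a rectangle or other proxy shape.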
Authors
ZHANG Xiangnan
GAO Xinbo
TIAN Chunna
ZHANG Xiangnan; GAO Xinbo; TIAN Chunna (School of Electronic Engineering, Xidian University, Xi'an 710071, China; Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source
Journal of Xidian University (《西安电子科技大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2024, No. 3, pp. 113-123 (11 pages)
Funding
National Natural Science Foundation of China (62173265, 62036007).