Scene Text Detection Based on Multi-Scale Attention Feature Fusion (基于多尺度注意力特征融合的场景文本检测)

Text Detection Algorithm Based on Multi-Scale Attention Feature Fusion

Abstract: To address the low detection accuracy for small-scale text and long text in current text detection, this paper proposes a scene text detection algorithm based on multi-scale attention feature fusion. The method takes Mask R-CNN as the baseline model and uses Swin Transformer as the backbone network to extract low-level features. In the feature pyramid network (FPN), multi-scale attention heat maps are fused with the low-level features through lateral connections, so that each level of the detector focuses on targets of a specific scale. The relationship between the attention heat maps of adjacent levels is used to share features vertically within the FPN, which avoids inconsistent gradient computation across levels. Experimental results show that on the ICDAR2015 dataset the method reaches a precision, recall, and F-value of 88.3%, 83.07%, and 85.61%, respectively, and that it compares favorably with existing methods on the curved-text datasets CTW1500 and Total-Text.
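The three ICDAR2015 figures can be cross-checked against each other. The short sketch below assumes the F-value is the usual harmonic mean of precision and recall, which is the common convention in text detection evaluation; the abstract itself does not spell out the formula. The function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent).

    Assumption: the paper's F-value follows this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for ICDAR2015.
reported_precision = 88.3
reported_recall = 83.07

f_value = f_measure(reported_precision, reported_recall)
print(round(f_value, 2))  # 85.61, matching the reported F-value
```

Under that assumption the reported 85.61% follows from the other two numbers, which also indicates that the first figure is a precision rather than an overall accuracy.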

Authors: SHE Xiangyang (厍向阳), LIU Zhe (刘哲), DONG Lihong (董立红) (College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China)

Affiliation: College of Computer Science and Technology, Xi'an University of Science and Technology

Source: Computer Engineering and Applications (计算机工程与应用), indexed in CSCD and the Peking University Core Journals list, 2024, No. 1, pp. 198-206 (9 pages)

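Funding: Natural Science Basic Research Program of Shaanxi Province (2019JLM-11); Shaanxi Provincial Science and Technology Program (2021JQ-576); Shaanxi Provincial Department of Education Project (19JK0526).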

Keywords: scene text detection; Mask R-CNN; Swin Transformer; attention mechanism; multi-scale feature fusion

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
