Abstract
Existing scene text detection methods are affected by complex backgrounds when handling arbitrarily shaped text, which leads to inaccurate localization of text regions and to missed and false detections of adjacent text. To address this, a natural scene text detection method based on double-branch cross-level feature fusion is proposed. First, ResNet50 serves as the backbone network to extract initial features, and a cross-level feature distribution enhancement module (CFDEM) is designed to strengthen the interaction of text information across feature levels and improve the expressive power of the features. Second, to adaptively filter out non-text or redundant features and thereby reduce the false and missed detection rates, an adaptive fusion strategy (AFS) is proposed, which uses a double-branch structure to strengthen the relationship between features of different dimensions and optimize the fusion process. Finally, in the prediction stage, differentiable binarization is used to generate the text detection results. Ablation experiments on the ICDAR2015, ICDAR2017, Total-Text, and CTW1500 datasets show that the method locates text regions accurately and overcomes the impact of missed and false detections.
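The differentiable binarization step named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation B = 1 / (1 + exp(-k(P - T))), where P is a probability map, T is a threshold map, and k is an amplifying factor; the abstract does not give these details, so the function name, the value k = 50, and the toy maps below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are arrays of the same shape; k (assumed 50 here)
    controls how steeply the sigmoid approximates a hard threshold.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


# Toy 2x2 maps with made-up values, for illustration only.
P = np.array([[0.9, 0.2], [0.5, 0.7]])
T = np.array([[0.3, 0.3], [0.5, 0.6]])
B = differentiable_binarization(P, T)
```

Because the steep sigmoid replaces a hard threshold, pixels well above the threshold map approach 1, pixels well below it approach 0, and the operation stays differentiable so it can be trained end to end.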
Authors
刘光辉
张钰敏
孟月波
占华
LIU Guanghui; ZHANG Yumin; MENG Yuebo; ZHAN Hua (School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China)
Source
《智能系统学报》
CSCD
Peking University Core Journal
2023, No. 5, pp. 1079-1089 (11 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (52278125)
Key Research and Development Program of Shaanxi Province (2021SF-429)
Keywords
text detection
arbitrarily shaped
cross-level feature distribution enhancement
adaptive fusion
double branch
spatial dimension
channel dimension
differentiable binarization