Abstract
Deep-learning-based scene text detection generally lacks feature-level refinement, so well-designed models are not fully exploited. To address this, feature fusion and a feature pyramid attention module are applied to scene text detection. The four feature map layers produced by the basic feature extraction network (the PixelLink algorithm) are resampled and fused by weighted summation, and the result is fed to the feature pyramid attention module. Feature fusion combines feature information from every level, increasing the information content of the feature map layers. The attention network enlarges the receptive field without increasing computational cost, and the spatial pyramid structure fuses multi-scale feature information by using different grid scales or dilation rates. The feature pyramid attention module comprises three branches: a refined pyramid network branch, a nonlinear transformation branch, and a global average pooling branch. Experimental results show that, compared with the PixelLink algorithm, the proposed algorithm improves the overall F-measure (F) by 2.91% and 4.04% on the ICDAR2015 and ICDAR2013 datasets, respectively.
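The fusion step described in the abstract (bring the four feature maps to a common resolution, then sum them with weights) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state the resampling method, the weights, or the map sizes, so the sketch uses nearest-neighbour upsampling, square single-channel maps, and made-up weights, and the names `upsample_nearest` and `fuse` are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map (list of rows) by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse(feature_maps, weights):
    """Resample every square map to the largest resolution, then take a weighted sum."""
    target = max(len(f) for f in feature_maps)
    fused = [[0.0] * target for _ in range(target)]
    for fmap, w in zip(feature_maps, weights):
        # Assumes the target size is an integer multiple of each map's size.
        up = upsample_nearest(fmap, target // len(fmap))
        for i in range(target):
            for j in range(target):
                fused[i][j] += w * up[i][j]
    return fused


# Four hypothetical single-channel maps at relative scales 1, 1/2, 1/4 and 1/8.
maps = [
    [[1.0] * 8 for _ in range(8)],
    [[2.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
    [[4.0]],
]
weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values, not taken from the paper
fused = fuse(maps, weights)
```

In a real detector the maps would be multi-channel tensors and the fused result would be passed on to the attention module; this sketch only shows the resample-then-weight arithmetic.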
Authors
冯宇静
贾世杰
FENG Yujing; JIA Shijie (College of Electrical Information Engineering, Dalian Jiaotong University, Dalian 116028, P.R. China)
Source
《重庆邮电大学学报(自然科学版)》
CSCD
PKU Core Journal (北大核心)
2022, No. 1, pp. 110-116 (7 pages)
Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
Funding
Scientific Research Project of the Education Department of Liaoning Province (JDL2019006).