Text detection in street scene based on deep learning

Abstract: Text in natural street scenes is often tilted, curved, and of variable length. To handle these characteristics, this paper proposes an attention-based method for detecting text in natural street scenes. The method uses attention mechanisms to weight and fuse the features extracted by the backbone network, which improves the detection performance of the whole network.

First, the lateral connections of the feature pyramid network (FPN) lose feature information. To address this, an Attention Feature Fusion Module (AFFM) is introduced. AFFM computes fusion weights for the high-level and low-level features and replaces the simple element-wise addition used in the original FPN. This reduces the loss of text information during FPN feature fusion and strengthens the feature extraction capability of the network.

Second, a Subspace Attention Module (SAM) is introduced to capture text features in feature maps of different scales. SAM splits the multi-scale fused feature map along the channel dimension into several subspace feature maps. It then learns text-feature weights within each subspace separately, so the fused feature map contains more text features at different scales. This strengthens the ability of the fused feature map to represent text instances and thus improves detection.

The model is evaluated on the public Total-Text dataset. Compared with the fast and efficient DBNet, the proposed method improves precision, recall, and F-measure by 0.5%, 0.4%, and 0.4%, respectively.
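The abstract does not specify how AFFM computes its fusion weights. The NumPy sketch below illustrates the general idea of replacing FPN's `low + high` sum with a learned convex combination `w * low + (1 - w) * high`. It assumes a generic channel-attention gate: global average pooling, then a two-layer MLP, then a sigmoid. The function names, the MLP weights `w1`/`w2`, and the gating form are hypothetical and are not taken from the paper, whose AFFM may be designed differently.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map, as in FPN's top-down path."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def attention_feature_fusion(low, high, w1, w2):
    """Hypothetical AFFM-style fusion of a lateral (low-level) map and an
    upsampled top-down (high-level) map, both of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are weights of an assumed two-layer
    channel-attention MLP with reduction ratio r.
    """
    if low.shape != high.shape:
        raise ValueError("low and high must have the same shape")
    # Channel descriptor from global average pooling of the plain FPN sum.
    descriptor = (low + high).mean(axis=(1, 2))      # shape (C,)
    hidden = np.maximum(w1 @ descriptor, 0.0)        # ReLU, shape (C // r,)
    weight = sigmoid(w2 @ hidden)[:, None, None]     # per-channel weight in (0, 1)
    # Learned convex combination instead of FPN's element-wise addition.
    return weight * low + (1.0 - weight) * high
```

Each channel's gate lies strictly between 0 and 1, so every fused value falls between the corresponding low-level and high-level values. The network can therefore lean toward whichever level carries more text information, rather than always weighting both levels equally.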

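The channel-splitting idea behind SAM can also be sketched in NumPy. This sketch assumes a subspace-attention design modeled on ULSAM, which the abstract does not confirm. The fused map is split into `groups` channel subspaces. Each subspace goes through:

- a depthwise 1x1 conv (a per-channel scale `dw`);
- a 3x3 max pool;
- a pointwise conv to a single map (weights `pw`);
- a spatial softmax.

The resulting attention map reweights that subspace, and a residual connection is added. All names and parameter shapes here are hypothetical.

```python
import numpy as np


def max_pool3x3(x):
    """3x3 max pooling with stride 1 and padding 1 on a (C, H, W) array."""
    x = np.asarray(x, dtype=float)
    c, h, w = x.shape
    padded = np.full((c, h + 2, w + 2), -np.inf)
    padded[:, 1:-1, 1:-1] = x
    out = np.full_like(x, -np.inf)
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out


def spatial_softmax(m):
    """Softmax over all spatial positions of an (H, W) map."""
    flat = m.reshape(-1)
    e = np.exp(flat - flat.max())
    return (e / e.sum()).reshape(m.shape)


def subspace_attention(x, groups, dw, pw):
    """Hypothetical SAM-style attention on a fused (C, H, W) feature map.

    dw: (C,) depthwise 1x1 conv scales; pw: (groups, C // groups) pointwise
    weights that collapse each subspace to one attention-logit map.
    """
    x = np.asarray(x, dtype=float)
    c = x.shape[0]
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    k = c // groups
    outputs = []
    for g in range(groups):
        sub = x[g * k:(g + 1) * k]                       # one channel subspace
        t = max_pool3x3(sub * dw[g * k:(g + 1) * k, None, None])
        logits = np.tensordot(pw[g], t, axes=(0, 0))     # (H, W)
        attn = spatial_softmax(logits)                   # sums to 1 per subspace
        outputs.append(sub * attn[None] + sub)           # reweight + residual
    return np.concatenate(outputs, axis=0)
```

Each subspace learns its own spatial attention map. Different channel groups can therefore emphasize text at different positions and scales, which matches the motivation the abstract gives for SAM.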
Authors: ZHU Zhiying (朱志颖), CHENG Yanyun (程艳云)

Affiliation: School of Automation and School of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210046, Jiangsu, China

Source: Microelectronics & Computer (《微电子学与计算机》), 2023, No. 2, pp. 79-86 (8 pages)

Funding: Young Scientists Fund of the National Natural Science Foundation of China (61802204).

Keywords: scene text detection; attention mechanism; feature enhancement; attention feature fusion; subspace attention

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]