Abstract
With the development of deep learning, the performance of natural scene text detection has improved significantly. Nonetheless, two main challenges remain: the trade-off between speed and accuracy, and the detection of arbitrary-shaped text instances. In this study, we propose a segmentation-based method to detect arbitrary-shaped scene text efficiently and accurately. Specifically, we use a segmentation head with low computational cost together with simple, efficient post-processing. The segmentation head consists of a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM). The FPEM introduces multi-level information to guide better segmentation, and the FFM integrates the features produced by FPEMs of different depths into a final feature for segmentation. We also use a Differentiable Binarization (DB) module, which performs the binarization process inside the segmentation network. Optimized jointly with the DB module, the segmentation network adaptively sets the binarization thresholds used to convert the probability map into text regions, which both simplifies post-processing and improves text detection performance. On the standard datasets ICDAR2015 and Total-Text, the proposed method, using a lightweight backbone such as ResNet18, achieves comparable results in terms of both speed and accuracy.
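The abstract does not state how the DB module is computed. As a minimal sketch, assuming the commonly published form of differentiable binarization, B = 1 / (1 + exp(-k (P - T))), with P the probability map, T the learned threshold map, and an amplification factor k = 50, the step could look as follows. The function name, the nested-list representation, and the value of k are illustrative assumptions, not details taken from this paper.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are 2-D lists of floats in [0, 1] with the
    same shape. k = 50 is an assumed amplification factor (hypothetical
    default, not given in the abstract).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 example: a per-pixel threshold map replaces one fixed threshold.
prob_map = [[0.9, 0.2], [0.55, 0.5]]
thresh_map = [[0.5, 0.5], [0.5, 0.5]]
approx_binary = differentiable_binarization(prob_map, thresh_map)
```

Because this function is smooth in both the probability and the threshold, gradients can flow to the threshold map during training, which is what would allow the network to set thresholds adaptively instead of relying on a fixed cutoff in post-processing.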
Authors
CAI Xin-Xin (蔡鑫鑫)
WANG Min (王敏)
College of Computer and Information, Hohai University, Nanjing 211100, China
Source
Computer Systems & Applications (《计算机系统应用》)
2020, Issue 12, pp. 257-262 (6 pages)
Keywords
natural scene text detection
segmentation
feature pyramid enhancement module
feature fusion module
differentiable binarization module