Abstract
Scene text detection currently faces two main challenges: the trade-off between real-time performance and accuracy, and the detection of arbitrarily shaped text. Together, these determine whether scene text detection is practical in real-world applications. To address both problems, this paper adopts a segmentation-based approach and proposes a lightweight backbone network with strong feature-extraction capability that detects arbitrarily shaped natural scene text accurately and in real time. Specifically, a structurally simple dual-resolution residual backbone is combined with a low-cost deep aggregation pyramid pooling module. The features extracted by the two are fused and then segmented with a differentiable binarization module. Comparative experiments on the standard English dataset ICDAR2015 show that the proposed improvements are effective and that the method achieves comparable results in both real-time performance and accuracy.
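The abstract names a differentiable binarization module without giving its formula. As a reference point, the sketch below assumes the approximate step function from the original DBNet, B = 1 / (1 + exp(-k(P - T))), where P is the per-pixel text probability map, T is a learned per-pixel threshold map, and k = 50. This record does not state whether this paper uses the same form or the same k. The function names are hypothetical, and plain Python lists stand in for feature maps.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob   -- 2-D list: per-pixel text probability map P
    thresh -- 2-D list of the same shape: per-pixel threshold map T
    k      -- amplifying factor (50 as in DBNet; assumed, not from this paper)
    """
    return [[_sigmoid(k * (p - t)) for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(prob, thresh)]


def db_gradient_wrt_prob(b, k=50.0):
    """dB/dP = k * B * (1 - B); dB/dT is its negative.

    The gradient peaks where P is close to T, so both maps remain trainable,
    unlike a hard step function whose gradient is zero almost everywhere.
    """
    return [[k * v * (1.0 - v) for v in row] for row in b]


if __name__ == "__main__":
    P = [[0.6, 0.5, 0.4]]  # toy probability map
    T = [[0.5, 0.5, 0.5]]  # toy threshold map
    B = differentiable_binarization(P, T)
    print(B)
    print(db_gradient_wrt_prob(B))
```

With k = 50, pixels only slightly above their threshold map close to 1 and those slightly below map close to 0. The function still passes useful gradients to both P and T during training.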
Authors
XU Hong-kui
LI Zhen-ye
GUO Wen-tao
ZHAO Jing-zheng
GUO Xu-bin
School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China
Source
Computer and Modernization (《计算机与现代化》)
2023, No. 11, pp. 95-100 (6 pages)
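Funding
Major Scientific and Technological Innovation Project of Shandong Province (2019JZZY010120)
Key Research and Development Program of Shandong Province (2019GSF111054)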
Keywords
real-time text detection
dual-resolution backbone
semantic segmentation
deep aggregation pyramid pooling module