Abstract
Scene text recognition (STR) aims to convert the text contained in natural images into computer-processable character sequences; its main challenge is handling irregularly shaped scene text. The prevailing approach decouples the task into two subtasks, text rectification and sequence recognition: the rectification module warps the features of an irregular text line into a canonical horizontal form, which is then fed into the subsequent sequence recognition module. Because the necessary annotations are lacking, most text rectification methods rely on a Spatial Transformer Network (STN) trained with weak supervision, which requires delicate parameter initialization and end-to-end optimization to converge. Observing that scene text usually satisfies certain geometric prior constraints, we propose an optical flow network learned under these constraints, whose predicted optical flow field can be used for text rectification. Experiments on several real-world scene text recognition datasets show that a recognition system built on our method achieves higher accuracy than a conventional STN-based system, which we attribute to the effectiveness of the proposed optical-flow-based text rectification algorithm.
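As a rough illustration of how a dense optical flow field can rectify an image, the following sketch backward-warps a grayscale image with bilinear sampling. It covers only the warping step, not the flow network described in the abstract. The function name `warp_with_flow`, the (dx, dy) channel order, and the backward-flow convention (each output pixel stores the offset to its source location) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp a grayscale image with a dense flow field (illustrative only).

    image: (H, W) float array.
    flow:  (H, W, 2) array; flow[y, x] = (dx, dy) is the assumed offset from
           output pixel (x, y) to the location sampled in `image`.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, clamped to the image border.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0
    wy = src_y - y0

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the output equals the input; a rectification network would instead predict, for each pixel of the canonical horizontal text line, where to sample in the distorted input.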
Authors
张文强
张亚博
左旺孟
ZHANG Wenqiang; ZHANG Yabo; ZUO Wangmeng (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2020, No. 8, pp. 157-160, 163 (5 pages)