Abstract
Scene text recognition has advanced rapidly in recent years. However, irregular scene text images often contain visual obstacles such as occlusion, distortion, and poor illumination, so existing methods may misrecognize some characters in a word and produce erroneous results. To address this problem, we propose a scene text recognition algorithm based on an error correction (EC) module. Unlike the correction modules in existing approaches, the proposed EC module is a sequence-to-sequence prediction model; a multi-unit attention mechanism is added to its encoder-decoder structure so that it attends more closely to the important information in the feature map. The EC module learns semantic information directly from plain text and uses it to correct misspelled predictions. In addition, we propose a multi-feature (MF) extractor for scene text recognition, composed of five MF units that extract features from the outputs of the last five blocks of Resnet-45. Compared with traditional methods, the MF extractor captures richer image information at different depths. Comparative experiments on seven datasets show that the proposed algorithm has a clear performance advantage over current state-of-the-art methods.
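The idea of drawing features from several backbone depths can be illustrated with a minimal sketch. The abstract does not specify what an MF unit computes, so the code below assumes, purely for illustration, that each unit is a global average pooling step and that the five results are concatenated. The names `mf_unit`, `mf_extractor`, and `constant_map`, as well as the toy feature-map shapes, are hypothetical and not taken from the paper.

```python
from typing import List

# A feature map is stored as nested lists: [channels][height][width].
FeatureMap = List[List[List[float]]]


def mf_unit(feature_map: FeatureMap) -> List[float]:
    """Hypothetical MF unit: global average pooling, one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def mf_extractor(block_outputs: List[FeatureMap]) -> List[float]:
    """Apply one unit to each block output and concatenate the results."""
    fused: List[float] = []
    for fmap in block_outputs:
        fused.extend(mf_unit(fmap))
    return fused


def constant_map(channels: int, height: int, width: int, value: float) -> FeatureMap:
    """Build a toy feature map filled with a single value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Five toy block outputs (assumed shapes): resolution shrinks, channels grow.
blocks = [
    constant_map(1, 8, 32, 0.1),
    constant_map(2, 8, 32, 0.2),
    constant_map(2, 4, 16, 0.3),
    constant_map(4, 4, 16, 0.4),
    constant_map(4, 2, 8, 0.5),
]

fused = mf_extractor(blocks)
print(len(fused))  # prints 13 (1 + 2 + 2 + 4 + 4 channels)
```

The fused vector mixes shallow and deep responses in one descriptor, which is the general motivation for multi-depth extraction; a real extractor would operate on learned convolutional features rather than constant maps.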
Authors
于洁潇
张大壮
何凯
Yu Jiexiao; Zhang Dazhuang; He Kai (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China)
Source
《天津大学学报(自然科学与工程技术版)》
EI
CAS
CSCD
PKU Core Journal
2023, No. 4, pp. 400-407 (8 pages)
Journal of Tianjin University:Science and Technology
Funding
National Natural Science Foundation of China (No. 62171314).
Keywords
scene text recognition (STR)
semantic error correction (SEC)
multi-feature (MF) extraction
deep learning