Abstract
An effective end-to-end scene text recognition method combining convolutional and recurrent neural networks is proposed. A feature pyramid network (FPN) first extracts multi-scale features from the image. A deep bidirectional long short-term memory (Bi-LSTM) network that incorporates a residual network (ResNet) then encodes these features into text sequence features, which an attention mechanism decodes to produce the recognition result. Experiments on the ICDAR2013 and ICDAR2015 datasets verify the effectiveness of the method: it reduces training difficulty, speeds up network convergence, and improves text recognition accuracy.
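The attention-based decoding step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which attention variant the paper uses, so plain dot-product attention is assumed here; the function names `softmax` and `attend` and the toy feature values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoded):
    """Dot-product attention (an assumed variant, not the paper's).

    query:   decoder state, a list of floats
    encoded: encoder output, one feature vector per time step
    Returns the attention weights and the weighted context vector.
    """
    # Score each time step by its dot product with the query.
    scores = [sum(q * h for q, h in zip(query, step)) for step in encoded]
    weights = softmax(scores)
    dim = len(encoded[0])
    # Context vector: attention-weighted sum of the encoder features.
    context = [
        sum(w * step[d] for w, step in zip(weights, encoded))
        for d in range(dim)
    ]
    return weights, context


# Toy encoder output: 3 time steps, 2-dimensional features (made-up values).
encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, encoded)
```

In a full recognizer the context vector would be passed, together with the previous decoder state, to a classifier that predicts the next character; that part is omitted here.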
Authors
陈鹏
李鸣
张宇
王志鹏
CHEN Peng; LI Ming; ZHANG Yu; WANG Zhi-peng (School of Information Engineering, Nanchang University, Nanchang 330000, China)
Source
《测控技术》
2022, No. 7, pp. 17-22, 63 (7 pages)
Measurement & Control Technology
Funding
Key Research and Development Program of Jiangxi Province (20161BBE50084)
Natural Science Foundation of Jiangxi Province (20181BAB211019)
Keywords
feature pyramid networks
convolutional network
bidirectional long short-term memory
text recognition