Non-detection text recognition of certificate image based on Transformer
Abstract: Existing deep learning models for text recognition in images work in two stages, detection and recognition, and rely on complex network structures and large numbers of text-box annotations to improve recognition accuracy. To address these problems, this paper proposes a simple and robust detection-free text recognition method for certificate images. The method embeds horizontal and vertical position encodings for the different positions of the two-dimensional feature map, connects the feature representations of the different subspaces to a sequence decoder, and adds a global context module to the decoder. The network can be trained in parallel and converges quickly. Structured fields are obtained directly by inserting special symbols, which simplifies information post-processing; recognizing a single image takes about 122 ms. Test results show that the model performs excellently on text recognition of scanned ID card images.
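To make the abstract's description of the two-dimensional position encoding concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation: it splits the channel dimension between a vertical (row) and a horizontal (column) sinusoidal encoding, adds the result to a backbone feature map, and flattens the map into the sequence a Transformer decoder attends over. The function names, the even channel split, and the example shapes are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of 2D position encoding for a
# detection-free recognizer: each spatial location of the CNN feature map gets
# a unique combination of row and column sinusoidal encodings before the map
# is flattened into a sequence for a Transformer decoder.
import math
import torch


def sinusoid_1d(length: int, channels: int) -> torch.Tensor:
    """Standard 1D sinusoidal encoding of shape (length, channels)."""
    position = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, channels, 2, dtype=torch.float32)
        * (-math.log(10000.0) / channels)
    )
    enc = torch.zeros(length, channels)
    enc[:, 0::2] = torch.sin(position * div_term)
    enc[:, 1::2] = torch.cos(position * div_term)
    return enc


def add_2d_position_encoding(feature_map: torch.Tensor) -> torch.Tensor:
    """Add vertical and horizontal encodings to a (B, C, H, W) feature map.

    Half of the channels carry the row (vertical) encoding and the other half
    carry the column (horizontal) encoding, so every spatial position receives
    a distinct 2D position signature.
    """
    b, c, h, w = feature_map.shape
    assert c % 2 == 0, "channel count must be even to split between rows/cols"
    row_enc = sinusoid_1d(h, c // 2)  # (H, C/2)
    col_enc = sinusoid_1d(w, c // 2)  # (W, C/2)
    pos = torch.cat(
        [
            row_enc.unsqueeze(1).expand(h, w, c // 2),  # broadcast over width
            col_enc.unsqueeze(0).expand(h, w, c // 2),  # broadcast over height
        ],
        dim=-1,
    ).permute(2, 0, 1)  # (C, H, W)
    return feature_map + pos.unsqueeze(0).to(feature_map.device)


if __name__ == "__main__":
    # Example: a backbone feature map for a scanned certificate image, then
    # flatten the H*W positions into the sequence the decoder attends over.
    fmap = torch.randn(1, 256, 12, 40)
    fmap = add_2d_position_encoding(fmap)
    sequence = fmap.flatten(2).permute(0, 2, 1)  # (B, H*W, C)
    print(sequence.shape)                        # torch.Size([1, 480, 256])
```

In this sketch every location of the flattened sequence still carries its original row and column information, which is what allows the decoder to read text from the 2D layout without a separate text-detection stage.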
Authors: XIAO Hui-hui, ZHANG Dong-bo, WANG Wang, WANG Jia-kui (School of Automation and Electronic Information, Xiangtan University, Xiangtan 411100, Hunan Province, China; Wuhan Veilytech Co., Ltd., Wuhan 430000, China)
Source: Information Technology (《信息技术》), 2021, No. 6, pp. 78-85, 90 (9 pages)
Keywords: Transformer; end-to-end model; detection-free text recognition; global context; two-dimensional position encoding
