Abstract
Text recognition plays an important role in applications such as document management, image understanding, and visual navigation. However, text in natural scenes often has arbitrary orientation, irregular shape, and varied fonts, which makes it difficult to detect and recognize. A three-stage text recognition framework for natural scene images is proposed, comprising text detection, text rectification, and text recognition. First, a feature pyramid network segments the character instances in the image, and a bidirectional long short-term memory network predicts the affinity between characters so that isolated characters can be grouped into word lines; the text detection F-score reaches 91.97%. Next, a multi-object rectification network rectifies the detected text to handle the complex distortions of scene text and improve its readability. Finally, an attention-based sequence recognition network outputs its predictions in sequence to achieve word-level recognition, with a recognition accuracy of 84.98%.
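The two figures quoted in the abstract (detection F-score and word-level recognition accuracy) can be illustrated with a minimal sketch. The abstract does not state how a detection is matched to the ground truth or whether string comparison is case-sensitive, so the match counts and the exact-match rule below are assumptions, and the function names `f_score` and `word_accuracy` are hypothetical.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall over detected text regions.

    The counts are assumed to come from some matching rule between
    detections and ground truth; that rule is not given in the abstract.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    if detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def word_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of words whose predicted string equals the reference exactly.

    Exact, case-sensitive matching is an assumption made for illustration.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

For instance, 8 correct detections with 2 false alarms and 2 misses give a precision and recall of 0.8 each, and therefore an F-score of 0.8.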
Authors
邹北骥
杨文君
刘姝
姜灵子
ZOU Beiji; YANG Wenjun; LIU Shu; JIANG Lingzi (School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China)
Source
《浙江大学学报(理学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2021, Issue 1, pp. 1-8 (8 pages)
Journal of Zhejiang University (Science Edition)
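Funding
National Natural Science Foundation of China (61902435)
Major Project of the Ministry of Science and Technology (2018AAA0102102)
Hunan Provincial Science and Technology Program (2017WK2074)
Ministry of Education Program of Introducing Talents of Discipline to Universities (B18059)
Natural Science Foundation of Hunan Province (2019JJ50808)
2020 Undergraduate Innovation and Entrepreneurship Training Program (GCX2020325Y).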
Keywords
text recognition
natural scene
text detection
text rectification