Text Recognition Algorithm Based on Multimodal Iteration and Correction

Abstract: Scene text recognition tends to lose information when modeling long-range dependencies and represents low-resolution text images poorly. To address these problems, a text recognition algorithm based on multimodal iteration and correction is proposed. The vision model of the algorithm combines CoTNet (contextual transformer networks for visual recognition), a dynamic convolution attention module (DCAM), an external attention encoder (EA-Encoder), and a positional attention mechanism. CoTNet alleviates the information loss caused by long-range modeling. DCAM strengthens the representation and focuses on important features, then passes those features to the EA-Encoder, which tightens the connection between CoTNet and the EA-Encoder. The EA-Encoder learns the most discriminative features over the entire dataset and captures the most semantically informative parts, further strengthening the representation. The output of the vision model then passes through a text correction model and a fusion model to produce the final recognition result. Experiments show that the proposed algorithm performs well on several public scene text datasets; on the irregular-text dataset ICDAR2015 in particular, it reaches an accuracy of 85.9%.
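The abstract does not describe the internals of the EA-Encoder. As a rough illustration of the general external attention idea it is named after, the sketch below uses two small memory matrices that are shared across all inputs, which is one way to read "features learned over the entire dataset". The function names, the shapes, the random memories standing in for learned parameters, and the double normalization (softmax over tokens, then L1 normalization over memory slots) are all assumptions taken from the commonly described form of external attention, not from this paper.

```python
import numpy as np


def external_attention_weights(x, m_k, eps=1e-9):
    """Attention of each token over shared memory slots.

    x:   (n_tokens, d_model) input features
    m_k: (n_slots, d_model) key memory, shared by every input (assumed learned)
    Returns an (n_tokens, n_slots) weight matrix.
    """
    scores = x @ m_k.T                                   # (n_tokens, n_slots)
    # Softmax over the token axis, shifted for numerical stability.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # L1 normalization over the memory-slot axis ("double normalization").
    weights = weights / (weights.sum(axis=1, keepdims=True) + eps)
    return weights


def external_attention(x, m_k, m_v):
    """Mix the value memory m_v (n_slots, d_model) using the weights above."""
    return external_attention_weights(x, m_k) @ m_v      # (n_tokens, d_model)


# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
n_tokens, d_model, n_slots = 6, 8, 4
x = rng.standard_normal((n_tokens, d_model))
m_k = rng.standard_normal((n_slots, d_model))   # stands in for a learned memory
m_v = rng.standard_normal((n_slots, d_model))   # stands in for a learned memory

out = external_attention(x, m_k, m_v)
print(out.shape)  # (6, 8)
```

Because the memories do not depend on the input, the cost grows linearly with the number of tokens, unlike self-attention, whose cost grows quadratically. How the paper's encoder stacks or extends such a layer is not stated in the abstract.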

Authors: QIANG Guanchen; ZHANG Lizhen; YANG Qian; XIONG Wei; LI Lirong

Affiliations: School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China; Hubei Key Laboratory of Solar Energy Efficient Utilization and Energy Storage Operation Control, Hubei University of Technology, Wuhan, Hubei 430068, China; Hubei Engineering Research Center for Safety Monitoring of New Energy and Power Grid Equipment, Hubei University of Technology, Wuhan, Hubei 430068, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29201, USA

Source: Journal of Optoelectronics·Laser (《光电子·激光》), 2024, No. 5, pp. 525-535 (11 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

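Funding: Supported by the National Natural Science Foundation of China (62202148), the Natural Science Foundation of Hubei Province (2019CFB530), the Major Special Project of the Hubei Provincial Department of Science and Technology (2019ZYYD020), and the China Scholarship Council (201808420418).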

Keywords: scene text recognition; dynamic convolution attention module; external attention mechanism; encoder

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
