
Scene Text Recognition Based on Feature Fusion in Space Domain and Frequency Domain
Abstract: Existing scene text recognition methods often suffer from low robustness and poor generalization in few-shot, language-independent scenarios. To address this problem, in the feature extraction stage, a dual-stream network that fuses space-domain and frequency-domain features is proposed. It consists of a deep residual convolutional branch that extracts space-domain features and a branch that extracts frequency-domain features with a one-dimensional fast Fourier transform (FFT) followed by a shallow neural network; the two kinds of features are then fused by a channel attention mechanism. In the sequence modeling stage, a multi-scale one-dimensional convolution module is proposed to replace the bidirectional long short-term memory (BiLSTM) network, in line with the characteristics of language-independent scenes. A complete model is then built by combining the existing TPS rectification module and a CTC decoder. Training follows a transfer learning scheme: the model is first pre-trained on large English datasets and then fine-tuned on the target datasets. Experimental results on two few-shot language-independent datasets compiled in this paper show that the proposed model outperforms existing models in accuracy, verifying its high robustness and generalization ability in this scenario. In addition, experiments on five benchmark datasets of language-dependent scenes (without fine-tuning) show that the method using the proposed feature extraction module outperforms the compared baselines, demonstrating the effectiveness and generality of the proposed dual-stream feature fusion network.
Authors: HUO Huaqi, LU Lu (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China; Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2023, No. S02, pp. 36-43 (8 pages)
Funding: Key Areas Research Program of Guangdong Province (2022B0101070001).
Keywords: Deep learning; Scene text recognition; Dual-stream network; Frequency domain branch; Few-shot
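
The paper's implementation is not reproduced on this page, so the following PyTorch sketch only illustrates the two components named in the abstract: a dual-stream feature extractor that fuses a convolutional space-domain branch with a 1-D FFT frequency-domain branch via channel attention, and a multi-scale one-dimensional convolution block used in place of a BiLSTM. All class names, channel counts, kernel sizes, the fixed input width, and the squeeze-and-excitation form of the attention are assumptions for illustration, not the authors' code.

import torch
import torch.nn as nn


class DualStreamFusion(nn.Module):
    """Illustrative dual-stream feature extractor: a CNN branch for space-domain
    features, a 1-D FFT + shallow projection branch for frequency-domain features,
    fused with squeeze-and-excitation style channel attention (all sizes assumed)."""

    def __init__(self, in_ch=1, spat_ch=256, freq_ch=64, width=100):
        super().__init__()
        # Space-domain branch: small stand-in for the paper's deep residual CNN.
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, spat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, width)),      # collapse height, keep a fixed width
        )
        # Frequency-domain branch: 1-D FFT magnitude along the width, then a shallow net.
        self.freq_proj = nn.Sequential(nn.Linear(width // 2 + 1, width), nn.ReLU())
        self.freq_ch = nn.Conv1d(in_ch, freq_ch, 1)
        # Channel attention over the concatenated feature channels.
        fused = spat_ch + freq_ch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(fused, fused // 8, 1), nn.ReLU(),
            nn.Conv1d(fused // 8, fused, 1), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, in_ch, H, width) text image
        spat = self.spatial(x).squeeze(2)          # (B, spat_ch, width)
        spec = torch.fft.rfft(x, dim=-1).abs().mean(dim=2)   # (B, in_ch, width//2 + 1)
        freq = self.freq_ch(self.freq_proj(spec))  # (B, freq_ch, width)
        fused = torch.cat([spat, freq], dim=1)     # (B, spat_ch + freq_ch, width)
        return fused * self.attn(fused)            # channel-wise re-weighting


class MultiScaleConv1d(nn.Module):
    """Multi-scale 1-D convolutions as a drop-in for BiLSTM sequence modeling:
    parallel kernels capture context at several receptive fields along the width."""

    def __init__(self, ch, kernels=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(ch, ch, k, padding=k // 2) for k in kernels])

    def forward(self, seq):                        # seq: (B, ch, width)
        return torch.relu(sum(b(seq) for b in self.branches))


if __name__ == "__main__":
    img = torch.randn(2, 1, 32, 100)               # batch of grayscale text-line crops
    feats = DualStreamFusion()(img)                # (2, 320, 100)
    out = MultiScaleConv1d(feats.shape[1])(feats)  # (2, 320, 100)
    print(feats.shape, out.shape)

In the full pipeline described in the abstract, such fused features would be produced after a TPS rectification step and fed, after sequence modeling, to a CTC decoder; those parts are omitted here.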