Abstract
Character recognition is an important research topic in computer vision and lays the foundation for building intelligent government services. However, government images vary widely in quality and contain diverse font styles, which lowers recognition accuracy. To address these problems, a CNLM model combining a capsule network with a language model is proposed, in which character segmentation is integrated with the capsule network. The government image dataset is first converted into character recognition images and language-model sentence samples for training in stages. In the first stage, the visual model is pre-trained on a public character segmentation dataset, and the language model is pre-trained on the sentence samples and existing structured data. In the second stage, the visual model and the language model are trained jointly, and their outputs are selected and iterated to obtain the text sequence contained in each image. Tested on the government image dataset and the GA-HWDB dataset, the method improves accuracy over VisionLAN by 2.12% and 2.69%, respectively.
Authors
于龙洋
王德军
孟博
吴余龙
胡宗华
段伟
YU Longyang; WANG Dejun; MENG Bo; WU Yulong; HU Zonghua; DUAN Wei (College of Computer Science, South-Central Minzu University, Wuhan 430074, China; Wuhan Lilosoft Co., Ltd, Wuhan 430015, China)
Source
《中南民族大学学报(自然科学版)》
CAS
2024, Issue 3, pp. 393-400 (8 pages)
Journal of South-Central University for Nationalities: Natural Science Edition
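Funding
Hubei Provincial Science and Technology Innovation Talent Program (2023DJC094)
National Key Research and Development Program of China (2020YFC1522900)
South-Central Minzu University Graduate Academic Innovation Fund (3212023sycxjj168)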
Keywords
intelligent government affairs
character recognition
capsule network
language model