Abstract
To meet the pixel requirements that the Tesseract text recognition framework places on its input images, and to handle the skew and black borders that can appear during image acquisition, this paper studies the preprocessing stage of the text recognition pipeline and implements it in C++. The stage covers binarization, scaling, border handling and skew correction. The paper first outlines the OCR (optical character recognition) pipeline and then focuses on image scaling and binarization. Scaling uses bilinear interpolation, which interpolates linearly along the horizontal and vertical coordinates pixel by pixel and row by row. Binarization uses the maximum between-class variance method (Otsu's method), a clustering-based approach that traverses all gray levels to find the optimal binarization threshold. Drawing on OpenCV library functions, the paper also proposes an approach for handling image borders and offset. The complete pipeline is implemented on the Tesseract framework in VS2015, and the interfaces, functions, and input and output parameters of Tesseract are described. Image preprocessing is essential for text recognition and improves the recognition that Tesseract subsequently performs.
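The bilinear scaling step summarized above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the authors' code: `GrayImage`, `sourceCoord` and `resizeBilinear` are hypothetical names, the image is assumed to be 8-bit grayscale stored row-major, and the pixel-center coordinate mapping with edge clamping is an assumed convention, since the abstract does not specify one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image stored row-major (width * height bytes).
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> data;

    std::uint8_t at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Maps a destination coordinate back to a fractional source coordinate
// (assumed pixel-center alignment) and clamps it to [0, srcSize - 1].
static double sourceCoord(int dst, double scale, int srcSize) {
    double f = (dst + 0.5) * scale - 0.5;
    return std::min(std::max(f, 0.0), srcSize - 1.0);
}

// Bilinear resize: interpolate along x in the two neighboring source rows,
// then along y between those two results. Assumes a non-empty source image
// and dstW, dstH > 0.
GrayImage resizeBilinear(const GrayImage& src, int dstW, int dstH) {
    GrayImage dst{dstW, dstH,
                  std::vector<std::uint8_t>(static_cast<std::size_t>(dstW) * dstH)};
    const double scaleX = static_cast<double>(src.width) / dstW;
    const double scaleY = static_cast<double>(src.height) / dstH;

    for (int y = 0; y < dstH; ++y) {
        const double fy = sourceCoord(y, scaleY, src.height);
        const int y0 = static_cast<int>(fy);
        const int y1 = std::min(y0 + 1, src.height - 1);
        const double wy = fy - y0;  // weight of the lower row

        for (int x = 0; x < dstW; ++x) {
            const double fx = sourceCoord(x, scaleX, src.width);
            const int x0 = static_cast<int>(fx);
            const int x1 = std::min(x0 + 1, src.width - 1);
            const double wx = fx - x0;  // weight of the right column

            const double top = src.at(x0, y0) * (1.0 - wx) + src.at(x1, y0) * wx;
            const double bottom = src.at(x0, y1) * (1.0 - wx) + src.at(x1, y1) * wx;
            const double value = top * (1.0 - wy) + bottom * wy;

            dst.data[static_cast<std::size_t>(y) * dstW + x] =
                static_cast<std::uint8_t>(std::lround(value));
        }
    }
    return dst;
}
```

Interpolating along x first and then along y mirrors the row-by-row description; because bilinear interpolation is separable, swapping the order gives the same value.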
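The threshold search behind the maximum between-class variance (Otsu) binarization can be sketched as below. It is an illustrative sketch, not the paper's implementation: `otsuThreshold` and `binarize` are hypothetical helpers, the input is assumed to be an 8-bit grayscale image flattened into a vector, and mapping gray values at or below the threshold to black is an assumed polarity. The sketch maximizes w0 * w1 * (mu0 - mu1)^2 using pixel counts rather than probabilities, which only rescales the objective by a constant and does not move its maximum.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the gray level t that maximizes the between-class variance
// w0 * w1 * (mu0 - mu1)^2, where class 0 is gray <= t and class 1 is gray > t.
// On ties, the smallest such t is kept.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    double sumAll = 0.0;  // sum of all gray values in the image
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    const double total = static_cast<double>(pixels.size());
    double w0 = 0.0, sum0 = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {  // traverse every gray level
        w0 += hist[t];
        sum0 += t * hist[t];
        const double w1 = total - w0;
        if (w0 == 0.0) continue;  // class 0 still empty
        if (w1 == 0.0) break;     // class 1 empty from here on
        const double mu0 = sum0 / w0;
        const double mu1 = (sumAll - sum0) / w1;
        const double var = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
        if (var > bestVar) {
            bestVar = var;
            best = t;
        }
    }
    return best;
}

// Binarizes with threshold t: gray <= t becomes 0 (black), otherwise 255.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels, int t) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(static_cast<std::uint8_t>(p <= t ? 0 : 255));
    }
    return out;
}
```

Computing the histogram once keeps the search over all 256 levels linear in the number of gray levels rather than in the number of pixels.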
Authors
ZHANG An (章安)
MA Ming-dong (马明栋)
Affiliations: School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Geographical and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Computer Technology and Development (《计算机技术与发展》)
2021, No. 1, pp. 73-76, 174 (5 pages)
Funding
Natural Science Foundation of Jiangsu Province, Youth Fund Project (BK20140868)