Abstract
Optical character recognition (OCR) accuracy drops when document images are warped by the curvature of a bound book. This paper proposes a fast correction method for warped document images that contain header and footer lines. First, the header lines in the image are detected and located, and their coordinates are stored. A proportional (equal-ratio) algorithm then computes how far each point on the header line must move upward or downward during correction. Using these distances as parameters, the image is scanned and the distance that each target pixel between the header and footer lines must move is calculated; the pixels are moved at the same time to reconstruct the image, yielding the corrected image. Experimental results show that the method corrects warped document images containing header and footer lines effectively, and that the OCR recognition rate improves substantially after correction.
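The proportional per-pixel shift described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `straighten_columns`, the per-column inputs `header_y` and `footer_y` (assumed to be already detected), the choice of the mean row as the straightened line position, and the use of inverse mapping with linear interpolation are all assumptions made for illustration.

```python
def straighten_columns(image, header_y, footer_y, background=255):
    """Remap each column so the header and footer lines become horizontal.

    image     -- list of rows of grey values, indexed as image[y][x]
    header_y  -- header_y[x]: detected row of the header line in column x
    footer_y  -- footer_y[x]: detected row of the footer line in column x
    All names and conventions here are hypothetical, not from the paper.
    """
    height, width = len(image), len(image[0])
    # Assumption: the straightened lines are placed at the mean detected row.
    target_h = round(sum(header_y) / width)
    target_f = round(sum(footer_y) / width)
    span_dst = target_f - target_h

    out = [[background] * width for _ in range(height)]
    for x in range(width):
        h, f = header_y[x], footer_y[x]
        span_src = f - h
        for y_dst in range(target_h, target_f + 1):
            # Relative position between the two straightened lines (0..1).
            t = (y_dst - target_h) / span_dst if span_dst else 0.0
            # Same relative position between the warped lines in this column.
            y_src = round(h + t * span_src)
            if 0 <= y_src < height:
                out[y_dst][x] = image[y_src][x]
    return out
```

Reading each output pixel from its source row (inverse mapping) avoids the holes that can appear when source pixels are pushed forward; rows outside the header-footer band are simply left as background in this sketch.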
Source
《图学学报》
CSCD
PKU Core (Peking University core journal list)
2016, No. 1, pp. 79-83 (5 pages)
Journal of Graphics
Funding
National Natural Science Foundation of China (61371142)
Keywords
computer application
warped document
header and footer lines
proportional (geometric) distance
image correction