Abstract
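The overlap of character strokes with form frame lines is common in form-type documents and severely degrades the performance of automatic document processing systems. Most existing line removal algorithms operate on binary images, in which much useful local information has already been lost. This paper proposes a gray-level line detection and removal algorithm that uses the image's gray-level information directly. First, edge features of the image are used to detect the lines and the positions where characters and lines intersect. Then the pairs of intersection points on each line are analyzed to determine how characters and lines overlap, and these overlap patterns are reduced to two simple forms: crossing and non-crossing. Finally, each line is divided into protected zones and erase zones; pixels in protected zones are preserved during line removal, while pixels in erase zones are erased with a gray-level morphological algorithm. Experiments on checks currently in use in China show that the algorithm is effective.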
Preprocessing is an important stage in a document image analysis (DIA) system. In practical document images, characters often overlap with preprinted form frames, creating serious problems for recognition engines. Most form frame line removal algorithms are based on bi-level images, which lose much useful information during binarization. This paper proposes a line removal algorithm that works directly on gray-level images. First, cross-points of characters and lines are detected using the Sobel gradient. Then the overlapping types of characters and lines are converted into touch type or crossover type by cross-point analysis. Finally, lines are removed with a topological method. Experimental results on 1225 real-life character string images demonstrate the effectiveness of this algorithm: the recognition rate improves from 75.9% to 91.4%.
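The final step, in which protected zones are kept and erase zones are removed with gray-level morphology, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes dark ink on a light background, a horizontal line whose rows are already known, and protected columns that are supplied by the caller rather than derived from cross-point analysis. The names `vertical_closing`, `remove_line`, and `demo_image` are hypothetical.

```python
def vertical_closing(img, k=3):
    """Gray-level closing with a k x 1 vertical structuring element (k odd).

    Assumption: img is a list of rows of gray values, 0 = black, 255 = white.
    """
    h, w = len(img), len(img[0])
    half = k // 2

    def window_op(src, op):
        # Apply op (max or min) over a vertical window clipped to the image.
        return [[op(src[rr][c]
                    for rr in range(max(0, r - half), min(h, r + half + 1)))
                 for c in range(w)]
                for r in range(h)]

    dilated = window_op(img, max)    # dilation fills in thin dark lines
    return window_op(dilated, min)   # erosion restores taller dark shapes


def remove_line(img, line_rows, protected_cols, k=3):
    """Erase a horizontal line, keeping pixels in protected columns.

    line_rows and protected_cols are assumed inputs; in the paper they
    would come from the line detection and cross-point analysis steps.
    """
    closed = vertical_closing(img, k)
    out = [row[:] for row in img]            # leave the input untouched
    for r in line_rows:
        for c in range(len(img[0])):
            if c not in protected_cols:      # erase zone
                out[r][c] = closed[r][c]
    return out


def demo_image():
    """7 x 5 test image: a frame line on row 3 crossed by a stroke in column 2."""
    img = [[200] * 5 for _ in range(7)]
    for c in range(5):
        img[3][c] = 50                       # frame line
    for r in range(1, 6):
        img[r][2] = 30                       # character stroke
    return img


if __name__ == "__main__":
    cleaned = remove_line(demo_image(), line_rows=[3], protected_cols={2})
    for row in cleaned:
        print(row)
```

The structuring element is vertical and taller than the line is thick, so the closing replaces line pixels with the surrounding background while the stroke pixel at the crossing is left alone because its column is protected.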
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal
2005, Issue 4, pp. 635-639 (5 pages)
Journal of Computer Research and Development
Funding
Research Fund for the Doctoral Program of Higher Education (20020288013)
Keywords
document processing
form processing
line detection
line removal