Abstract
This paper proposes an adaptive, distance-weighted algorithm for separating characters from form lines. Using a set of heuristic rules, it computes a weight for each pixel on a form line and compares that weight with a threshold to decide whether the pixel belongs to a character. Both the weights and the threshold are determined automatically for the form being processed. The algorithm is independent of the form-line detection method and is easy to implement. Experimental results show that it handles overlaps between characters and lines well and improves the accuracy of form recognition.
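The abstract does not give the heuristic rules, the weight formula, or the way the threshold is derived, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a binary image stored as a list of rows (1 = ink), a horizontal form line already detected as the rows `row_top..row_bottom`, a hypothetical weight that sums `1/d` for ink found `d` pixels above or below the line, and a hypothetical threshold derived from the line thickness. The function name `separate_characters_from_line` is illustrative.

```python
def separate_characters_from_line(image, row_top, row_bottom):
    """Erase form-line pixels that do not appear to belong to a character.

    image: list of rows, each a list of 0/1 values (1 = ink).
    row_top, row_bottom: inclusive row range of a detected horizontal line.
    Returns a new image; the input is left unchanged.
    """
    height = len(image)
    width = len(image[0])

    # Assumption: how far to look away from the line depends on its thickness.
    thickness = row_bottom - row_top + 1
    reach = max(2, thickness)

    # Assumption: threshold is half the weight of a stroke that fully
    # touches the line from one side, so it adapts to the line thickness.
    threshold = 0.5 * sum(1.0 / d for d in range(1, reach + 1))

    result = [row[:] for row in image]
    for x in range(width):
        # Distance-weighted evidence: nearer ink contributes more (1/d).
        weight = 0.0
        for d in range(1, reach + 1):
            for y in (row_top - d, row_bottom + d):
                if 0 <= y < height and image[y][x]:
                    weight += 1.0 / d

        # Low weight: treat the column as pure line and erase it.
        # High weight: keep it, since a character stroke likely passes here.
        if weight < threshold:
            for y in range(row_top, row_bottom + 1):
                result[y][x] = 0
    return result
```

For example, on a 7x7 image with a one-pixel line on row 3 and a vertical stroke in column 2, the sketch erases the line everywhere except where the stroke crosses it. Because the function takes the line position as input, it does not depend on how the line was detected, which mirrors the independence property claimed in the abstract.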
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2007, No. 4, pp. 206-208 (3 pages)
Computer Engineering
Keywords
Document analysis and recognition
Form recognition
Character-line separation
Optical character recognition (OCR)