Abstract
Existing table (form) recognition algorithms are slow and tolerate only minor breaks in table lines. This paper presents an algorithm for recognizing table cell rectangles based on the vertex chain code. A boundary-labeling automaton labels the inner boundary of each table cell and generates its vertex chain code. The properties of the vertex chain code are then used to remove jagged edges from the table frame lines and to repair broken frame lines, and the rectangular region of each cell is obtained by searching for the vertex chain codes of the four corners of the cell rectangle. Experiments show that the algorithm is fast, robust, and resistant to broken table frame lines.
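The corner search mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's boundary-labeling automaton and it does no sawtooth removal or line repair. It only computes, for each lattice vertex on the boundary of a binary region, the vertex chain code element (the number of region pixels touching that vertex: 1, 2, or 3) and accepts the region as a rectangle when exactly four vertices have code 1 and none has code 3. The grid encoding and the function names `vertex_codes` and `cell_rectangle` are assumptions made for illustration; the codes are collected unordered rather than traced along the boundary.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: 1 = pixel inside the cell region, 0 = frame line or outside.
Grid = List[List[int]]
Vertex = Tuple[int, int]  # (row, col) of a lattice point between pixels


def vertex_codes(grid: Grid) -> Dict[Vertex, int]:
    """Map each boundary vertex to the number of region pixels touching it."""
    rows, cols = len(grid), len(grid[0])

    def pixel(r: int, c: int) -> int:
        # Pixels outside the image count as background.
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    codes: Dict[Vertex, int] = {}
    for r in range(rows + 1):
        for c in range(cols + 1):
            # The four pixels that share lattice vertex (r, c).
            n = pixel(r - 1, c - 1) + pixel(r - 1, c) + pixel(r, c - 1) + pixel(r, c)
            if 1 <= n <= 3:  # 0 = exterior vertex, 4 = interior vertex
                codes[(r, c)] = n
    return codes


def cell_rectangle(grid: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) in vertex coordinates, or None.

    Hypothetical acceptance rule for this sketch: exactly four code-1
    vertices (convex corners) and no code-3 vertices (concave corners).
    """
    codes = vertex_codes(grid)
    corners = [v for v, n in codes.items() if n == 1]
    if len(corners) != 4 or any(n == 3 for n in codes.values()):
        return None
    rs = [r for r, _ in corners]
    cs = [c for _, c in corners]
    return (min(rs), min(cs), max(rs), max(cs))


clean_cell = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
notched_cell = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],  # one missing pixel produces a code-3 vertex
    [0, 0, 0, 0, 0],
]
clean_result = cell_rectangle(clean_cell)
notched_result = cell_rectangle(notched_cell)
```

In this simplified view, a sawtooth or a notch on a frame line shows up as extra code-1 and code-3 vertices along the boundary, which is why the notched region is rejected until such irregularities are smoothed out.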
Source
《计算机工程》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2006, No. 13, pp. 9-11, 14 (4 pages in total)
Computer Engineering