Abstract
Form analysis is the first step in automatic form processing. This paper exploits the characteristics of forms to present a general form analysis method based on line extraction and completion. A vectorization-based line extraction algorithm first obtains the form lines from a run-length connectivity graph, and the skew of the form is detected and corrected at the same time. The form lines are then adjusted according to the properties of forms, and the form feature points (critical points) are derived from the adjusted lines. Finally, rules are established for completing the form lines, which yields a row-cell description of the form structure. Applied to form images, the method handles common problems such as broken form lines and characters touching form lines, and recovers the form structure correctly.
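The abstract does not give the details of the line extraction step, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a binary image stored as a list of rows (1 = black), collects horizontal black runs, bridges small gaps to repair broken lines, and keeps runs long enough to be form-line candidates. The names `horizontal_runs`, `bridge_gaps`, `line_candidates`, `min_length`, and `max_gap` are hypothetical and chosen for this example.

```python
from typing import List, Tuple

# A run is (row, start_col, end_col) with an inclusive end column.
Run = Tuple[int, int, int]


def horizontal_runs(image: List[List[int]]) -> List[Run]:
    """Return every maximal horizontal run of black (1) pixels."""
    runs: List[Run] = []
    for y, row in enumerate(image):
        start = None
        for x, pixel in enumerate(row):
            if pixel and start is None:
                start = x                      # a run begins
            elif not pixel and start is not None:
                runs.append((y, start, x - 1))  # a run ends
                start = None
        if start is not None:                  # run reaches the right edge
            runs.append((y, start, len(row) - 1))
    return runs


def bridge_gaps(runs: List[Run], max_gap: int) -> List[Run]:
    """Merge runs on the same row separated by at most max_gap white pixels."""
    merged: List[Run] = []
    for y, s, e in sorted(runs):
        if merged and merged[-1][0] == y and s - merged[-1][2] - 1 <= max_gap:
            prev = merged[-1]
            merged[-1] = (y, prev[1], max(prev[2], e))
        else:
            merged.append((y, s, e))
    return merged


def line_candidates(image: List[List[int]], min_length: int, max_gap: int) -> List[Run]:
    """Bridge small breaks, then keep only runs long enough to be form lines."""
    runs = bridge_gaps(horizontal_runs(image), max_gap)
    return [r for r in runs if r[2] - r[1] + 1 >= min_length]


image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],  # a form line broken by a 1-pixel gap
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],  # a short stroke, e.g. part of a character
]
print(line_candidates(image, min_length=8, max_gap=2))  # [(1, 0, 9)]
```

A real system of the kind the abstract describes would also link runs across adjacent rows, handle vertical lines, and estimate the skew angle; those steps are left out of this sketch.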
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2000, No. 2, pp. 49-54 (6 pages)
Journal of Chinese Information Processing
Keywords
Form analysis
Automatic form processing
Line extraction