Abstract
Form frame line detection is the foundation of form recognition. Existing form frame line detection algorithms are either slow or not robust, and none of them makes full use of the constraints between form frame lines. This paper proposes a bottom-up form frame line detection algorithm based on a newly defined image structure element, the directional single-connected chain (DSCC). A DSCC is defined as a sequence of black pixel run-lengths and serves well as the vector primitive of this vectorization algorithm. DSCCs are merged under constraints derived from the form frame lines, which removes pseudo lines and completes broken lines; this markedly improves robustness and allows frame lines to be extracted automatically, accurately, and quickly. By filtering out DSCCs caused by noise and accelerating the merging of DSCCs, the algorithm runs 3 to 10 times faster, which meets practical requirements and makes its speed comparable to that of the well-known projection method. Experimental results show that the algorithm is fast, robust, and resistant to skew of any angle and to moderately severe line breaks.
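The idea of a DSCC as a sequence of black pixel run-lengths can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary image given as a list of rows (1 = black), assumes that a horizontal chain is built from one-pixel-wide vertical runs in adjacent columns, and assumes that two runs are "singly connected" when each is the only run overlapping the other. The function names and the `min_len` noise threshold are hypothetical.

```python
def vertical_runs(img):
    """Per column, list the black runs as (top, bottom) inclusive row ranges."""
    h = len(img)
    w = len(img[0]) if h else 0
    cols = []
    for x in range(w):
        runs, y = [], 0
        while y < h:
            if img[y][x]:
                top = y
                while y < h and img[y][x]:
                    y += 1
                runs.append((top, y - 1))
            else:
                y += 1
        cols.append(runs)
    return cols


def neighbours(cols, x, i, dx):
    """Indices of runs in column x+dx whose row range overlaps run i of column x."""
    nx = x + dx
    if nx < 0 or nx >= len(cols):
        return []
    top, bottom = cols[x][i]
    return [j for j, (t, b) in enumerate(cols[nx]) if t <= bottom and top <= b]


def single_link(cols, x, i, dx):
    """Return the uniquely connected run in column x+dx, or None (assumed rule)."""
    fwd = neighbours(cols, x, i, dx)
    if len(fwd) == 1 and len(neighbours(cols, x + dx, fwd[0], -dx)) == 1:
        return fwd[0]
    return None


def horizontal_chains(img):
    """Group vertical runs into left-to-right chains of (x, top, bottom) runs."""
    cols = vertical_runs(img)
    chains = []
    for x in range(len(cols)):
        for i in range(len(cols[x])):
            if single_link(cols, x, i, -1) is not None:
                continue  # this run continues a chain started further left
            chain, cx, ci = [(x, *cols[x][i])], x, i
            while (nxt := single_link(cols, cx, ci, +1)) is not None:
                cx, ci = cx + 1, nxt
                chain.append((cx, *cols[cx][ci]))
            chains.append(chain)
    return chains


def drop_short_chains(chains, min_len):
    """Hypothetical noise filter: keep only chains with at least min_len runs."""
    return [c for c in chains if len(c) >= min_len]
```

In this sketch a straight horizontal stroke yields one long chain, while a fork or crossing ends the chain because the uniqueness condition fails. Merging chains under frame line constraints, as the abstract describes, would operate on the output of `horizontal_chains` and is not shown.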
Source
《软件学报》
EI
CSCD
PKU Core (北大核心)
2002, Issue 4, pp. 790-796 (7 pages)
Journal of Software
Funding
National Natural Science Foundation of China (69972024)
National High Technology Development Program of China, 863 Program (863-306-ZT03-03-1)