Abstract
Traditional table structure recognition algorithms require heavy image preprocessing, achieve low recognition rates on complex table structures, and are too slow on high-resolution, highly complex tables. To address these problems, a fast table structure recognition algorithm is proposed. First, a line segment detector is used to detect the frame lines of the table in the image. Next, a double-threshold straight-line judgment rule merges and refines multiple line segments that should belong to the same line. Finally, for segments that are missing or over-long at the intersections of horizontal and vertical lines, the frame lines of the whole table structure are aligned. Experimental results show that the algorithm accurately recognizes both simple and complex tables in high-resolution images and also meets the recognition needs of simple and complex tables in low-resolution images. Because it tolerates a certain tilt angle, it reduces image preprocessing work and shortens detection time. It can even accurately recognize table structures that are not strictly defined, advancing the development of general-purpose recognition algorithms for table structures in images.
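The abstract names three stages (line segment detection, double-threshold merging, frame line alignment) but does not state the merge or alignment rules. The Python sketch below is one possible reading under stated assumptions, not the authors' method. It assumes segments have already been produced by a line segment detector (for example an LSD implementation such as the one in OpenCV) and arrive as `(x1, y1, x2, y2)` tuples; it assumes the two thresholds are a perpendicular offset tolerance and a collinear gap tolerance; and it models "frame line alignment" as snapping each line's endpoints to the nearest crossing perpendicular line. The names `classify`, `merge`, `align`, `Line` and the parameters `angle_tol_deg`, `offset_tol`, `gap_tol`, `snap_tol` (and their default values) are hypothetical.

```python
# Hypothetical sketch of the three stages; NOT the paper's exact algorithm.
import math
from dataclasses import dataclass


@dataclass
class Line:
    """Axis-aligned table rule. `pos` is y for a horizontal line and x for a
    vertical one; [start, end] is its extent along the other axis."""
    horizontal: bool
    pos: float
    start: float
    end: float


def classify(seg, angle_tol_deg=5.0):
    """Convert a raw segment (x1, y1, x2, y2) into a Line, tolerating a small
    tilt; return None for segments that are neither near-horizontal nor
    near-vertical (e.g. text strokes)."""
    x1, y1, x2, y2 = seg
    angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))  # 0..90
    if angle <= angle_tol_deg:
        return Line(True, (y1 + y2) / 2, min(x1, x2), max(x1, x2))
    if angle >= 90 - angle_tol_deg:
        return Line(False, (x1 + x2) / 2, min(y1, y2), max(y1, y2))
    return None


def merge(lines, offset_tol=3.0, gap_tol=10.0):
    """Assumed dual-threshold rule: same-orientation lines form one rule if
    their positions differ by <= offset_tol (threshold 1) and the gap between
    their extents is <= gap_tol (threshold 2)."""
    merged = []
    for horizontal in (True, False):
        group = sorted((l for l in lines if l.horizontal == horizontal),
                       key=lambda l: l.pos)
        # Threshold 1: chain lines whose positions are within offset_tol.
        clusters = []
        for l in group:
            if clusters and l.pos - clusters[-1][-1].pos <= offset_tol:
                clusters[-1].append(l)
            else:
                clusters.append([l])
        # Threshold 2: inside a cluster, join extents separated by small gaps.
        for cluster in clusters:
            cluster.sort(key=lambda l: l.start)
            runs = [[cluster[0]]]
            for l in cluster[1:]:
                if l.start - max(m.end for m in runs[-1]) <= gap_tol:
                    runs[-1].append(l)
                else:
                    runs.append([l])
            for run in runs:
                pos = sum(m.pos for m in run) / len(run)  # refined position
                merged.append(Line(horizontal, pos,
                                   min(m.start for m in run),
                                   max(m.end for m in run)))
    return merged


def align(lines, snap_tol=8.0):
    """Snap each line's endpoints to the nearest crossing perpendicular line
    within snap_tol: ends that stop short are extended and ends that
    overshoot the frame are trimmed."""
    aligned = []
    for l in lines:
        # Perpendicular rules whose extent reaches this line's position.
        perp = [p.pos for p in lines
                if p.horizontal != l.horizontal
                and p.start - snap_tol <= l.pos <= p.end + snap_tol]

        def snap(v):
            best = min(perp, key=lambda p: abs(p - v), default=None)
            return best if best is not None and abs(best - v) <= snap_tol else v

        aligned.append(Line(l.horizontal, l.pos, snap(l.start), snap(l.end)))
    return aligned


# Toy 2x2 table: a broken, slightly tilted middle row rule, a short bottom
# border, an overshooting middle column, and a diagonal stroke to discard.
segments = [
    (0, 0, 100, 0), (0, 50, 48, 51), (52, 50, 100, 50), (0, 100, 97, 100),
    (0, 0, 0, 100), (50, -4, 50, 100), (100, 0, 100, 100), (10, 10, 40, 40),
]
raw = [l for l in map(classify, segments) if l is not None]
table = align(merge(raw))
for line in table:
    print(line)
```

On this toy input the two pieces of the middle row rule become one line at y = 50.25, the bottom border is extended from x = 97 to 100, the middle column is trimmed from y = -4 to 0, and the diagonal stroke is discarded. Threshold 1 compares each line only with its predecessor in sorted order, so closely spaced lines can chain into one cluster; a production implementation would likely need a stricter grouping rule.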
Authors
LIU Yunkai (刘云锴)
PENG Cheng (彭程)
BIAN Yun (边赟)
Affiliations: Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; University of Chinese Academy of Sciences, Beijing 100049, China
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD; Peking University Core Journals
2021, Issue S01, pp. 250-254 (5 pages)
Funding
Sichuan Province Key Research and Development Project (18ZDYF3994).
Keywords
table structure
line segment detector
frame line detection
merge and refinement
frame line alignment