
Table frame line detection algorithm based on run-length clustering (cited by: 6)
Abstract Existing run-length-based table frame line detection algorithms are fast, but on complex tables their detection quality is poor and they can produce many errors. This paper proposes a table frame line detection algorithm based on hierarchical clustering of run-lengths. First, run-lengths that may belong to the same horizontal or vertical line are placed in one run-length group, and a similarity between two frame lines is defined. Then, taking the run-lengths in a group as the initial atomic classes, hierarchical clustering iteratively merges the two horizontal or vertical lines with the greatest similarity into one frame line. The iteration stops when the greatest similarity between any two frame lines falls below a preset threshold or when only one frame line remains. To handle lines formed by text in the image, such as captions and explanatory paragraphs, the concept of a relative frame line is introduced, and line segments that do not have two relative frame lines are deleted; finally, a second extraction pass is applied to the extracted frame lines. To speed up the algorithm, the run-length groups are clustered in parallel. Experimental results show that, compared with existing algorithms, the proposed algorithm improves the frame line recognition rate on some complex tables by more than 50%.
Authors BAI Wei; CUI Zhe (Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2018, Issue A01, pp. 179-182 (4 pages)
Funding Sichuan Province Science and Technology Support Program (2015GZ0088); "Light of West China" Joint Scholars Program
Keywords table recognition; frame line detection; run-length of table line; hierarchical clustering
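The clustering step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's similarity formula, its grouping rule, or its relative-frame-line test, so everything below is an assumption made for illustration: the `Segment` type, the `similarity` function (based on vertical distance and horizontal gap), and the parameters `min_len`, `max_dy`, `max_gap`, and `threshold` are hypothetical. The sketch treats all horizontal run-lengths as a single group and omits the parallel and second-extraction stages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A horizontal line candidate covering rows y_min..y_max and columns x_min..x_max."""
    y_min: int
    y_max: int
    x_min: int
    x_max: int


def horizontal_runs(image: List[List[int]], min_len: int) -> List[Segment]:
    """Collect horizontal runs of foreground pixels (value 1) that are at least min_len long."""
    runs = []
    for y, row in enumerate(image):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_len:
                    runs.append(Segment(y, y, start, x - 1))
                start = None
    return runs


def similarity(a: Segment, b: Segment, max_dy: int, max_gap: int) -> float:
    """Hypothetical similarity in [0, 1]; NOT the formula from the paper.

    It is high when two segments are close vertically and overlap or nearly
    touch horizontally, and 0 when either distance exceeds its limit.
    """
    dy = max(0, max(a.y_min, b.y_min) - min(a.y_max, b.y_max))
    gap = max(0, max(a.x_min, b.x_min) - min(a.x_max, b.x_max))
    if dy > max_dy or gap > max_gap:
        return 0.0
    return (1 - dy / (max_dy + 1)) * (1 - gap / (max_gap + 1))


def merge(a: Segment, b: Segment) -> Segment:
    """Merge two segments into their common bounding segment."""
    return Segment(min(a.y_min, b.y_min), max(a.y_max, b.y_max),
                   min(a.x_min, b.x_min), max(a.x_max, b.x_max))


def cluster_lines(segments: List[Segment], threshold: float,
                  max_dy: int = 2, max_gap: int = 5) -> List[Segment]:
    """Agglomerative clustering: repeatedly merge the most similar pair.

    Stops when the best similarity is below threshold or one line remains,
    mirroring the two stopping conditions stated in the abstract.
    """
    lines = list(segments)
    while len(lines) > 1:
        best, pair = -1.0, None
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                s = similarity(lines[i], lines[j], max_dy, max_gap)
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = merge(lines[i], lines[j])
        lines = [seg for k, seg in enumerate(lines) if k not in (i, j)] + [merged]
    return lines
```

Calling `horizontal_runs(image, min_len=3)` on a binarized image (a list of rows of 0/1 values) and passing the result to `cluster_lines(runs, threshold=0.5)` returns one merged `Segment` per detected horizontal line; vertical lines could be handled the same way on the transposed image. The pairwise search is quadratic per iteration, which is one reason clustering small groups independently, as the abstract proposes, would reduce the cost.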