Abstract
Table documents are widely used in everyday life. Processing them automatically by computer improves both the speed and the accuracy of document handling, which gives it considerable practical value. Extracting the layout structure of a table document is the core of automating table-document information processing. Table document images contain printed and handwritten characters, pictures, stains, noise, and some degree of skew, so extracting the layout structure correctly is difficult. After reviewing existing layout structure extraction methods from China and abroad, this paper proposes a method based on an optimal coordinate system. Compared with other methods, it is highly robust to interference and lets document layouts be defined flexibly and conveniently.
Source
Computer Simulation (《计算机仿真》)
CSCD
2007, No. 4, pp. 211-215 (5 pages)
Keywords
Layout analysis
Table document processing
Optimal coordinate system
Table recognition