Abstract
Skew in the input image is unavoidable during document scanning, and both layout analysis and character recognition algorithms are highly sensitive to page skew, so skew detection and correction is an important preprocessing step in document analysis. This paper proposes a skew detection method based on least squares. The center of the bottom edge of the bounding box of each character connected component is taken as a feature point. Using the relationship between the feature points of a text line and its baseline, a least-squares fit of the feature points gives the direction of the baseline, which is the skew direction of the page. The paper also presents a fast skew correction algorithm based on straight-line fitting. Experiments show that the algorithm is fast and accurate.
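The detection step summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the connected-component bounding boxes of a single text line are already available as `(x_min, y_min, x_max, y_max)` tuples in image coordinates (y increasing downward), and the function names `bottom_centers`, `fit_baseline_slope`, and `skew_angle_degrees` are hypothetical. How the paper groups feature points into text lines, or handles characters with descenders, is not stated in the abstract and is not modeled here.

```python
import math


def bottom_centers(boxes):
    """Return the bottom-edge center of each bounding box as a feature point.

    Assumption: each box is (x_min, y_min, x_max, y_max) with y growing downward,
    so y_max is the bottom edge.
    """
    return [((x0 + x1) / 2.0, float(y1)) for x0, _y0, x1, y1 in boxes]


def fit_baseline_slope(points):
    """Ordinary least-squares slope of y on x for the given feature points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two feature points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("feature points are vertically aligned; slope undefined")
    return sxy / sxx


def skew_angle_degrees(boxes):
    """Estimate the skew angle of one text line from its character boxes.

    A positive angle means the baseline descends to the right in image coordinates.
    """
    slope = fit_baseline_slope(bottom_centers(boxes))
    return math.degrees(math.atan(slope))
```

Calling `skew_angle_degrees` on the boxes of one text line returns the fitted baseline angle; in this sketch, descenders or noise components would bias the fit unless they are filtered out beforehand.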
Source
《计算机应用与软件》
CSCD
Peking University Core Journals (北大核心)
2001, Issue 9, pp. 43-46 (4 pages)
Computer Applications and Software