Abstract
Extracted text baselines often fail to follow text edges closely, which degrades the geometric rectification of curved document images. To address this, the paper proposes a method for extracting curved text baselines based on the Radon transform and the orientations of connected components (CCs). First, the binary image is segmented into horizontally and vertically aligned text regions by analyzing how the pixel distribution changes within different distances of each CC. Second, each region is divided into a sequence of narrow, overlapping strips; within each strip, text lines are distinguished by applying the Radon transform over the range of orientations of the locally merged CCs, and a linear baseline is extracted for each line. The linear baselines of the strips are then connected and fitted with cubic polynomials to form the curved baselines of the region. Comparative experiments show that the method adapts to document images with different degrees of curvature and text sparsity, and that the extracted curves fit the text edges well. The method can be applied to geometric rectification of curved document images and to optical character recognition (OCR).
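The strip-level step and the cubic connection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `strip_line_angle` and `fit_cubic_baseline` are hypothetical, the candidate angle list stands in for the range of locally merged CC orientations, and the sum-of-squares sharpness score is an assumed selection rule that the abstract does not specify. The projection is a simple discrete analogue of the Radon transform restricted to a few angles.

```python
import numpy as np

def strip_line_angle(strip, angles_deg):
    """Pick a text-line angle for a binary strip from restricted-angle projections.

    strip: 2D array, nonzero = ink pixel (image coordinates, y grows downward).
    angles_deg: candidate angles in degrees (assumed to come from CC orientations).
    Returns the candidate with the sharpest projection profile, or None if empty.
    """
    ys, xs = np.nonzero(strip)
    if ys.size == 0:
        return None
    best_angle, best_score = None, -np.inf
    for a in angles_deg:
        t = np.deg2rad(a)
        # Offset of each ink pixel across lines of slope tan(t);
        # pixels on one such line share (almost) the same rho.
        rho = ys * np.cos(t) - xs * np.sin(t)
        bins = np.arange(np.floor(rho.min()), np.ceil(rho.max()) + 2)
        profile, _ = np.histogram(rho, bins=bins)
        # Assumed scoring rule: concentrated profiles give larger sums of squares.
        score = np.sum(profile.astype(float) ** 2)
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

def fit_cubic_baseline(xs, ys):
    """Least-squares cubic polynomial through per-strip baseline points."""
    return np.poly1d(np.polyfit(xs, ys, 3))
```

In this sketch, each strip would contribute one or more baseline points (for example, the strip centre and the baseline height there), and `fit_cubic_baseline` joins them into a single smooth curve per text line.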
Authors
LUO Xiao-ping (罗晓萍)
ZHU Jin-hao (朱金好)
School of Computer and Information, Anhui Normal University, Wuhu 241003, China; Department of Medical Information, Wannan Medical College, Wuhu 241002, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, No. 12, pp. 2699-2704 (6 pages)
Keywords
text baseline
centroid of connected component
orientation of connected component
Radon transform
curve fitting