Abstract
To address the low segmentation accuracy for touching characters in document images with mixed Chinese and English text, an improved drop fall segmentation algorithm is proposed. The method uses a Bayes classifier to distinguish character types and a threshold to determine whether touching characters are present. Candidate cut points are located from the extreme points of the upper and lower contours of the touching characters, and the centerline of the touching region is extracted by a distance transform. Finally, the path jointly determined by the extreme points and the centerline is used as the segmentation path of the drop fall algorithm to split the touching characters. Experimental results show that the method resolves both the difficulty of accurately choosing the starting position of the drop fall algorithm and the problem of damage to character strokes, and effectively improves segmentation accuracy.
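The two steps that the abstract names most concretely, locating candidate cut points from contour extreme points and tracing a drop fall path from one of them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plateau handling, and the three-direction descent rule are assumptions made for illustration, and the Bayes classifier, the touching threshold, and the distance-transform centerline are omitted. The input is assumed to be one touching component cropped to its bounding box, stored as rows of 0 (background) and 1 (ink).

```python
def upper_contour(img):
    """Row of the topmost ink pixel in each column (image height if the column is empty)."""
    h, w = len(img), len(img[0])
    return [next((r for r in range(h) if img[r][c]), h) for c in range(w)]


def lower_contour(img):
    """Row of the bottommost ink pixel in each column (-1 if the column is empty)."""
    h, w = len(img), len(img[0])
    return [next((r for r in range(h - 1, -1, -1) if img[r][c]), -1) for c in range(w)]


def local_maxima(profile):
    """Columns where the profile has a local maximum; a flat run collapses to its middle."""
    cols, c, w = [], 0, len(profile)
    while c < w:
        e = c
        while e + 1 < w and profile[e + 1] == profile[c]:
            e += 1  # extend over a plateau of equal values
        higher_than_left = c > 0 and profile[c - 1] < profile[c]
        higher_than_right = e < w - 1 and profile[e + 1] < profile[c]
        if higher_than_left and higher_than_right:
            cols.append((c + e) // 2)
        c = e + 1
    return cols


def candidate_cut_columns(img):
    """Union of upper-contour valleys and lower-contour peaks (illustrative choice)."""
    valleys = local_maxima(upper_contour(img))             # deepest dips seen from the top
    peaks = local_maxima([-v for v in lower_contour(img)])  # highest rises seen from the bottom
    return sorted(set(valleys) | set(peaks))


def drop_fall(img, start_col):
    """Simplified drop fall: move down, else down-left, else down-right, else cut through ink."""
    h, w = len(img), len(img[0])
    r, c = 0, start_col
    path = [(r, c)]
    while r < h - 1:
        for dc in (0, -1, 1):
            nc = c + dc
            if 0 <= nc < w and img[r + 1][nc] == 0:
                c = nc
                break
        # If no background pixel lies below, c is unchanged and the drop cuts straight down.
        r += 1
        path.append((r, c))
    return path


# Two vertical strokes joined by a horizontal bridge in the middle row.
rows = ["1100011",
        "1100011",
        "1111111",
        "1100011",
        "1100011"]
img = [[int(ch) for ch in row] for row in rows]
cuts = candidate_cut_columns(img)   # [3]
path = drop_fall(img, cuts[0])      # straight cut down column 3
print(cuts, path)
```

In this toy image both contours have their extreme point at column 3, so the drop starts there and crosses the bridge at its middle. The classic drop fall rules also allow sideways moves; they are left out here so that the path always advances one row per step.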
Source
《激光与红外》
CAS
CSCD
PKU Core Journals (北大核心)
2010, No. 12, pp. 1369-1373 (5 pages)
Laser & Infrared
Keywords
document image
touching character
contour
drop fall algorithm