Abstract
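Touching or overlapping between character strokes and table lines is common in form-type bills and seriously degrades the performance of subsequent automatic bill recognition. Most existing methods operate on binary images and fail to make full use of the frame line features available in the grayscale image. Based on the frame line features in bill images, this paper proposes a frame line detection and removal algorithm for the preprocessing of form-type bills. The algorithm first exploits the characteristics of the grayscale bill image to detect the frame lines accurately, then describes the overlaid frame line region with a connected chain structure, then identifies and marks the overlaps, and finally removes the frame lines while preserving the character strokes according to the marks. Tests on real bank check images demonstrate the effectiveness and robustness of the algorithm.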
In real form-type bill images, characters often overlap the form frame lines, which greatly degrades the performance of automatic document image processing systems. Most frame line removal algorithms operate on binary images and therefore cannot make good use of the line characteristics present in grayscale images. Based on the structural attributes of financial documents, this paper proposes an improved line detection and removal algorithm for the preprocessing of financial form images. To reduce complexity and improve the quality of line removal, line detection and line removal are carried out as separate steps. First, frame lines are accurately detected from their characteristics in the grayscale image. Next, a chain code method is used to describe the frame line region. The cross-points between characters and lines are then detected with a deterministic finite automaton in order to analyse the overlap types. Finally, the frame lines are removed according to the marks produced during cross-point detection. This overcomes the stroke distortion caused by thresholding and achieves higher line removal accuracy. Experimental results based on handwritten digit recognition show that, compared with several existing methods, the proposed algorithm is efficient and robust.
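The detect-then-remove idea summarized above can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it replaces the chain code description and the deterministic finite automaton with a much cruder rule, handles only horizontal lines, and every name and threshold in it (`detect_horizontal_lines`, `remove_lines`, `dark`, `min_run_ratio`) is a hypothetical choice made for illustration. The sketch assumes a grayscale image stored as a list of rows with 0 for black ink and 255 for white paper.

```python
from typing import List, Tuple

Gray = List[List[int]]  # grayscale image: 0 = black ink, 255 = white paper


def detect_horizontal_lines(img: Gray, dark: int = 128,
                            min_run_ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (top, bottom) row bands whose longest dark run spans most of the width.

    Assumption: a frame line shows up as a long unbroken run of dark pixels.
    """
    width = len(img[0])
    line_rows = []
    for y, row in enumerate(img):
        longest = run = 0
        for v in row:
            run = run + 1 if v < dark else 0
            longest = max(longest, run)
        if longest >= min_run_ratio * width:
            line_rows.append(y)
    # Merge consecutive rows into bands so that thick lines are treated as one.
    bands: List[Tuple[int, int]] = []
    for y in line_rows:
        if bands and y == bands[-1][1] + 1:
            bands[-1] = (bands[-1][0], y)
        else:
            bands.append((y, y))
    return bands


def remove_lines(img: Gray, bands: List[Tuple[int, int]],
                 dark: int = 128, white: int = 255) -> Gray:
    """Whiten each line band, except columns where a stroke crosses the band.

    Assumption: a column is a crossing if the pixels directly above and
    directly below the band are both dark; such pixels are kept.
    """
    out = [row[:] for row in img]
    height = len(img)
    for top, bottom in bands:
        for x in range(len(img[0])):
            above = top > 0 and img[top - 1][x] < dark
            below = bottom < height - 1 and img[bottom + 1][x] < dark
            if not (above and below):
                for y in range(top, bottom + 1):
                    out[y][x] = white
    return out
```

Calling `remove_lines(img, detect_horizontal_lines(img))` on an image with one horizontal line crossed by one vertical stroke whitens the line but leaves the stroke pixels inside it intact. A rule this simple cannot distinguish the different overlap types that the abstract says are analysed at the cross-points.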
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals (北大核心)
2008, Issue 5, pp. 909-914 (6 pages)
Journal of Computer Research and Development
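Funding
Key Program of the National Natural Science Foundation of China (60632050)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z119)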
Keywords
document analysis
form recognition
line detection
connected chain structure (chain code)
frame line removal