Abstract
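Character-line overlap is a common problem in form processing, and it severely interferes with character recognition. This paper proposes a form frame line removal algorithm based on line width information: the line width thresholding method. A smaller threshold is used to remove frame lines within characters and a larger threshold is used between characters, which gives the method good robustness to noise. For the special case of digits overlapping frame lines, two methods that use prior knowledge are proposed and compared: a heuristic prior-knowledge method and a recognition-feedback method. Recognition experiments on value-added tax invoices show that the algorithm makes the recognition rate of digits that overlap frame lines comparable to that of digits that do not.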
Characters often overlap with form frame lines in both printed and handwritten forms. In this paper, a simple but powerful frame line removal algorithm, the Line Width Thresholding Method, is presented. To cope with digits that overlap frame lines, two approaches that exploit syntactic and semantic information are implemented and compared: one uses a priori knowledge, and the other uses feedback from the recognition core. Experiments on value-added tax invoice recognition show the effectiveness of our approaches: the recognition rate of digit strings overlapping with frame lines does not decrease.
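As a rough illustration of the line width thresholding idea, the Python sketch below removes a horizontal frame line from a small binary image. It is not the authors' implementation: the function name, the parameters, the assumption that the frame line's row band and the character column spans are already known, and the threshold values are all hypothetical choices made for this example. In each column, a vertical run of ink that touches the frame line band is erased when its length does not exceed the threshold, with a smaller threshold inside character spans and a larger one between them.

```python
def remove_frame_line(img, line_top, line_bottom, char_spans, t_inside, t_between):
    """Return a copy of a binary image (1 = ink, 0 = background) with a
    horizontal frame line removed by thresholding vertical run lengths.

    Hypothetical parameters, not taken from the paper:
      line_top, line_bottom: inclusive row band occupied by the frame line
      char_spans: inclusive (start_col, end_col) ranges assumed to hold characters
      t_inside: threshold used in columns inside a character span (smaller)
      t_between: threshold used in columns between characters (larger)
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]

    # Mark which columns lie inside a character span.
    in_char = [False] * width
    for start, end in char_spans:
        for x in range(start, end + 1):
            in_char[x] = True

    for x in range(width):
        threshold = t_inside if in_char[x] else t_between
        y = 0
        while y < height:
            if img[y][x] == 0:
                y += 1
                continue
            # Measure the vertical run of ink starting at row y.
            run_start = y
            while y < height and img[y][x] == 1:
                y += 1
            run_end = y - 1
            touches_line = run_start <= line_bottom and run_end >= line_top
            # A short run on the line band is treated as frame line (or noise
            # attached to it); a long run is treated as a crossing stroke.
            if touches_line and (run_end - run_start + 1) <= threshold:
                for yy in range(run_start, run_end + 1):
                    out[yy][x] = 0
    return out


def demo_image():
    """7x8 toy image: a 2-pixel-thick line on rows 3-4, a vertical stroke in
    column 2 crossing it, and one noise pixel above the line in column 6."""
    img = [[0] * 8 for _ in range(7)]
    for y in (3, 4):
        for x in range(8):
            img[y][x] = 1
    for y in range(7):
        img[y][2] = 1
    img[2][6] = 1
    return img


cleaned = remove_frame_line(demo_image(), line_top=3, line_bottom=4,
                            char_spans=[(1, 3)], t_inside=2, t_between=3)
```

In this toy case the vertical stroke in column 2 is longer than the inside-character threshold and is kept, while the larger between-character threshold also erases the noise pixel attached to the line in column 6.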
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journal (北大核心)
2001, Issue 2, pp. 206-210 (5 pages)
Pattern Recognition and Artificial Intelligence
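Funding
National High-Tech R&D Program of China (863 Program)
National Natural Science Foundation of China
Keywords
Form processing
Form frame lines
Line width thresholding method
Character recognition
Computer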
Form Processing, Character-Line Overlap, Form Frame Line Removal, Character Stroke Reservation