Abstract
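This paper describes an automatic form entry system comprising two parts: form analysis and understanding, and character recognition. A layered model and an algorithm for form analysis and understanding are proposed. The algorithm detects form line segments using mathematical morphology, builds a description of those segments, derives form lines from the segments, derives a set of rectangular blocks from the form lines, and finally performs form understanding and result generation. It tolerates partially broken lines, dashed lines, and skew, and is unaffected by touching components and noise points. A character recognition method based on machine learning is also proposed, in which the classification rules are generated by learning from a large number of samples. A handwritten numeral recognition system based on this method is presented; it achieves a recognition rate of 100% on the training samples and 97.6% on the test samples.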
This paper describes a form reading system, which is composed of an image analysis module and a character recognition module. A layered model and an algorithm for form image analysis are proposed. A method of character recognition based on machine learning is also presented. The rules of the classifier are learned from a large-scale sample set by AQ15, a general concept learning system. A numeral recognition system developed with this method is introduced.
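The first stages of the form analysis pipeline (morphological detection of line segments, a segment description, and the grouping of segments into form lines) can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the 1 x k horizontal structuring element, and the `max_gap` merging rule are all assumptions made for illustration, and only horizontal lines on an unskewed binary image are handled.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the paper.

def erode_h(img, k):
    """Erosion by a 1 x k horizontal element: keep pixel x only if
    pixels x .. x+k-1 in the same row are all set."""
    w = len(img[0])
    return [[1 if x + k <= w and all(row[x:x + k]) else 0
             for x in range(w)] for row in img]

def dilate_h(img, k):
    """Dilation by the same element: set pixel x if any of
    pixels x-k+1 .. x in the same row is set."""
    w = len(img[0])
    return [[1 if any(row[max(0, x - k + 1):x + 1]) else 0
             for x in range(w)] for row in img]

def open_h(img, k):
    """Morphological opening: removes horizontal runs shorter than k
    (noise points, character strokes) and keeps longer runs intact."""
    return dilate_h(erode_h(img, k), k)

def segments(img):
    """Segment description: (row, x_start, x_end) per run, x_end inclusive."""
    out = []
    for y, row in enumerate(img):
        x, w = 0, len(row)
        while x < w:
            if row[x]:
                start = x
                while x < w and row[x]:
                    x += 1
                out.append((y, start, x - 1))
            else:
                x += 1
    return out

def merge_segments(segs, max_gap):
    """Assumed rule for bridging broken or dashed lines: join segments
    on the same row when at most max_gap pixels separate them."""
    merged = []
    for y, s, e in sorted(segs):
        if merged and merged[-1][0] == y and s - merged[-1][2] - 1 <= max_gap:
            merged[-1] = (y, merged[-1][1], e)
        else:
            merged.append((y, s, e))
    return merged

# Row 1 is a form line broken at x = 5; row 2 holds short noise runs.
demo = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
segs = segments(open_h(demo, k=4))
print(segs)                             # [(1, 0, 4), (1, 6, 11)]
print(merge_segments(segs, max_gap=1))  # [(1, 0, 11)]
```

In this sketch the opening discards the short runs in row 2, and the merge step joins the two halves of the broken line in row 1. Vertical lines could be handled the same way on the transposed image; skew tolerance and the construction of rectangular blocks are outside the scope of the sketch.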
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
1995, No. 12, pp. 924-929 (6 pages)
Chinese Journal of Computers
Keywords
Chinese character recognition
form entry system
mathematical morphology
OCR, document analysis, machine learning, mathematical morphology