Abstract
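Based on a thorough examination of the characteristics of handwritten Chinese characters and of large Chinese character sets, this paper proposes a set of classification features for recognizing handprinted Chinese characters: the distribution type of long strokes, the number of strokes of each type, the number of intersections, and the number of corner points. Matching on these features directly identifies the great majority of characters in the GB2312-80 character set. A knowledge-based inference process then identifies the few remaining characters, which have already been partitioned into class groups. This two-level approach, which combines statistical classification with knowledge-based inference, is relatively efficient. The paper also designs a more adaptable method for extracting strokes and feature points from Chinese characters; it is an improvement of the SLSA method and, working together with a machine learning function, greatly raises the accuracy of feature extraction. Based on these ideas, we built an experimental handprinted Chinese character recognition system and obtained good experimental results.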
This paper describes a study of, and experiments on, handprinted Chinese character recognition. First, it presents a group of stable and essential features of handprinted Chinese characters: the distribution type of long strokes, the number of strokes of each type, and the numbers of intersections and corners. These features capture the structural type, the basic elements, and the topological critical points of Chinese characters, so they offer great discriminating capacity and are well suited to recognizing handprinted Chinese characters. Second, it proposes a two-stage recognition scheme that integrates statistical classification with knowledge-based inference. Third, it improves a method for extracting strokes and feature points from handprinted Chinese characters by adding a machine learning mechanism, which raises the accuracy of feature extraction. Finally, it gives the experimental results.
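The two-stage scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Features` tuple, the numeric feature values, the `long_stroke_type` codes, and the 土/士 length rule are all hypothetical, chosen only to show exact feature matching followed by a knowledge-based rule for characters that share one feature vector.

```python
from collections import defaultdict
from typing import NamedTuple


class Features(NamedTuple):
    """Hypothetical encoding of the four feature kinds named in the abstract."""
    long_stroke_type: int   # assumed code for the long-stroke distribution type
    stroke_counts: tuple    # assumed order: (horizontal, vertical, left-falling, right-falling)
    intersections: int
    corners: int


def build_index(templates):
    """Stage 1 table: map each feature vector to the characters that share it."""
    index = defaultdict(list)
    for char, feats in templates.items():
        index[feats].append(char)
    return index


def recognize(feats, index, rules, sample=None):
    """Match on features; fall back to a group-specific rule if ambiguous."""
    group = index.get(feats, [])
    if not group:
        return None                      # no template matches: reject
    if len(group) == 1:
        return group[0]                  # stage 1 alone is decisive
    rule = rules.get(frozenset(group))   # stage 2: knowledge-based disambiguation
    return rule(sample) if rule else None


# Illustrative templates only; the values are assumptions, not data from the paper.
TEMPLATES = {
    "十": Features(1, (1, 1, 0, 0), 1, 0),
    "土": Features(2, (2, 1, 0, 0), 1, 0),
    "士": Features(2, (2, 1, 0, 0), 1, 0),
}

# Assumed rule: 土 has the longer bottom stroke, 士 the longer top stroke.
RULES = {
    frozenset({"土", "士"}):
        lambda s: "土" if s["bottom_len"] > s["top_len"] else "士",
}

INDEX = build_index(TEMPLATES)
```

In this sketch, `recognize(TEMPLATES["十"], INDEX, RULES)` is resolved by the first stage alone, while the shared vector of 土 and 士 needs the extra measurements passed in `sample` (here a dict with `top_len` and `bottom_len`).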
Source
《中文信息学报》
CSCD
1991, No. 4, pp. 50-55 (6 pages)
Journal of Chinese Information Processing