Abstract
Character segmentation is one of the key factors affecting recognition in OCR systems. For document images that mix Chinese and English text, this paper proposes a recognition-feedback segmentation method for mixed text based on character categories. Character features are used to classify the segments of a document into a Chinese character class, an English letter, digit, and punctuation class, and a component class; the Chinese character class and the component class are then each processed with the help of recognition. The method is structurally simple and easy to implement, and experimental results show that it segments characters well and judges character categories accurately.
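The abstract does not give the features, thresholds, or recognizer it uses, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each candidate segment is a horizontal span in a text line, that the class is decided from the width-to-line-height ratio alone (hypothetical thresholds `wide` and `narrow`), and that a caller-supplied `recognize` function returns a confidence score. The names `Block`, `classify`, and `merge_components` are illustrative and do not come from the paper. Only the component-class step is shown; handling of the Chinese character class is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    """A candidate segment: horizontal span [left, right) in a text line."""
    left: int
    right: int

    @property
    def width(self) -> int:
        return self.right - self.left


def classify(block: Block, line_height: int,
             wide: float = 0.8, narrow: float = 0.3) -> str:
    """Assign a coarse class from the width/height ratio.

    The thresholds are placeholders, not values from the paper.
    """
    ratio = block.width / line_height
    if ratio >= wide:
        return "chinese"      # roughly square, full-width block
    if ratio < narrow:
        return "component"    # very narrow: maybe a part of a character
    return "latin"            # English letter, digit, or punctuation


def merge_components(blocks: List[Block], line_height: int,
                     recognize: Callable[[Block], float],
                     max_ratio: float = 1.2) -> List[Block]:
    """Merge runs of component blocks when recognition confidence improves.

    `recognize` is a hypothetical callback returning a confidence in [0, 1].
    """
    merged: List[Block] = []
    i = 0
    while i < len(blocks):
        best, best_end = blocks[i], i
        if classify(blocks[i], line_height) == "component":
            best_conf = recognize(blocks[i])
            j = i + 1
            while (j < len(blocks)
                   and classify(blocks[j], line_height) == "component"):
                candidate = Block(blocks[i].left, blocks[j].right)
                if candidate.width > max_ratio * line_height:
                    break  # too wide to be a single character
                conf = recognize(candidate)
                if conf > best_conf:  # recognition feedback: keep the better merge
                    best, best_conf, best_end = candidate, conf, j
                j += 1
        merged.append(best)
        i = best_end + 1
    return merged


def toy_recognizer(block: Block) -> float:
    """Stand-in recognizer: confident only for roughly character-sized blocks."""
    return 0.9 if 32 <= block.width <= 48 else 0.2


line = [Block(0, 6), Block(16, 22), Block(32, 40),  # three narrow strokes
        Block(46, 66),                              # a Latin-sized block
        Block(70, 110)]                             # a full-width block
result = merge_components(line, line_height=40, recognize=toy_recognizer)
```

In this toy run the three narrow strokes are merged into one block because the stand-in recognizer scores the merged span higher than any partial span, while the other two blocks pass through unchanged. A real system would replace `toy_recognizer` with an actual character recognizer and use richer features than width alone.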
Source
Journal of the Hebei Academy of Sciences (《河北省科学院学报》)
CAS
2011, No. 1, pp. 15-19 (5 pages)
Funding
Supported by the Natural Science Foundation of Hebei Province (602127)
Keywords
Character segmentation
Classifier design
Character category discrimination
Character recognition