Abstract
This paper studies the recognition of multi-font, multi-size printed Chinese characters. In the proposed approach, input characters of different sizes are first normalized. Four kinds of features are then extracted: the numbers of stroke end points in the four surrounding parts of the character, an improved coarse peripheral feature, the numbers of stroke crossings, and the Walsh transform of the projection functions on the x and y axes. Finally, the character is recognized by a multi-level classification procedure applied to these combined features. Experiments were run on an IBM PC/AT microcomputer. The results show that the experimental system achieves a recognition rate greater than 98% when recognizing real printed text.
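The abstract names the features but does not define them, so the following is only a minimal sketch of three of the steps (size normalization, stroke crossing counts, and a Walsh transform of the x and y projections) under assumptions that are not taken from the paper. The function names, the nearest-neighbour resampling, the square power-of-two grid, and the Hadamard (natural) ordering of the Walsh transform are all hypothetical choices. The stroke end-point feature, the improved peripheral feature, and the multi-level classifier are omitted because the abstract gives too little detail to reconstruct them.

```python
import numpy as np

def normalize(img, size=32):
    """Crop the character's bounding box and resample it to size x size.
    Nearest-neighbour resampling is an assumption, not the paper's method."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    box = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    r = np.arange(size) * box.shape[0] // size
    c = np.arange(size) * box.shape[1] // size
    return box[np.ix_(r, c)]

def crossing_counts(img):
    """Count background-to-stroke transitions along every row and column."""
    a = img.astype(int)
    # Pad one background pixel before each scan line so a stroke that
    # starts at the border is still counted as one crossing.
    per_row = (np.diff(np.pad(a, ((0, 0), (1, 0))), axis=1) == 1).sum(axis=1)
    per_col = (np.diff(np.pad(a, ((1, 0), (0, 0))), axis=0) == 1).sum(axis=0)
    return per_row, per_col

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def projection_walsh(img):
    """Walsh transform (Hadamard ordering) of the x and y projections.
    Assumes a square image whose side is a power of two."""
    n = img.shape[0]
    h = hadamard(n)
    proj_x = img.sum(axis=0)  # stroke pixels per column
    proj_y = img.sum(axis=1)  # stroke pixels per row
    return h @ proj_x / n, h @ proj_y / n

# Toy glyph: two vertical strokes on an 8 x 8 grid.
glyph = np.zeros((8, 8), dtype=int)
glyph[:, 1] = 1
glyph[:, 5] = 1

norm = normalize(glyph, size=8)
row_cross, col_cross = crossing_counts(norm)
walsh_x, walsh_y = projection_walsh(norm)
feature_vector = np.concatenate([row_cross, col_cross, walsh_x, walsh_y])
```

In a complete system the concatenated vector would be passed to the classifier. A multi-level scheme would typically use cheap features to shortlist candidate characters before comparing the rest, but how the paper assigns features to levels is not stated in the abstract.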
Source
《中文信息学报》
CSCD
1990, No. 3, pp. 57-62 (6 pages)
Journal of Chinese Information Processing