Abstract
This paper addresses the continuous recognition of handwritten character sequence input signals. After analyzing the characteristics of Chinese characters and of on-line handwritten text, it proposes and constructs a set of handwritten Chinese character radicals. Based on this set, all 6,763 Chinese characters in GB2312-80 were split into radicals and coded, and the radical set was tested. The coding statistics show that the number of Chinese characters, viewed as a function of the number of handwritten radicals per character, follows a log-normal distribution. The composing power of the handwritten radicals is then analyzed and discussed from the perspectives of statistics and character recognition technology; the design of the radical set meets its requirements in both radical selection and character splitting. Finally, two radical recognizers based on the handwritten radicals were built for testing: the radical recognition rate reached 70.21% on single handwritten Chinese characters and 58.49% on continuous handwritten character sequences. The results show that the radical set accords with the characteristics of Chinese characters and on-line handwritten text.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2006, No. 5, pp. 58-64 (7 pages)
Keywords
artificial intelligence
pattern recognition
continuous character recognition
handwritten Chinese character radical
logarithmic normal distribution