Abstract
Improving the recognition rate of off-line handwritten English letters depends on feature extraction and on selecting features that discriminate effectively. Principal curves are a nonlinear generalization of principal component analysis: they are smooth, self-consistent curves that pass through the "middle" of a data distribution, and they reflect the structure of that distribution well. In this work, principal curves are first used to extract structural features from the training data. Next, based on a detailed analysis of the structure of the letters' principal curves, features are selected for coarse classification and for fine classification. Finally, during recognition, these features drive a first-level classification, and the few similar letters that this stage cannot separate well are passed to a second-level classification based on fuzzy mathematics. Experiments on the CEDAR handwritten lowercase letter database show that these features distinguish similar letters effectively and improve the recognition rate for handwritten lowercase English letters. The method offers a new approach to off-line handwritten lowercase letter recognition and also provides useful information for handwritten word recognition.
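The abstract does not say which principal-curve fitting algorithm is used. As an illustration of the idea only, here is a minimal sketch of a polyline approximation, assuming a letter is given as a list of 2D stroke points. It alternates a projection step (find each point's position along the curve) with an averaging step (move each vertex to the mean of the points that project nearest to it), which is a simplified stand-in for the self-consistency property, not the authors' method. The names `fit_polyline_curve`, `k`, and `iterations`, and the initialization by sorting on x, are hypothetical choices made for this example.

```python
def centroid(pts):
    """Mean of a non-empty list of (x, y) points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def project_to_segment(p, a, b):
    """Return (squared distance, t) for the point on segment a-b closest to p.

    t is the position along the segment, clamped to [0, 1].
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (px - qx) ** 2 + (py - qy) ** 2, t


def projection_index(p, vertices):
    """Position of p's closest point along the polyline, in vertex units.

    For example, 1.5 means halfway between vertices 1 and 2.
    """
    best_d2, best_lam = None, 0.0
    for i in range(len(vertices) - 1):
        d2, t = project_to_segment(p, vertices[i], vertices[i + 1])
        if best_d2 is None or d2 < best_d2:
            best_d2, best_lam = d2, i + t
    return best_lam


def fit_polyline_curve(points, k=5, iterations=20):
    """Fit a k-vertex polyline through the 'middle' of 2D points (k >= 2).

    Assumption for this sketch: sorting by x gives a usable initial
    ordering. Real letter shapes that loop back on themselves would
    need a better initialization.
    """
    n = len(points)
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and at least k points")
    ordered = sorted(points)
    # Initial vertices: centroids of k consecutive chunks of the sorted data.
    vertices = [centroid(ordered[j * n // k:(j + 1) * n // k]) for j in range(k)]
    for _ in range(iterations):
        # Projection step: group points by the vertex nearest to their
        # position along the current polyline.
        groups = [[] for _ in range(k)]
        for p in points:
            lam = projection_index(p, vertices)
            groups[min(k - 1, int(lam + 0.5))].append(p)
        # Averaging step: move each vertex to the mean of its group;
        # a vertex with no points keeps its previous position.
        vertices = [centroid(g) if g else v for g, v in zip(groups, vertices)]
    return vertices
```

Calling `fit_polyline_curve(points, k=7)` on the points of a single stroke returns seven ordered vertices. Structural features such as end-point positions or turning angles between consecutive segments could then be read off this polyline, although which features the paper actually selects is not stated in the abstract.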
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2009, Issue 10, pp. 197-201 (5 pages)
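Funding
Supported by the National Natural Science Foundation of China (Nos. 60775036, 60703007)
National 973 Program of China (No. 2003CB316902)
Specialized Research Fund for the Doctoral Program (No. 20060247039)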
Keywords
Principal curves, structural features, feature selection, feature extraction