Abstract
Handwriting recognition is becoming increasingly important as a technology for improving human-computer interaction, and a large body of research on handwritten text and hand-drawn shapes has emerged. However, the separation of text from shapes, an important part of handwriting recognition, has not received enough attention. In this paper, we design and implement a handwriting text-shape separation approach based on the open-source data mining tool Weka. Experimental results for three classifiers, LogitBoost, Random Forest, and LADTree, show that LogitBoost performs best overall. Combining the three classifiers yields more accurate shape recognition, while the precision of text classification is limited by the classifier with the lowest accuracy. In addition, the influence of different features on text-shape classification is analyzed based on information gain evaluation.
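As a rough illustration of the information gain criterion mentioned in the abstract, the sketch below scores a feature by how much it reduces the entropy of the text/shape labels. It is not the paper's implementation. The feature names (`is_closed`, `is_long`) and the toy data are hypothetical, and the features are assumed to be already discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy strokes: three labeled as text, three as shape.
labels = ["text", "text", "text", "shape", "shape", "shape"]
# Hypothetical discrete features, invented for illustration only.
is_closed = [False, False, False, True, True, True]  # separates the classes perfectly
is_long = [True, False, True, False, True, False]    # only weakly related to the class

for name, feature in [("is_closed", is_closed), ("is_long", is_long)]:
    print(f"{name}: {information_gain(feature, labels):.4f} bits")
```

A feature with higher information gain tells more about whether a stroke is text or shape, so ranking features by this score gives one way to compare their influence on the classification.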
Source
《计算机与现代化》
2013, No. 12, pp. 145-148, 154 (5 pages)
Computer and Modernization
Funding
National Natural Science Foundation of China (Grant No. 61100109)
Keywords
handwriting recognition
sketch recognition
data mining
text-shape separation
classification model