Abstract
Text recognition is an important and relatively difficult class of problems in pattern classification. Such problems often contain many redundant attributes, so traditional classification methods generally perform poorly on them. For text recognition, this paper proposes a neural network ensemble algorithm based on kernel principal component analysis (KPCA). The algorithm first applies KPCA to reduce dimensionality and remove redundant attributes, and then uses a neural network ensemble for classification learning. Experiments on text classification data sets show that the proposed algorithm effectively improves classification performance on text classification problems.
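The two-stage pipeline summarized in the abstract (kernel PCA for dimensionality reduction, then a neural network ensemble for classification) can be sketched in Python with NumPy as below. This is a minimal illustration under assumptions the abstract does not state: a Gaussian (RBF) kernel with a hand-picked `gamma`, a fixed number of retained components, ensemble members built by bagging (bootstrap resampling) from small one-hidden-layer networks, and plurality voting to combine them. All names (`KernelPCA`, `TinyMLP`, `fit_ensemble`, `predict_ensemble`, `make_data`) are hypothetical, and the synthetic data is a toy stand-in, not the paper's text data sets.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    # Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2); the kernel choice is an assumption
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelPCA:
    """Minimal kernel PCA: project data onto the top principal components in feature space."""

    def __init__(self, n_components, gamma):
        self.k, self.gamma = n_components, gamma

    def fit(self, X):
        n = X.shape[0]
        self.X, self.K = X, rbf_kernel(X, X, self.gamma)
        one = np.full((n, n), 1.0 / n)
        # center the kernel matrix so the mapped data has zero mean in feature space
        Kc = self.K - one @ self.K - self.K @ one + one @ self.K @ one
        vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][: self.k]      # indices of the k largest eigenvalues
        # scale eigenvectors so each feature-space component has unit norm
        self.alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
        return self

    def transform(self, Y):
        Kt = rbf_kernel(Y, self.X, self.gamma)
        # center the new kernel rows using the training-set kernel statistics
        Kt = Kt - Kt.mean(1, keepdims=True) - self.K.mean(0)[None, :] + self.K.mean()
        return Kt @ self.alphas


class TinyMLP:
    """One-hidden-layer tanh network with softmax output, trained by full-batch gradient descent."""

    def __init__(self, n_in, n_hidden, n_classes, rng):
        self.W1, self.b1 = rng.normal(0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden)
        self.W2, self.b2 = rng.normal(0, 0.5, (n_hidden, n_classes)), np.zeros(n_classes)

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        Z = H @ self.W2 + self.b2
        Z -= Z.max(1, keepdims=True)                # stabilize the softmax
        P = np.exp(Z)
        return H, P / P.sum(1, keepdims=True)

    def fit(self, X, y, epochs=300, lr=0.5):
        Y = np.eye(self.W2.shape[1])[y]             # one-hot targets
        for _ in range(epochs):
            H, P = self._forward(X)
            G = (P - Y) / len(X)                    # gradient of mean cross-entropy w.r.t. logits
            dH = (G @ self.W2.T) * (1 - H ** 2)     # backpropagate through tanh
            self.W2 -= lr * H.T @ G
            self.b2 -= lr * G.sum(0)
            self.W1 -= lr * X.T @ dH
            self.b1 -= lr * dH.sum(0)
        return self

    def predict(self, X):
        return self._forward(X)[1].argmax(1)


def fit_ensemble(Z, y, n_classes, n_members=5, n_hidden=8, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(Z), len(Z))       # bootstrap sample (bagging is an assumption)
        members.append(TinyMLP(Z.shape[1], n_hidden, n_classes, rng).fit(Z[idx], y[idx]))
    return members


def predict_ensemble(members, Z, n_classes):
    votes = np.zeros((len(Z), n_classes), dtype=int)
    for m in members:
        votes[np.arange(len(Z)), m.predict(Z)] += 1  # each member casts one vote per sample
    return votes.argmax(1)                          # plurality vote


# Hypothetical toy data: 10 attributes, only the first 2 carry class information.
rng = np.random.default_rng(0)


def make_data(n_per_class):
    X0 = rng.normal(0, 1, (n_per_class, 10))
    X0[:, :2] -= 3
    X1 = rng.normal(0, 1, (n_per_class, 10))
    X1[:, :2] += 3
    return np.vstack([X0, X1]), np.array([0] * n_per_class + [1] * n_per_class)


Xtr, ytr = make_data(50)
Xte, yte = make_data(50)
kpca = KernelPCA(n_components=5, gamma=0.05).fit(Xtr)
Ztr, Zte = kpca.transform(Xtr), kpca.transform(Xte)
mu, sd = Ztr.mean(0), Ztr.std(0)                    # standardize with training statistics only
Ztr, Zte = (Ztr - mu) / sd, (Zte - mu) / sd
members = fit_ensemble(Ztr, ytr, n_classes=2)
acc = (predict_ensemble(members, Zte, 2) == yte).mean()
print(f"held-out accuracy: {acc:.2f}")
```

On real text data the input would be a high-dimensional term vector, and `n_components`, `gamma`, and the ensemble size would need tuning; bagging is only one of several common ways to build a neural network ensemble.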
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journal
2013, No. 8, pp. 124-126 (3 pages)
Keywords
text recognition
redundant attributes
kernel principal component analysis
neural network ensemble