Abstract
KNN is a simple classification method widely used in text classification, but it suffers from two problems: a heavy computational load, and a drop in classification accuracy caused by the uneven distribution of training documents. To address these problems, and following the incremental idea of minimizing the learning error, the authors combine learning vector quantization (LVQ) with growing neural gas (GNG) to propose a new incremental (growing) LVQ method and apply it to text classification. The proposed algorithm generates an effective set of representative samples after a single, selective training pass over all training samples, and therefore has a strong learning ability. Experimental results show that the method not only reduces the testing time of KNN but also maintains or even improves classification accuracy.
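The abstract does not state the update rule, so the following is only an illustrative sketch of the general idea: build a small prototype set in one pass, then run KNN over the prototypes instead of over all training samples. It assumes a standard LVQ1-style attraction step, a simple "insert a new prototype on misclassification" growth step, and Euclidean distance. The function names (`train_growing_prototypes`, `knn_predict`) and the learning rate `lr` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumption;
    text classifiers often use cosine similarity instead)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_growing_prototypes(samples, labels, lr=0.1):
    """Single pass over the training data (illustrative, not the paper's rule).

    - A class seen for the first time seeds a new prototype.
    - If the nearest prototype has the same label, pull it toward the sample
      (LVQ1-style attraction).
    - If the nearest prototype has a different label, insert the sample as a
      new prototype (assumed growth step).
    """
    protos, proto_labels = [], []
    for x, y in zip(samples, labels):
        if y not in proto_labels:
            protos.append(list(x))
            proto_labels.append(y)
            continue
        i = min(range(len(protos)), key=lambda j: dist(protos[j], x))
        if proto_labels[i] == y:
            protos[i] = [p + lr * (xi - p) for p, xi in zip(protos[i], x)]
        else:
            protos.append(list(x))
            proto_labels.append(y)
    return protos, proto_labels


def knn_predict(protos, proto_labels, x, k=1):
    """Majority vote among the k prototypes nearest to x."""
    nearest = sorted(range(len(protos)), key=lambda j: dist(protos[j], x))[:k]
    votes = Counter(proto_labels[j] for j in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    samples = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 4.9), (4.8, 5.2)]
    labels = ["a", "a", "a", "b", "b", "b"]
    protos, proto_labels = train_growing_prototypes(samples, labels)
    print(len(protos), "prototypes from", len(samples), "samples")
    print(knn_predict(protos, proto_labels, (0.3, 0.3)))
```

Because test-time KNN then searches only the prototypes, its cost scales with the size of the prototype set and not with the full training set, which is the kind of test-time saving the abstract reports.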
Source
《计算机学报》
EI
CSCD
PKU Core Journals (北大核心)
2007, Issue 8, pp. 1277-1285 (9 pages)
Chinese Journal of Computers