Abstract
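To address the lazy-classification nature and low efficiency of the KNN algorithm, the training dataset is optimized: representative training samples are extracted and used in place of the entire training set for similarity comparison. Experimental results show that classification with the representative sample set performs comparably to the traditional KNN algorithm while shortening classification time and improving classification efficiency. In addition, no K value needs to be estimated, which reduces the bias introduced by manual estimation.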
This paper proposes an improved method to address a weakness of the KNN algorithm: its classification process is too time-consuming, which makes it unsuitable for online web page classification. The method, based on dataset optimization, generates a representative sample set that is used in place of the original dataset. Experimental results show that it improves the efficiency of KNN classification.
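The idea of comparing a page against a small set of representative samples, rather than the full training set, can be illustrated with a minimal sketch. The abstract does not say how representatives are selected, so this example assumes the simplest stand-in: one centroid per class, compared by cosine similarity. The function names (`build_representatives`, `classify`) and the toy term-count vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_representatives(samples, labels):
    """Assumption: use the per-class mean vector as the single representative sample."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(vec, representatives):
    """Assign the label of the most similar representative; no K is required."""
    return max(representatives, key=lambda label: cosine(vec, representatives[label]))


# Toy term-count vectors for two hypothetical page categories.
train = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
labels = ["sports", "sports", "finance", "finance"]

reps = build_representatives(train, labels)
print(classify([1, 0, 0], reps))
print(classify([0, 2, 1], reps))
```

In this sketch, classification cost depends on the number of representatives rather than the number of training samples, which is the source of the speed-up the abstract describes. Picking the single nearest representative also removes the need to choose K.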
Source
《微计算机应用》
2008, Issue 3, pp. 21-25 (5 pages)
Microcomputer Applications
Keywords
webpage classification
KNN
representative samples
dataset optimization
similarity
webpage classification, KNN, datasets, optimization, samples, similarity