Abstract
In web page classification, one drawback of the KNN algorithm is its low classification efficiency. Its classification performance also depends heavily on the choice of similarity function and of the parameter K. Meanwhile, SVM-based web page classifiers are limited by their requirement that input vectors be numeric, whereas web page feature vectors are usually term (word) feature vectors. This paper uses the KNN algorithm to generate training samples, which converts the term feature vectors into numeric vectors, and then applies a support vector machine (SVM) classifier to classify test web pages. The result is a new classifier, the KNN-SVM classifier.
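The abstract describes the pipeline only in outline, so the following is a minimal Python sketch under explicit assumptions, not the paper's method. Term vectors are assumed to be term-frequency `Counter`s and the similarity function is assumed to be cosine similarity. The KNN step "numericizes" a page with a hypothetical scheme: for each class, sum the similarities of the page's K nearest training pages that carry that label. A linear SVM is then trained on these numeric vectors with Pegasos-style subgradient descent on the hinge loss, which stands in for whatever SVM solver the authors used. All function names and the toy pages are invented for illustration, and only two classes are handled.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_features(doc, train_docs, train_labels, classes, k, exclude=None):
    """Hypothetical numericization: for each class, the summed similarity of
    the doc's k nearest training neighbours with that label, plus a 1.0 bias."""
    sims = [(cosine(doc, d), lab)
            for i, (d, lab) in enumerate(zip(train_docs, train_labels))
            if i != exclude]  # leave-one-out when featurizing a training doc
    sims.sort(key=lambda p: p[0], reverse=True)
    feats = [0.0] * len(classes)
    for s, lab in sims[:k]:
        feats[classes.index(lab)] += s
    return feats + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM via Pegasos-style subgradient steps on the hinge loss.
    Labels must be -1/+1; samples are visited in order (no random sampling)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularization shrink
            if margin < 1.0:  # hinge loss active: step toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def knn_svm_fit(train_docs, train_labels, k=3):
    """KNN turns each training page into a numeric vector; the SVM learns on those."""
    classes = sorted(set(train_labels))
    assert len(classes) == 2, "this sketch handles two classes only"
    X = [knn_features(d, train_docs, train_labels, classes, k, exclude=i)
         for i, d in enumerate(train_docs)]
    y = [1 if lab == classes[1] else -1 for lab in train_labels]
    return classes, train_linear_svm(X, y)

def knn_svm_predict(doc, model, train_docs, train_labels, k=3):
    """Numericize a test page with the same KNN step, then apply the SVM."""
    classes, w = model
    x = knn_features(doc, train_docs, train_labels, classes, k)
    score = sum(wj * xj for wj, xj in zip(w, x))
    return classes[1] if score >= 0 else classes[0]

# Made-up toy pages, for illustration only.
train_docs = [Counter(p.split()) for p in [
    "football match goal team", "team coach goal league",
    "match score league football", "stock market price bank",
    "bank interest market loan", "price stock trade market"]]
train_labels = ["sports"] * 3 + ["finance"] * 3
model = knn_svm_fit(train_docs, train_labels)
print(knn_svm_predict(Counter("football goal league coach".split()),
                      model, train_docs, train_labels))  # prints "sports"
print(knn_svm_predict(Counter("market loan bank stock".split()),
                      model, train_docs, train_labels))  # prints "finance"
```

In this sketch the SVM sees only as many numeric features as there are classes (plus a bias), rather than one dimension per vocabulary term. The choice of similarity function and of K still shapes those features, which is the dependency the abstract points out for plain KNN.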
Source
Modern Computer (《现代计算机》), 2008, No. 7, pp. 92-94 (3 pages)
Keywords
K-Nearest Neighbor (KNN)
Support Vector Machine (SVM)
Term feature vectors (word vectors)
Numericalization