Abstract
This paper introduces the k-nearest neighbor (KNN) text classification method based on the vector space model (VSM), analyzes what the KNN method essentially does, and identifies its shortcomings. It then proposes an improved formula for measuring document similarity in KNN classification, built on text attribute association and concept co-occurrence. Classification experiments show that the improved method raises classification accuracy by about 12% on average.
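For readers unfamiliar with the baseline being analyzed, the sketch below shows a conventional KNN text classifier over VSM document vectors. It is not the authors' implementation: the abstract does not give the improved similarity formula, so the standard cosine measure stands in for it, and the `cosine` function is where a modified measure based on attribute association or concept co-occurrence would be substituted. Whitespace tokenization, raw term-frequency weights (rather than TF-IDF), similarity-weighted voting, and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def tf_vector(text):
    """Hypothetical helper: raw term-frequency vector from whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (the VSM baseline measure)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(doc, training, k=3):
    """Assign the label whose k nearest training documents have the largest summed similarity.

    training: non-empty list of (text, label) pairs (assumed input format).
    """
    query = tf_vector(doc)
    # Score every training document against the query, most similar first.
    scored = sorted(
        ((cosine(query, tf_vector(text)), label) for text, label in training),
        reverse=True,
    )
    # Similarity-weighted vote among the k nearest neighbors.
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)
```

With a toy training set such as `[("team wins football match", "sport"), ("bank interest rate rises", "finance"), ...]`, the query `"football team scores goal"` is assigned `"sport"` because its nearest neighbors share the terms `team`, `football`, `scores` and `goal`. Weighting votes by similarity, instead of counting them, keeps distant neighbors from outvoting close ones when `k` is large.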
Source
Journal of Henan University of Engineering (Natural Science Edition)
2008, No. 3, pp. 65-67 (3 pages)
Keywords
text classification
KNN
vector model
similarity