Abstract
The K-nearest neighbor (KNN) algorithm is widely used in artificial intelligence. For text categorization it is simple, effective, and easy to implement. However, the time complexity of KNN classification is directly proportional to the number of training samples, and an uneven distribution density of the training samples lowers categorization accuracy. This paper proposes an improved algorithm based on KNN. The algorithm analyzes the distribution density of the training samples and prunes samples in high-density regions, which reduces the sample count and evens out the training sample distribution, with the aim of improving categorization accuracy. Experiments show that the density-based improved KNN algorithm lowers time complexity and also achieves better accuracy and recall than standard KNN in text categorization.
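The abstract does not specify how density is measured or which samples are pruned, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes documents are already represented as numeric feature vectors, measures density as the number of same-class samples already kept within a fixed distance, and drops a sample once that count reaches a threshold. The function names and the `radius` and `max_neighbors` parameters are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_dense_samples(samples, labels, radius, max_neighbors):
    """Greedy density-based pruning (an assumed scheme, not the paper's).

    A sample is dropped when at least `max_neighbors` already-kept samples
    of the same class lie within `radius` of it, so dense regions are
    thinned while sparse regions are left untouched.
    """
    kept = []  # indices of the samples that survive pruning
    for i, (x, y) in enumerate(zip(samples, labels)):
        close = sum(
            1
            for j in kept
            if labels[j] == y and euclidean(x, samples[j]) <= radius
        )
        if close < max_neighbors:
            kept.append(i)
    return [samples[i] for i in kept], [labels[i] for i in kept]


def knn_predict(samples, labels, query, k):
    """Plain KNN: majority vote among the k nearest training samples."""
    order = sorted(range(len(samples)), key=lambda i: euclidean(query, samples[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: class "a" is densely sampled, class "b" is sparse.
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (1.0, 1.0), (1.5, 1.5)]
train_y = ["a", "a", "a", "a", "a", "b", "b"]

pruned_x, pruned_y = prune_dense_samples(train_x, train_y, radius=0.5, max_neighbors=2)
prediction = knn_predict(pruned_x, pruned_y, (0.9, 0.9), k=3)
```

Because each KNN query compares the input against every remaining training sample, classification cost in this sketch shrinks in proportion to the number of samples pruned.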
Source
《漳州师范学院学报(自然科学版)》
2012, No. 2, pp. 45-48 (4 pages)
Journal of ZhangZhou Teachers College (Natural Science)
Keywords
K-nearest neighbor (KNN)
Text Categorization
Sample Reduction