
An Algorithm for Text Classification Based on KNN (cited by: 1)
Abstract: KNN (K-Nearest Neighbor) is one of the best text classification algorithms in the vector space model. However, when the sample set is large and the text vectors are high-dimensional, both the efficiency and the accuracy of KNN classification drop sharply. This paper proposes an improved algorithm that raises the classification efficiency of KNN and also revises the similarity computation so that high-dimensional text vectors drawn from large sample sets are judged more accurately. During training, the algorithm computes the distribution range of each class of texts in the vector space. During classification, it uses the position of the text vector to be classified in the sample space to narrow the search range for its K nearest neighbors. Experiments confirm that the improved algorithm significantly increases classification efficiency while keeping KNN classification performance essentially unchanged.
Authors: YU Yue-meng (余悦蒙), HUANG Xiao-bin (黄小斌) (School of Information Science and Engineering, Xiamen University, Xiamen 361005, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2012, Issue 3, pp. 1564-1566 (3 pages)
Keywords: text classification; K-nearest neighbor (KNN); algorithm
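
The two-phase idea described in the abstract (record each class's distribution range during training, then use it to narrow the K-nearest-neighbor search during classification) can be illustrated with a minimal Python sketch. The abstract does not say how the distribution range is represented or how the improved similarity is defined, so everything specific here is an assumption made for illustration: each class is summarised by a centroid plus the lowest cosine similarity of any member to that centroid, plain cosine similarity stands in for the paper's revised measure, and the names `train`, `classify`, and `regions` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Plain cosine similarity (a stand-in, not the paper's revised measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(samples):
    """Summarise each class as (centroid, lowest member-to-centroid similarity).

    samples: list of (vector, label) pairs. This centroid-plus-threshold
    "region" is an assumed representation of the class distribution range.
    """
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    regions = {}
    for label, vecs in by_class.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        min_sim = min(cosine(v, centroid) for v in vecs)
        regions[label] = (centroid, min_sim)
    return regions


def classify(query, samples, regions, k=3):
    """KNN vote restricted to classes whose region could contain the query."""
    candidates = {
        label
        for label, (centroid, min_sim) in regions.items()
        if cosine(query, centroid) >= min_sim
    }
    if not candidates:
        # Assumed fallback: no region matches, so search every class.
        candidates = set(regions)
    pool = [(cosine(query, vec), label)
            for vec, label in samples if label in candidates]
    top = sorted(pool, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]


# Toy term-weight vectors, invented purely for this example.
samples = [
    ([1.0, 0.0, 0.0], "sports"),
    ([0.9, 0.1, 0.0], "sports"),
    ([0.0, 1.0, 0.0], "finance"),
    ([0.0, 0.9, 0.1], "finance"),
]
regions = train(samples)
print(classify([0.95, 0.05, 0.0], samples, regions, k=3))
```

In this sketch the saving comes from the `candidates` filter: similarities are computed only against training samples of classes that pass the region test, at the cost of one extra comparison per class. Whether accuracy is preserved depends on how tightly the regions are drawn, which is the part the abstract leaves unspecified.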
