An Improved KNN Method for Reducing the Amount of Training Samples Based on Clustering and Density (Cited: 6)
Abstract: The classical KNN method demands heavy computation when applied to high-dimensional data or training sets containing many samples, which limits its practical use. An improved KNN method based on clustering and density cutting is proposed. In the training stage, the training set is first cut down according to sample density; the reduced set is then clustered into several clusters of fairly uniform density, each of which is converted into a hypersphere. In the testing stage, two methods are designed: the first finds the k hyperspheres nearest to the testing sample, takes the training samples inside those k hyperspheres as a new training set, and applies classical KNN on this set to obtain the testing sample's class; the second finds the single hypersphere nearest to the testing sample and assigns that hypersphere's class to it. Experiments on eight UCI data sets show that the improved method performs well compared with classical KNN and is an effective classification method.
Source: Journal of Qingdao University (Natural Science Edition), CAS, 2017, Issue 2, pp. 62-68 (7 pages)
Keywords: clustering; density; reducing the amount of training samples; KNN method
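The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, the pruning rule, the thresholds, and the one-sphere-per-class simplification (the paper clusters each class into several hyperspheres) are all my own assumptions; only the overall flow (density-based pruning, then hyperspheres, then nearest-hypersphere classification, the paper's second test method) follows the abstract.

```python
import numpy as np

def density_prune(X, y, radius=1.0, cap=3):
    """Greedy density cut (illustrative): drop a sample once `cap`
    already-kept samples of the same class lie within `radius` of it."""
    kept_idx = []
    for i, x in enumerate(X):
        near = [j for j in kept_idx
                if y[j] == y[i] and np.linalg.norm(X[j] - x) < radius]
        if len(near) < cap:
            kept_idx.append(i)
    return X[kept_idx], y[kept_idx]

def build_hyperspheres(X, y):
    """One hypersphere per class: centroid plus the radius that covers
    its members. (A simplification of the paper's per-cluster spheres.)"""
    spheres = []
    for c in np.unique(y):
        pts = X[y == c]
        center = pts.mean(axis=0)
        radius = np.linalg.norm(pts - center, axis=1).max()
        spheres.append((center, radius, c))
    return spheres

def classify_nearest_sphere(spheres, q):
    """Second test method from the abstract: return the class of the
    hypersphere nearest to the query (distance to the sphere surface)."""
    dists = [max(np.linalg.norm(q - ctr) - r, 0.0) for ctr, r, _ in spheres]
    return spheres[int(np.argmin(dists))][2]

# Tiny synthetic example: two well-separated blobs.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.05, 0.05],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.05, 5.05]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Xp, yp = density_prune(X, y)          # pruning drops one dense point per blob
spheres = build_hyperspheres(Xp, yp)
print(classify_nearest_sphere(spheres, np.array([0.3, 0.3])))  # → 0
print(classify_nearest_sphere(spheres, np.array([4.8, 4.9])))  # → 1
```

The paper's first test method would instead collect the training samples inside the k nearest hyperspheres and run classical KNN on that reduced set; the point of both variants is that distances are computed against a handful of sphere centers rather than the full training set.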
Related Literature

References: 8

Secondary references: 65

  • 1 Dong Daoguo, Liu Zhenzhong, Xue Xiangyang. VA-Trie: An index structure for approximate k-nearest-neighbor queries on high-dimensional data [J]. Journal of Computer Research and Development, 2005, 42(12): 2213-2218. Cited: 10
  • 2 Su Jinshu, Zhang Bofeng, Xu Xin. Advances in machine-learning-based text categorization [J]. Journal of Software, 2006, 17(9): 1848-1859. Cited: 386
  • 3 Gong Jun, Liu Lu. An improved k-NN text classifier [J]. Journal of the China Society for Scientific and Technical Information, 2007, 26(1): 56-59. Cited: 10
  • 4 Wang Yu, Bai Shi, Wang Zhengou. A fast KNN algorithm for Web text categorization [J]. Journal of the China Society for Scientific and Technical Information, 2007, 26(1): 60-64. Cited: 33
  • 5 Liu Hongyan. Research and Implementation of Scalable Fast Classification Algorithms [M]. Beijing: Tsinghua University Press, 2000.
  • 6 Han J, Kamber M. Data Mining: Concepts and Techniques [M]. Translated by Fan Ming, Meng Xiaofeng, et al. Beijing: China Machine Press, 2001: 223-262.
  • 7 http://lib.stat.cmu.edu/datasets/places.data
  • 8 Lewis D D. Naive Bayes at Forty: The Independence Assumption in Information Retrieval // Proc of the 10th European Conference on Machine Learning. Chemnitz, Germany, 1998: 4-15.
  • 9 Cohen W W, Singer Y. Context-Sensitive Learning Methods for Text Categorization // Proc of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 1996: 307-315.
  • 10 Joachims T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features // Proc of the 10th European Conference on Machine Learning. Chemnitz, Germany, 1998: 137-142.

Co-citing literature: 415

Co-cited literature: 42

Citing literature: 6

Secondary citing literature: 8
