
Speeding k-NN Classification Method Based on Clustering (cited by: 6)
Abstract: Large-scale data classification problems are common in practice. The traditional k-nearest neighbour (k-NN) classification method must traverse the entire training sample set, so its classification efficiency is low and it cannot handle classification tasks with large-scale training sets. To address this problem, this paper proposes a clustering-based speeding k-NN classification method, called C_k-NN (Speeding k-NN Classification Method Based on Clustering). The method first clusters the training samples to obtain an initial clustering result and computes the centre of each cluster; the training samples most similar to the cluster centres are then selected to form a new training sample set. For each test sample, the method finds the k samples in the new training sample set that are most similar to it and takes the most frequent class label among these k nearest neighbours as the predicted class of the test sample. Experimental results show that C_k-NN greatly improves classification efficiency while maintaining high classification accuracy.
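The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not name the clustering algorithm, the similarity measure, or how many samples are kept per cluster, so the code assumes plain k-means, Euclidean distance (smaller distance meaning higher similarity), and a hypothetical `per_cluster` parameter. All function names are made up for illustration.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance; assumed here as the (inverse) similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumption; the abstract only says 'clustering')."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment, consistent with the final centers.
    return centers, [nearest_center(p, centers) for p in points]


def reduce_training_set(X, y, n_clusters, per_cluster, seed=0):
    """Keep the `per_cluster` samples closest to each cluster center."""
    centers, assign = kmeans(X, n_clusters, seed=seed)
    keep = []
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: euclidean(X[i], center))
        keep.extend(members[:per_cluster])
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k training samples nearest to x."""
    order = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


# Toy data: two well-separated 5x5 grids of points labelled "a" and "b".
grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
X = [list(p) for p in grid] + [[px + 10.0, py + 10.0] for px, py in grid]
y = ["a"] * len(grid) + ["b"] * len(grid)

X_small, y_small = reduce_training_set(X, y, n_clusters=2, per_cluster=3)
print(len(X), "->", len(X_small), "training samples")
print(knn_predict(X_small, y_small, [0.1, 0.1], k=3))
```

The speed-up in this sketch comes entirely from `knn_predict` scanning the reduced set instead of the full training set; the clustering cost is paid once, before any test sample is classified.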
Author: Ren Lifang (任丽芳)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 10, pp. 298-301 (4 pages)
Keywords: k-nearest neighbour classification; Clustering; Similarity; Training sample set; C_k-NN algorithm


