
Speeding K-NN Classification Method Based on Data Block Mixed Measurement (基于数据块混合度量的加速K-近邻分类方法)
Abstract: The standard K-Nearest Neighbor (KNN) classifier must compute the distance between a new sample and every labeled sample when predicting the new sample's class. This makes classification slow and prevents the method from handling large-scale data effectively. To address this problem, this paper proposes an accelerated KNN method based on data block mixed measurement (KNN Method Based on Data Block Mixed Measurement, KNN_DBM2). The method introduces a mixing measure for data blocks into the KNN prediction process. The labeled data are first divided into data blocks, and the center and mixing degree of each block are computed. When a test sample arrives, its distance to every block center is computed and the k nearest blocks are selected. If all k blocks are pure, the test sample is labeled by majority vote over the labels of the block centers. If some of the selected blocks have a high mixing degree, the distances from the test sample to all samples in those mixed blocks and to the centers of the remaining pure blocks are computed, and the k nearest samples or centers are used to label the test sample. This block partitioning and mixing measurement reduces the number of distances that must be computed between the test sample and the labeled samples, which improves the prediction efficiency of KNN classification. Experimental results show that the proposed KNN_DBM2 method achieves both a high prediction speed and good prediction accuracy.
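The prediction procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the blocks are formed or how the mixing degree is defined, so the block assignment is taken as a given input (`block_ids`), the mixing degree is assumed to be the share of block members outside the block's majority class, and `mix_threshold`, `build_blocks`, and `predict` are hypothetical names introduced here.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def build_blocks(samples, labels, block_ids):
    """Summarise each block by its center, majority label, and mixing degree."""
    grouped = {}
    for x, y, b in zip(samples, labels, block_ids):
        grouped.setdefault(b, []).append((x, y))
    blocks = []
    for members in grouped.values():
        points = [x for x, _ in members]
        # Block center: coordinate-wise mean of the member samples.
        center = tuple(sum(col) / len(points) for col in zip(*points))
        counts = Counter(y for _, y in members)
        majority_label, majority_count = counts.most_common(1)[0]
        # ASSUMPTION: mixing degree = share of members outside the majority class.
        mixing = 1.0 - majority_count / len(members)
        blocks.append({"center": center, "label": majority_label,
                       "mixing": mixing, "members": members})
    return blocks


def predict(blocks, query, k, mix_threshold=0.0):
    """Label `query` from the k nearest blocks, opening up only the mixed ones."""
    nearest = sorted(blocks, key=lambda b: dist(query, b["center"]))[:k]
    candidates = []  # (distance, label) pairs
    for b in nearest:
        if b["mixing"] > mix_threshold:
            # Mixed block: compare against every sample it contains.
            candidates.extend((dist(query, x), y) for x, y in b["members"])
        else:
            # Pure block: the center stands in for all of its samples.
            candidates.append((dist(query, b["center"]), b["label"]))
    candidates.sort(key=lambda c: c[0])
    # Majority vote over the k nearest samples or centers.
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]
```

When all k selected blocks are pure, the candidate list holds exactly k centers, so the vote reduces to a majority vote over center labels. Only mixed blocks trigger per-sample distance computations, which is where the saving over plain KNN would come from under these assumptions.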
Authors: Deng Xihui (邓曦辉), Zhao Li (赵丽)
Source: Computer and Modernization (《计算机与现代化》), 2016, No. 12, pp. 47-50 (4 pages)
Keywords: K-nearest neighbor; data block; mixed measurement; prediction efficiency; KNN_DBM2 algorithm