Abstract: It is well known that, to build a strong ensemble, the component learners should have high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the constructed component learners, then Bagging can effectively improve accuracy. However, for stable learners such as nearest neighbor classifiers, perturbing the training set can hardly produce diverse component learners, so Bagging does not work well. This paper adapts Bagging to nearest neighbor classifiers by injecting randomness into the distance metrics. In constructing the component learners, both the training set and the distance metric employed for identifying the neighbors are perturbed. A large-scale empirical study reported in this paper shows that the proposed BagInRand algorithm can effectively improve the accuracy of nearest neighbor classifiers.
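The abstract does not give the algorithm's details, but the idea of perturbing both the training set and the distance metric can be illustrated with a minimal sketch. The sketch below assumes the metric is perturbed by drawing the Minkowski order p at random for each component learner; the function names, the pool of orders `p_pool`, and the ensemble size `n_learners` are illustrative choices, not the paper's actual specification.

```python
import numpy as np
from collections import Counter

def minkowski_knn_predict(X_train, y_train, x, k, p):
    """Predict the label of x by a k-NN majority vote under the
    Minkowski distance of order p."""
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def bag_in_rand_predict(X_train, y_train, x, n_learners=20, k=1,
                        p_pool=(1, 2, 3), rng=None):
    """BagInRand-style sketch: each component learner is trained on a
    bootstrap sample AND uses a randomly drawn Minkowski order p, so the
    ensemble stays diverse even though k-NN is a stable learner."""
    rng = np.random.default_rng(rng)
    n = len(X_train)
    votes = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)   # perturb the training set (bootstrap)
        p = rng.choice(p_pool)             # perturb the distance metric
        votes.append(minkowski_knn_predict(X_train[idx], y_train[idx], x, k, p))
    return Counter(votes).most_common(1)[0][0]
```

With plain Bagging, two bootstrap samples of the same data usually yield nearly identical nearest-neighbor decisions; randomizing p changes which points count as "nearest", which is what restores diversity among the component learners.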
Abstract: To address the problem that the traditional k-nearest neighbor (KNN) algorithm requires a manually preset k value, this paper proposes an adaptive k-value KNN bisecting classification algorithm based on information entropy (EAKNN). The algorithm defines information entropy using sample proportions, strengthening the importance of small (minority) samples; it computes the minimum information entropy among those below a preset entropy threshold to obtain the corresponding k value and model score; on this basis, it combines the proposed accuracy-improvement model to compute the model accuracy, iterating until the accuracy is maximized. Experimental results show that the algorithm improves model accuracy noticeably and achieves high classification accuracy.
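The abstract only outlines the entropy-based k selection, so the following is a minimal sketch of that step under several assumptions: class labels are integers 0..C-1, neighbors are ranked by Euclidean distance, minority classes are up-weighted by the inverse of their share of the training set, and the iterative accuracy-improvement model is omitted because the abstract gives no details of it. The names `weighted_entropy`, `adaptive_k`, `threshold`, and `k_max` are all hypothetical.

```python
import numpy as np

def weighted_entropy(neighbor_labels, class_priors):
    """Proportion-weighted entropy of the neighbour labels: each class count
    is divided by that class's share of the training set, so small classes
    contribute more (one plausible reading of 'defining entropy with sample
    proportions to strengthen the importance of small samples')."""
    labels, counts = np.unique(neighbor_labels, return_counts=True)
    w = counts / class_priors[labels]          # up-weight rare classes
    p = w / w.sum()
    return float(-(p * np.log2(p)).sum())

def adaptive_k(X_train, y_train, x, k_max=25, threshold=0.5):
    """EAKNN-style sketch: among candidate k whose weighted neighbour entropy
    falls below the preset threshold, return the k with the smallest entropy,
    falling back to the overall minimum-entropy k if none qualifies."""
    classes, counts = np.unique(y_train, return_counts=True)
    priors = np.zeros(classes.max() + 1)
    priors[classes] = counts / len(y_train)    # class shares in the training set
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    hs = {k: weighted_entropy(y_train[order[:k]], priors)
          for k in range(2, k_max + 1)}        # k=1 is skipped: its entropy is trivially 0
    below = {k: h for k, h in hs.items() if h < threshold}
    pool = below or hs
    return min(pool, key=pool.get)
```

A low weighted entropy means the k-neighborhood is dominated by one class even after minority classes are up-weighted, so the corresponding k gives a confident vote; scanning k and thresholding the entropy is what makes the k value adaptive per query point rather than fixed in advance.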