
An Imbalanced Data Classification Algorithm Based on Adjacent Samples Labels Judgment
(Original title: 基于邻近样本类别判断的不平衡数据分类算法)
Cited by: 2
Abstract: Uneven distribution between classes is the main reason that classifiers perform poorly on imbalanced data sets. To overcome this imbalance, this paper proposes an imbalanced data classification algorithm based on judging the class labels of adjacent samples. For each sample to be classified, the algorithm first finds its k nearest neighbors and then assigns the sample to the class that is most common among those k neighbors. Because the class decision considers only a small number of neighboring samples rather than all training samples, the algorithm can reduce the effect of class imbalance on the classification of the minority class. Simulation experiments on a customer churn data set show that the proposed algorithm handles imbalanced data classification well.
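The decision rule described in the abstract (find the k nearest neighbors, then take the majority label among them) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `knn_predict`, the Euclidean distance metric, the value k = 3, and the toy "stay"/"churn" data are all assumptions, since the abstract specifies none of them.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` the majority label among its k nearest training samples."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Assumption: Euclidean distance; the abstract does not name a metric.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # Majority vote over the k nearest labels only, not over the whole training set.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical imbalanced toy data: 8 majority-class and 3 minority-class samples.
train_X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
]
train_y = ["stay"] * 8 + ["churn"] * 3

print(knn_predict(train_X, train_y, (5.2, 5.1), k=3))
print(knn_predict(train_X, train_y, (0.3, 0.3), k=3))
```

In this toy set the minority class makes up only 3 of 11 samples, yet a query near the minority cluster is still labeled from its local neighborhood, which is the property the abstract relies on. With two classes, an odd k avoids tied votes.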
Author: Hu Yan (胡艳)
Source: Bulletin of Science and Technology (《科技通报》), Peking University Core Journal, 2013, No. 10, pp. 58-60 (3 pages)
Keywords: imbalanced data sets; adjacent samples; data classification; minority class


