An Imbalanced Data Set Resampling Algorithm by Introducing Bias Selection Variable

Original title: 引入偏置选择变量的不平衡数据集重采样方法
Abstract: Imbalanced data classification is one of the harder problems in pattern classification, mainly because the number of samples differs greatly between classes. To improve classification performance on imbalanced data sets, this paper proposes a resampling algorithm that introduces a bias selection variable, which defines the probability that a majority-class sample is sampled. Introducing this variable effectively reduces the degree of imbalance and thereby improves the generalization performance of classification algorithms on imbalanced data sets. Classification experiments on artificially generated data sets verify the effectiveness of the proposed resampling algorithm.
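The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes that the bias selection variable acts as a per-sample keep probability for the majority class (an independent Bernoulli draw for each majority sample) and that all minority samples are retained. The names `resample_with_bias`, `keep_prob`, and `imbalance_ratio`, as well as the toy data, are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def resample_with_bias(samples, labels, majority_label, keep_prob, seed=0):
    """Keep every minority sample; keep each majority sample with probability keep_prob.

    keep_prob stands in for the bias selection variable (an assumption of this sketch).
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept_samples, kept_labels = [], []
    for x, y in zip(samples, labels):
        # Minority samples always survive; majority samples survive a Bernoulli draw.
        if y != majority_label or rng.random() < keep_prob:
            kept_samples.append(x)
            kept_labels.append(y)
    return kept_samples, kept_labels


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: 900 majority points (label 0) and 100 minority points (label 1).
samples = list(range(1000))
labels = [0] * 900 + [1] * 100

new_samples, new_labels = resample_with_bias(
    samples, labels, majority_label=0, keep_prob=0.2
)
print("imbalance before:", imbalance_ratio(labels))
print("imbalance after: ", imbalance_ratio(new_labels))
```

Under these assumptions, a smaller `keep_prob` discards more majority samples and lowers the imbalance ratio, at the cost of throwing away more training data; how the paper chooses the value of the variable is not stated in the abstract.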
Author: Xu Jin (徐尽)
Source: Bulletin of Science and Technology (《科技通报》), Peking University core journal, 2013, No. 8, pp. 139-141 (3 pages)
Keywords: pattern classification; bias selection variable; degree of imbalance; generalization performance