Abstract
Imbalanced data classification is one of the harder problems in pattern classification, mainly because the number of samples differs greatly between classes. To improve classification performance on imbalanced data sets, this paper proposes a resampling algorithm for imbalanced data sets that introduces a bias selection variable, which defines the probability that a majority-class sample is sampled. Introducing the bias selection variable reduces the degree of imbalance of the data set and can therefore improve the generalization performance of classification algorithms on imbalanced data sets. Classification experiments on artificially generated data sets verify the effectiveness of the proposed resampling algorithm.
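The abstract does not give the exact definition of the bias selection variable, so the following is only a minimal sketch of the general idea it describes, assuming the variable acts as an independent per-sample keep probability for the majority class (random undersampling). The names `resample_majority`, `imbalance_ratio`, and `keep_prob` are hypothetical and do not come from the paper.

```python
import random


def resample_majority(samples, labels, majority_label, keep_prob, seed=0):
    """Keep every minority sample; keep each majority sample with probability keep_prob.

    keep_prob plays the role of the bias selection variable (an assumption:
    the paper's exact formulation is not given in the abstract).
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    rng = random.Random(seed)
    kept_samples, kept_labels = [], []
    for x, y in zip(samples, labels):
        # Minority samples always pass; majority samples pass a Bernoulli trial.
        if y != majority_label or rng.random() < keep_prob:
            kept_samples.append(x)
            kept_labels.append(y)
    return kept_samples, kept_labels


def imbalance_ratio(labels, majority_label):
    """Majority count divided by the count of all other samples."""
    n_major = sum(1 for y in labels if y == majority_label)
    n_minor = len(labels) - n_major
    return n_major / n_minor if n_minor else float("inf")


# Toy data (made up for illustration): 900 majority samples (label 0), 100 minority (label 1).
samples = list(range(1000))
labels = [0] * 900 + [1] * 100

new_samples, new_labels = resample_majority(samples, labels, majority_label=0, keep_prob=0.2)
ratio_before = imbalance_ratio(labels, majority_label=0)
ratio_after = imbalance_ratio(new_labels, majority_label=0)
```

In this sketch a smaller `keep_prob` discards more majority samples, so the imbalance ratio of the resampled set drops in expectation by roughly that factor, while the minority class is left untouched.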
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journal (北大核心)
2013, No. 8, pp. 139-141 (3 pages)
Keywords
pattern classification
bias selection variable
degree of imbalance
generalization performance