Abstract
When imbalanced data sets are classified, the results tend to be biased toward the majority class, and the recognition rate for the minority class is low. To improve classification accuracy on the minority class, a method called S-SMO-Boost is proposed. During the iterations of the AdaBoost boosting algorithm, the method constructs virtual samples from the minority-class samples that are misclassified, which strengthens training on the samples that are hard to classify. The virtual samples are constructed with a spatial interpolation method called S-SMOTE: a hyper-geometric solid (the "super geometry") is built around each misclassified minority-class sample from that sample and its k nearest neighbors, and valid virtual samples are generated by random interpolation inside it. Experiments on real data sets show that S-SMO-Boost improves classification performance on imbalanced data sets.
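The interpolation step can be illustrated with a minimal sketch. The abstract does not specify the shape of the super geometry or how points are drawn inside it, so this sketch assumes the region is the convex hull of a misclassified minority sample and its k nearest minority-class neighbors, and that a point is drawn as a random convex combination of those vertices. The function names (`k_nearest`, `interpolate_in_hull`, `synthesize`) and the parameters `k`, `per_sample`, and `seed` are hypothetical and are not taken from the paper.

```python
import math
import random


def k_nearest(x, pool, k):
    """Return the k points in `pool` closest to `x` (Euclidean distance).

    The sample `x` itself is excluded by object identity.
    """
    others = [p for p in pool if p is not x]
    return sorted(others, key=lambda p: math.dist(x, p))[:k]


def interpolate_in_hull(x, neighbors, rng):
    """Draw one point inside the convex hull of `x` and `neighbors`.

    Assumption: the "super geometry" is modeled as this convex hull.
    """
    vertices = [x] + list(neighbors)
    # Normalized exponential draws give non-negative weights that sum to 1.
    raw = [rng.expovariate(1.0) for _ in vertices]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(x)
    return tuple(
        sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)
    )


def synthesize(misclassified, minority, k=3, per_sample=2, seed=0):
    """Generate virtual samples around each misclassified minority sample."""
    rng = random.Random(seed)
    virtual = []
    for x in misclassified:
        neighbors = k_nearest(x, minority, k)
        for _ in range(per_sample):
            virtual.append(interpolate_in_hull(x, neighbors, rng))
    return virtual


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
    misclassified = [minority[0]]  # pretend the booster got this one wrong
    for point in synthesize(misclassified, minority, k=3, per_sample=4):
        print(point)
```

In a boosting loop, the list passed as `misclassified` would be recomputed in each round from the current weak classifier's errors on the minority class, and the virtual samples would be added to the training set for the next round. How the sample weights are updated for the added points is left open here, since the abstract does not describe it.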
Source
《微型机与应用》
2012, No. 18, pp. 60-62, 65 (4 pages in total)
Microcomputer & Its Applications
Keywords
imbalanced data sets
super geometry
sample generation
boosting algorithm