Abstract
The synthetic minority over-sampling technique (SMOTE) is one of the most widely used over-sampling approaches to the class imbalance problem. It generates new synthetic samples by linear interpolation between minority samples and their selected nearest neighbors. Building on SMOTE, Borderline-SMOTE over-samples only the minority samples near the class borderline, thereby improving the class distribution. In this paper, we further differentiate among the borderline minority samples and generate a different number of synthetic samples for different borderline samples, yielding a refined Borderline-SMOTE (RB-SMOTE) method for imbalanced data sets. In the simulation experiments, support vector machines (SVMs) serve as the classifiers for comparing several over-sampling methods on 10 imbalanced data sets whose imbalance ratios range from 0.0647 to 0.5360. The results show that RB-SMOTE effectively mitigates the class-distribution imbalance and improves classification performance on imbalanced data sets.
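The three ideas in the abstract (SMOTE interpolation, borderline selection, and a per-sample number of synthetic points) can be illustrated with a minimal NumPy sketch. The borderline test follows the usual Borderline-SMOTE "danger" rule (at least half, but not all, of the m nearest neighbors belong to the majority class). The abstract does not state how RB-SMOTE chooses the number of synthetic samples per borderline sample, so the allocation in the usage note (one synthetic point per majority neighbor) is purely a hypothetical placeholder, and all function names are illustrative rather than taken from the paper.

```python
import numpy as np


def nearest_indices(point, pool, k):
    """Indices of the k rows of `pool` closest to `point` (Euclidean)."""
    dist = np.linalg.norm(pool - point, axis=1)
    return np.argsort(dist)[:k]


def majority_neighbor_counts(X_min, X_maj, m=5):
    """For each minority sample, count the majority samples among its m
    nearest neighbors in the whole training set (the sample itself excluded)."""
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)]
    counts = np.zeros(len(X_min), dtype=int)
    for i, x in enumerate(X_min):
        idx = nearest_indices(x, X_all, m + 1)
        idx = idx[idx != i][:m]              # drop the sample itself
        counts[i] = int(is_maj[idx].sum())
    return counts


def borderline_mask(n_maj, m=5):
    """Borderline-SMOTE 'danger' rule: m/2 <= majority neighbors < m."""
    return (n_maj >= m / 2) & (n_maj < m)


def smote_from(X_min, seeds, counts, k=5, rng=None):
    """For each seed index, create counts[j] synthetic points by linear
    interpolation between the seed and a randomly chosen one of its k
    nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    out = []
    for s, n in zip(seeds, counts):
        idx = nearest_indices(X_min[s], X_min, k + 1)
        idx = idx[idx != s][:k]              # drop the seed itself
        for _ in range(int(n)):
            neighbor = X_min[rng.choice(idx)]
            gap = rng.random()               # uniform in [0, 1)
            out.append(X_min[s] + gap * (neighbor - X_min[s]))
    return np.array(out).reshape(-1, X_min.shape[1])
```

A possible way to combine the pieces, with the per-sample counts as the assumed (not the paper's) allocation: compute `n_maj = majority_neighbor_counts(X_min, X_maj, m)`, select `seeds = np.flatnonzero(borderline_mask(n_maj, m))`, and call `smote_from(X_min, seeds, n_maj[seeds], k)`. Borderline samples surrounded by more majority neighbors then receive more synthetic points.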
Source
Journal of Fudan University (Natural Science)
CAS
CSCD
PKU Core (北大核心)
2017, No. 5, pp. 537-544 (8 pages)
Funding
Zhejiang Provincial Natural Science Foundation of China (LY18F030003)
National Natural Science Foundation of China (61373057)
Keywords
imbalanced data set
classification
over-sampling
support vector machine