Abstract
When the Synthetic Minority Over-sampling Technique (SMOTE) is applied to imbalanced dataset classification, it assigns the same sampling rate to every minority-class sample, which makes the oversampling somewhat blind. To address this problem, a Genetic Algorithm (GA) improved SMOTE method, named GASMOTE, is proposed. First, GASMOTE assigns a different sampling rate to each minority-class sample and encodes each combination of sampling rates as one individual in the population. Next, the selection, crossover, and mutation operators of GA are applied iteratively to optimize the population, and the best combination of sampling rates is obtained when the stopping criterion is met. Finally, the imbalanced dataset is oversampled with SMOTE using that best combination. Experimental results on ten typical imbalanced datasets show that, compared with SMOTE, GASMOTE improves the F-measure by 5.9 percentage points and the G-mean by 1.6 percentage points; compared with Borderline-SMOTE, it improves the F-measure by 3.7 percentage points and the G-mean by 2.3 percentage points. GASMOTE can serve as a new oversampling technique for imbalanced dataset classification.
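The three steps in the abstract (encode per-sample sampling rates, evolve them with GA, then run SMOTE with the best combination) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the fitness function, the operator variants, or the rate bounds, so the choices here are assumptions: G-mean of a 1-nearest-neighbour classifier on a held-out split as fitness, binary tournament selection, one-point crossover, uniform mutation, elitism, and integer rates in `[0, max_rate]`. All function names (`smote_per_sample`, `g_mean`, `gasmote`, `make_toy_data`) are hypothetical.

```python
import numpy as np

def smote_per_sample(X_min, rates, k=3, rng=None):
    """SMOTE-style synthesis where minority sample i yields rates[i] new points."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest minority neighbours
    synth = []
    for i, r in enumerate(rates):
        for _ in range(int(r)):
            j = nn[i, rng.integers(k)]
            # interpolate between sample i and a randomly chosen neighbour j
            synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(synth) if synth else np.empty((0, X_min.shape[1]))

def g_mean(X_tr, y_tr, X_va, y_va):
    """G-mean of a 1-NN classifier (assumed fitness; minority label is 1)."""
    d = np.linalg.norm(X_va[:, None, :] - X_tr[None, :, :], axis=2)
    pred = y_tr[np.argmin(d, axis=1)]
    tpr = np.mean(pred[y_va == 1] == 1)
    tnr = np.mean(pred[y_va == 0] == 0)
    return float(np.sqrt(tpr * tnr))

def fitness(rates, X_tr, y_tr, X_va, y_va):
    """Oversample with the given rates, then score on the validation split."""
    synth = smote_per_sample(X_tr[y_tr == 1], rates, rng=np.random.default_rng(0))
    X_aug = np.vstack([X_tr, synth])
    y_aug = np.concatenate([y_tr, np.ones(len(synth), dtype=y_tr.dtype)])
    return g_mean(X_aug, y_aug, X_va, y_va)

def gasmote(X_tr, y_tr, X_va, y_va, max_rate=5, pop_size=12, gens=15,
            p_mut=0.1, seed=0):
    """Search for one sampling rate per minority sample with a simple GA."""
    rng = np.random.default_rng(seed)
    n_min = int(np.sum(y_tr == 1))
    # each individual is one combination of per-sample sampling rates
    pop = rng.integers(0, max_rate + 1, size=(pop_size, n_min))
    for _ in range(gens):
        fit = np.array([fitness(ind, X_tr, y_tr, X_va, y_va) for ind in pop])
        # selection: binary tournament
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # crossover: one-point, applied to consecutive pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_min)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # mutation: resample a gene uniformly with probability p_mut
        mask = rng.random(children.shape) < p_mut
        children[mask] = rng.integers(0, max_rate + 1, size=mask.sum())
        children[0] = pop[np.argmax(fit)]    # elitism keeps the best so far
        pop = children
    fit = np.array([fitness(ind, X_tr, y_tr, X_va, y_va) for ind in pop])
    return pop[np.argmax(fit)], float(fit.max())

def make_toy_data(seed=0):
    """Small synthetic imbalanced dataset (40 vs 8 training samples)."""
    rng = np.random.default_rng(seed)
    X_maj = rng.normal(0.0, 1.0, size=(60, 2))
    X_min = rng.normal(2.0, 1.0, size=(12, 2))
    X_tr = np.vstack([X_maj[:40], X_min[:8]])
    y_tr = np.array([0] * 40 + [1] * 8)
    X_va = np.vstack([X_maj[40:], X_min[8:]])
    y_va = np.array([0] * 20 + [1] * 4)
    return X_tr, y_tr, X_va, y_va
```

Calling `gasmote(*make_toy_data())` returns the best rate vector found and its fitness; the final oversampled training set is then obtained by passing that vector to `smote_per_sample`. A fixed stopping criterion (a generation count) is used here only for simplicity.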
Source
Journal of Computer Applications
CSCD
Peking University Core Journal
2015, No. 1, pp. 121-124, 139 (5 pages)
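Funding
National Natural Science Foundation of China (61075063)
Natural Science Foundation of Hubei Province (2013CFA004)
China Postdoctoral Science Foundation, General Program (2014M560700)
Chongqing Postdoctoral Special Funding Project (XM2014057)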
Keywords
imbalanced dataset
classification
Synthetic Minority Over-sampling Technique (SMOTE)
sampling rate
Genetic Algorithm (GA)