
Classification method for imbalance dataset based on genetic algorithm improved synthetic minority over-sampling technique

Cited by: 18
Abstract: When the Synthetic Minority Over-sampling Technique (SMOTE) is used for imbalanced dataset classification, it applies the same sampling rate to every minority-class sample, so the synthesis of new samples is somewhat blind. To overcome this problem, a Genetic Algorithm (GA) improved SMOTE method, named GASMOTE, is proposed. First, different sampling rates are set for different minority-class samples, and each combination of sampling rates is encoded as one individual in the population. Then the selection, crossover, and mutation operators of the GA are applied iteratively to optimize the population, and the best combination of sampling rates is obtained when the stopping criterion is met. Finally, the imbalanced dataset is over-sampled with SMOTE according to the best combination found. Experimental results on 10 typical imbalanced datasets show that, compared with SMOTE, GASMOTE improves the F-measure by 5.9 percentage points and the G-mean by 1.6 percentage points; compared with Borderline-SMOTE, it improves the F-measure by 3.7 percentage points and the G-mean by 2.3 percentage points. GASMOTE can serve as a new over-sampling technique for imbalanced dataset classification.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 1, pp. 121-124, 139 (5 pages)
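Funding: National Natural Science Foundation of China (61075063); Natural Science Foundation of Hubei Province (2013CFA004); China Postdoctoral Science Foundation General Program (2014M560700); Chongqing Postdoctoral Special Funding Project (XM2014057)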
Keywords: imbalance dataset; classification; Synthetic Minority Over-sampling Technique (SMOTE); sampling rate; Genetic Algorithm (GA)
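
The procedure summarized in the abstract (per-sample sampling rates encoded as GA individuals, then SMOTE driven by the best individual) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `smote_per_sample` and `ga_search_rates`, the integer encoding, binary tournament selection, one-point crossover, elitism, all parameter defaults, and the `fitness` callable are hypothetical choices. The abstract does not specify the encoding, the operator details, the fitness function, or the stopping criterion; a fixed generation count is used here.

```python
import numpy as np

def smote_per_sample(X_min, rates, k=5, rng=None):
    """SMOTE-style interpolation where minority sample i yields rates[i] new points.
    Assumes at least two minority samples."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for i, r in enumerate(rates):
        for _ in range(int(r)):
            j = rng.choice(neighbors[i])   # random one of the k nearest neighbors
            gap = rng.random()             # interpolation factor in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])

def ga_search_rates(X_min, fitness, max_rate=5, pop_size=20, generations=30,
                    p_cross=0.8, p_mut=0.05, rng=None):
    """Search for a vector of per-sample sampling rates that maximizes `fitness`.
    `fitness` is a hypothetical callable: rates -> score (higher is better)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Each individual is one combination of sampling rates (assumed integer coding).
    pop = rng.integers(0, max_rate + 1, size=(pop_size, n))
    best, best_fit = None, -np.inf
    for _ in range(generations):           # assumed stopping criterion
        fits = np.array([fitness(ind) for ind in pop])
        if fits.max() > best_fit:
            best_fit = fits.max()
            best = pop[fits.argmax()].copy()
        # Selection: binary tournament (assumed operator).
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Crossover: one-point, applied to consecutive pairs.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if n > 1 and rng.random() < p_cross:
                cut = rng.integers(1, n)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation: resample individual genes uniformly at random.
        mask = rng.random(children.shape) < p_mut
        children[mask] = rng.integers(0, max_rate + 1, size=mask.sum())
        children[0] = best                 # elitism: keep the best individual so far
        pop = children
    return best, best_fit

# Toy usage. The fitness below is only a stand-in that rewards generating
# `target` synthetic samples in total; it is not the paper's fitness function.
X_min = np.random.default_rng(0).normal(size=(8, 2))
target = 24
toy_fitness = lambda rates: -abs(int(rates.sum()) - target)
best_rates, best_fit = ga_search_rates(X_min, toy_fitness, rng=0)
X_new = smote_per_sample(X_min, best_rates, k=3, rng=0)
```

In practice the fitness would presumably score a classifier trained on the resampled data, for example by F-measure or G-mean, but that choice is likewise an assumption here.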

References (13)

  • [1] SODA P. A multi-objective optimisation approach for class imbalance learning [J]. Pattern Recognition, 2011, 44(8): 1801-1810.
  • [2] HE H, GARCIA E A. Learning from imbalanced data [J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
  • [3] Gu Qiong, Yuan Lei, Xiong Qijun, Ning Bin, Li Wenxin. Comparative study of cost-sensitive learning algorithms based on imbalanced datasets [J]. Microelectronics & Computer, 2011, 28(8): 146-149. (in Chinese)
  • [4] Wang Chaoxue, Pan Zhengmao, Dong Lili, Ma Chunsen, Zhang Xing. Research on classification of imbalanced datasets based on improved SMOTE [J]. Computer Engineering and Applications, 2013, 49(2): 184-187. (in Chinese)
  • [5] CHAWLA N, BOWYER K, HALL L, et al. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16(1): 321-357.
  • [6] HAN H, WANG W, MAO B. Borderline-SMOTE: a new over-sampling method in imbalance data set learning [C]// ICIC'05: Proceedings of the 2005 International Conference on Advances in Intelligent Computing. Berlin: Springer, 2005: 878-887.
  • [7] GUO H, VIKTOR H L. Learning from imbalance data set with boosting and data generation: the DataBoost-IM approach [J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 30-39.
  • [8] Chen Si, Guo Gongde, Chen Lifei. Classification method for imbalanced data based on clustering fusion [J]. Pattern Recognition and Artificial Intelligence, 2010, 23(6): 772-780. (in Chinese)
  • [9] Ge Jike, Qiu Yuhui, Wu Chunming, Pu Guolin. Survey of genetic algorithm research [J]. Application Research of Computers, 2008, 25(10): 2911-2916. (in Chinese)
  • [10] SU C, CHEN L, YIH Y. Knowledge acquisition through information granulation for imbalanced data [J]. Expert Systems with Applications, 2006, 31(3): 531-541.

