A New Resampling Method Based on SMOTE for Imbalanced Data Sets (cited by: 19)
Abstract: An imbalanced data set is one in which one class contains far more samples than the other classes. The imbalance distorts classification results by biasing base classifiers toward the majority class. The synthetic minority over-sampling technique (SMOTE) is a classic over-sampling method for the data imbalance problem: it generates each synthetic sample on the line segment whose endpoints are two minority samples. This paper proposes a SMOTE-based over-sampling method for the minority class that changes how new samples are generated. Each synthetic sample is produced with reference to more than two real minority-class samples, which increases the diversity of the synthetic samples. Experimental results show that, under different base classifiers, the method achieves a better area under the receiver operating characteristic curve (ROC-AUC) and better stability.
Authors: Zhang Tianyi (张天翼); Ding Lixin (丁立新) (School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2021, No. 9, pp. 273-279 (7 pages)
Funding: Industry-University-Research Cooperation Project of Zhuhai City, Guangdong Province (2010A090200067, 2016B090918097, 2012D0501990016, 2012D0501990026).
Keywords: imbalanced dataset; over-sampling; sample synthesis; classification
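
The abstract contrasts two ways of synthesizing a minority sample: classic SMOTE interpolates between two minority samples, while the proposed method draws on more than two. The following is a minimal Python sketch of that contrast, not the authors' algorithm. The abstract does not give the paper's generation rule, so `multi_point_sample` uses a random convex combination purely as an illustrative assumption; the function names and the toy data are also hypothetical.

```python
import numpy as np


def k_nearest(minority, i, k):
    """Indices of the k minority samples closest to minority[i], excluding i."""
    dists = np.linalg.norm(minority - minority[i], axis=1)
    order = np.argsort(dists)
    return order[order != i][:k]


def smote_sample(x, neighbor, rng):
    """Classic SMOTE step: a random point on the segment from x to neighbor."""
    gap = rng.uniform(0.0, 1.0)
    return x + gap * (neighbor - x)


def multi_point_sample(points, rng):
    """Assumed generalization (NOT the paper's stated rule): a random convex
    combination of three or more minority samples, so the synthetic point
    falls inside their convex hull instead of on a single segment."""
    weights = rng.dirichlet(np.ones(len(points)))  # non-negative, sums to 1
    return weights @ points


rng = np.random.default_rng(0)
# Toy minority class: four points at the corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

idx = k_nearest(minority, 0, k=2)
two_point = smote_sample(minority[0], minority[idx[0]], rng)
multi_point = multi_point_sample(minority[[0, *idx]], rng)
print("SMOTE sample:", two_point)
print("Multi-point sample:", multi_point)
```

In this sketch the two-point sample always lies on an edge of the square, whereas the multi-point sample can land anywhere in the triangle spanned by the three chosen samples, which is one way to read the abstract's claim about increased diversity.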
