An Imbalanced Data Classification Method Integrating Ensemble Ideas (cited by: 1)
Abstract: A single traditional classifier often performs poorly on imbalanced data, with large errors on the minority class. To improve classification performance on both the minority class and the data set as a whole, an imbalanced data classification method integrating ensemble ideas is proposed. The method first oversamples the majority-class samples and combines them with the minority-class samples to form a class-balanced data set. Second, starting from the heterogeneity of data and algorithms, it integrates multiple base classifiers and uses the integrated algorithm to change the data distribution. Experimental results show that the method effectively improves the classifier's AUC, G-mean, and F-measure, with maximum gains on the experimental data sets of 16.7%, 21.9%, and 20.2%, respectively. When handling imbalanced data, the method improves the classifier's performance on both the minority class and the data overall.
Authors: QIANG Bing-bing, YIN Hong, WANG Rui (Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650550, China)
Source: Software Guide (《软件导刊》), 2021, No. 9, pp. 206-212 (7 pages)
Funding: Development Research Project of the People's Government of Yunnan Province (YNDR2017G1C06).
Keywords: imbalanced data; classifier; class boundary; minority class
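
The abstract reports G-mean and F-measure without defining them. As a minimal sketch, assuming the commonly used definitions (G-mean as the geometric mean of the recall on the minority class and the recall on the majority class, F-measure as the harmonic mean of precision and recall on the minority class), the two metrics could be computed from binary labels as shown here. The function names, the choice of label `1` for the minority class, and the toy labels are illustrative assumptions and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of minority-class recall and majority-class recall."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the minority class."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print("G-mean:   ", round(g_mean(y_true, y_pred), 4))
print("F-measure:", round(f_measure(y_true, y_pred), 4))
```

In this toy case 8 of the 10 predictions are correct, yet only one of the two minority samples is found, so the F-measure is 0.5. This gap between plain accuracy and minority-class metrics is why such measures are commonly reported for imbalanced data.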
