
Classification Method of Imbalanced Data Based on RSBoost (基于RSBoost算法的不平衡数据分类方法) Cited by: 21
Abstract: Classifying imbalanced data is a problem common to many application domains and has become a research hotspot in data mining and machine learning. This paper proposes a new classification method for imbalanced data, RSBoost, to address the low minority-class recognition rate and low classification efficiency of traditional classification methods. The method first oversamples the minority class with SMOTE (synthetic minority over-sampling technique), then applies random under-sampling to the whole data set to reduce its imbalance, and finally combines the resampled data with a Boosting algorithm to classify the data. Experiments compare the classification performance and efficiency of five methods (RSBoost and four others) on several public data sets. The results show that the proposed method achieves a high recognition rate and high classification efficiency.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 9, pp. 249-252, 267 (5 pages in total)
Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2013FL034)
Keywords: Imbalanced data, Mixed data sampling, Boosting, RSBoost
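
The three steps named in the abstract (SMOTE over-sampling of the minority class, random under-sampling of the whole data set, then Boosting) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`smote`, `resample`, `adaboost`, `predict`), the label convention (+1 minority, -1 majority), the parameters `smote_ratio` and `keep`, the use of decision stumps as weak learners, and the choice to resample once before boosting rather than inside each boosting round.

```python
import math
import random


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 points)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def resample(X, y, smote_ratio=1.0, keep=0.8, rng=random):
    """Over-sample the minority class (+1) with SMOTE, then randomly
    under-sample the whole data set down to a fraction `keep`."""
    minority = [x for x, label in zip(X, y) if label == 1]
    new = smote(minority, int(smote_ratio * len(minority)), rng=rng)
    data = list(zip(X, y)) + [(s, 1) for s in new]
    data = rng.sample(data, int(keep * len(data)))
    return [d[0] for d in data], [d[1] for d in data]


def fit_stump(X, y, w):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[j] >= t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def adaboost(X, y, rounds=10):
    """Plain AdaBoost over decision stumps; labels must be +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        # Raise the weight of misclassified samples, lower the others.
        w = [wi * math.exp(-alpha * yi * (s if x[j] >= t else -s))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x[j] >= t else -s) for a, j, t, s in model)
    return 1 if score >= 0 else -1


# Toy data: 20 majority points near the origin, 5 minority points far away.
rng = random.Random(0)
X = [(i % 5 * 0.25, i // 5 * 0.25) for i in range(20)] + \
    [(3.0, 3.0), (3.5, 4.0), (4.0, 3.5), (4.5, 4.5), (5.0, 4.0)]
y = [-1] * 20 + [1] * 5
Xr, yr = resample(X, y, smote_ratio=3.0, keep=0.8, rng=rng)
model = adaboost(Xr, yr)
```

With `smote_ratio=3.0` the five minority points gain fifteen synthetic neighbours, so the two classes are equal in size before the random under-sampling step. RUSBoost (reference 6) applies its sampling inside every boosting round, and a variant of this sketch could call `resample` per round instead.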

References (16)

  • 1 Batista G E A P A, Prati R C, Monard M C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 20-29.
  • 2 Gao Jiawei, Liang Jiye. Research Progress on the Classification of Imbalanced Data Sets[J]. Computer Science, 2008, 35(4): 10-13. (in Chinese) Cited by: 16
  • 3 Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic Minority Over-Sampling Technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 4 Laurikkala J. Improving Identification of Difficult Small Classes by Balancing Class Distribution[C]∥Proceedings of the 8th Conference on AI in Medicine Europe: Artificial. 2001: 63-66.
  • 5 Drummond C, Holte R C. C4.5, Class Imbalance and Cost Sensitivity: Why Under-Sampling Beats Over-Sampling[C]∥Proceedings of the ICML'03 Workshop on Learning from. 2003.
  • 6 Seiffert C, Khoshgoftaar T M, Van Hulse J, et al. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance[J]. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 2010, 40(1): 185-197.
  • 7 Batista G E, Prati R C, Monard M C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 20-29.
  • 8 Chawla N V, Cieslak D A, Hall L O, et al. Automatically Countering Imbalance and Its Empirical Relationship to Cost[J]. Data Mining and Knowledge Discovery, 2008, 17(2): 225-252.
  • 9 Wang Chaoxue, Pan Zhengmao, Ma Chunsen, Dong Lili, Zhang Tao. Classification of Imbalanced Data Sets with an Improved Weighted KNN Algorithm[J]. Computer Engineering, 2012, 38(20): 160-163. (in Chinese) Cited by: 26
  • 10 Joshi M V, Kumar V, Agarwal R. Evaluating Boosting Algorithms to Classify Rare Classes: Comparison and Improvements[C]∥Proc of the 1st IEEE International Conference on Data Mining. San Jose, USA, 2001: 257-264.


