
Case-based Ensemble Selection for the Classification of Imbalanced Datasets (cited by: 7)
Abstract: Because of the imbalance in class distribution, many traditional classification methods perform poorly on imbalanced datasets. Unlike these methods, this paper approaches class-imbalance learning from the viewpoint of ensemble selection and proposes a case-based ensemble selection method, CBES, to improve ensemble classifiers' performance on imbalanced datasets. For an instance with an unknown class label, CBES takes its k nearest neighbors as a selection set and uses it to select an optimal or suboptimal sub-ensemble from the classifier library to predict the instance's label. Because it considers the local characteristics of the instance to be classified and pays more attention to the rare class, CBES classifies imbalanced datasets more effectively. Experimental results show that the method significantly reduces model complexity and improves classification performance on imbalanced datasets.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core, 2014, No. 4, pp. 770-775 (6 pages)
Funding: National Natural Science Foundation of China (60773048, 61170223)
Keywords: imbalanced data sets; ensemble classifier; ensemble selection; k-nearest neighbors (KNN); base classifier
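The selection scheme summarized in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the decision-stump base learners, the plain-accuracy selection criterion on the neighbor set, and all function names are assumptions made for the example.

```python
import numpy as np

# Hypothetical CBES-style sketch (all details are illustrative assumptions):
# 1) build a library of base classifiers (here: one-feature decision stumps on bootstraps),
# 2) for a test point, take its k nearest training neighbors as a local selection set,
# 3) keep the library members that score best on that local set,
# 4) predict by majority vote of the selected sub-ensemble.

rng = np.random.default_rng(0)

def train_stump(X, y):
    """Pick the (feature, threshold, polarity) with best training accuracy."""
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) > 0, 1, 0)
                acc = np.mean(pred == y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, pol)
    return best[1:]

def stump_predict(stump, X):
    f, t, pol = stump
    return np.where(pol * (X[:, f] - t) > 0, 1, 0)

def cbes_predict(library, X_train, y_train, x, k=5, m=3):
    """Predict x using the m library members that do best on x's k nearest neighbors."""
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    Xk, yk = X_train[idx], y_train[idx]              # local selection set
    scores = [np.mean(stump_predict(s, Xk) == yk) for s in library]
    chosen = np.argsort(scores)[-m:]                 # best m on the local set
    votes = [stump_predict(library[i], x[None, :])[0] for i in chosen]
    return int(np.mean(votes) >= 0.5)

# Toy imbalanced data: 40 majority points near (0, 0), 8 minority points near (4, 4).
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (8, 2))])
y = np.array([0] * 40 + [1] * 8)

# Classifier library trained on bootstrap samples (bagging-style).
library = []
for _ in range(11):
    b = rng.integers(0, len(X), len(X))
    library.append(train_stump(X[b], y[b]))

print(cbes_predict(library, X, y, np.array([4.0, 4.0])))  # expect minority class 1
```

Selecting on the k local neighbors rather than on a global validation set is what lets the sub-ensemble adapt to the minority class: near a rare-class instance, the neighbors are mostly rare-class points, so classifiers that handle that region well are preferred.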



Citing works: 7; second-level citing works: 21
