
An Ensemble Pruning Method for Imbalanced Data Classification (cited by: 12)
Abstract: Most traditional classification algorithms are built on balanced data sets, and their performance degrades on imbalanced data, while practice shows that ensemble selection can effectively improve classification performance on imbalanced data sets. Approaching the class-imbalance learning problem from the perspective of ensemble selection, this paper proposes a new, simple but effective Ensemble Pruning method based on Positive Examples (EPPE) to improve the classification performance of an ensemble on imbalanced data. The method builds a library of base classifiers with Bagging, uses the positive (minority-class) instances directly as the pruning set, and, guided by the MBM metric evaluated on this pruning set, selects an optimal or near-optimal sub-ensemble from the library as the target classifier for predicting unseen instances. Experimental results on twelve UCI data sets show that, compared with EasyEnsemble, Bagging, and C4.5, EPPE not only substantially improves the ensemble's recall on the positive (minority) class but also increases overall accuracy.
Source: Computer Engineering (CAS, CSCD), 2014, Issue 6, pp. 157-161, 165 (6 pages)
Keywords: imbalanced data set; ensemble pruning; pruning set; assessment metrics; base classifier
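The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the MBM selection metric is not defined in the abstract, so positive-class recall on the pruning set is used here as a hypothetical stand-in, and the sub-ensemble size of 10 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Imbalanced toy data: roughly 10% positive (minority) class.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

# Step 1: build a library of base classifiers with Bagging (C4.5-style trees).
pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                         random_state=0).fit(X, y)

# Step 2: the pruning set consists of the positive (minority-class) instances only.
X_prune, y_prune = X[y == 1], y[y == 1]

# Step 3: score each base classifier on the pruning set and keep a sub-ensemble.
# Positive-class recall is a stand-in for the paper's MBM metric (assumption).
scores = [recall_score(y_prune, est.predict(X_prune)) for est in pool.estimators_]
keep = np.argsort(scores)[::-1][:10]
sub_ensemble = [pool.estimators_[i] for i in keep]

# Predict unseen instances with the pruned sub-ensemble via majority vote.
def predict(Xq):
    votes = np.mean([est.predict(Xq) for est in sub_ensemble], axis=0)
    return (votes >= 0.5).astype(int)

print("pruning-set recall:", recall_score(y_prune, predict(X_prune)))
```

Selecting by a minority-focused score biases the retained sub-ensemble toward recall on the positive class, which is the behavior the abstract reports EPPE improving.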

References (17)

1. He Haibo, Garcia E A. Learning from Imbalanced Data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
2. Friedman J H, Popescu B E. Predictive Learning via Rule Ensembles[J]. Annals of Applied Statistics, 2008, 2(3): 916-954.
3. Tan P, Steinbach M, Kumar V. Introduction to Data Mining[M]. Fan Ming, Fan Hongjian, trans. Beijing: Posts & Telecom Press, 2008.
4. Partalas I, Tsoumakas G, Vlahavas I P. An Ensemble Pruning Primer[C]//Proc. of Workshop on Applications of Supervised and Unsupervised Ensemble Methods. Berlin, Germany: Springer-Verlag, 2009: 1-13.
5. Breiman L. Bagging Predictors[J]. Machine Learning, 1996, 24(2): 123-140.
6. Freund Y, Schapire R E. A Decision-theoretic Generalization of On-line Learning and an Application to Boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
7. Breiman L. Random Forests[J]. Machine Learning, 2001, 45(1): 5-32.
8. Rodriguez J J, Kuncheva L I, Alonso C J. Rotation Forest: A New Classifier Ensemble Method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(10): 1619-1630.
9. Liu Xuying, Wu Jianxin, Zhou Zhihua. Exploratory Under-sampling for Class Imbalance Learning[C]//Proc. of ICDM'06. Hong Kong, China: [s. n.], 2006: 965-969.
10. Tang E K, Suganthan P N, Yao Xin. An Analysis of Diversity Measures[J]. Machine Learning, 2006, 65(1): 247-271.


