Abstract
Traditional classification algorithms are mostly designed for balanced data sets, and their performance degrades on imbalanced data. Practice shows, however, that ensemble selection can effectively improve classification performance on imbalanced data sets. This paper therefore approaches class-imbalance learning from the perspective of ensemble selection and proposes a simple but effective Ensemble Pruning Method Based on Positive Examples (EPPE) to improve the classification performance of ensemble classifiers on imbalanced data. EPPE builds a classifier library with Bagging, uses the positive (minority-class) instances directly as the pruning set, and, guided by the MBM metric and the pruning set, selects an optimal or suboptimal sub-ensemble from the library as the target classifier for predicting unlabeled instances. Experimental results on 12 UCI data sets show that, compared with EasyEnsemble, Bagging, and C4.5, EPPE not only substantially improves the recall of the ensemble on the positive class but also improves overall accuracy.
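The pipeline described in the abstract (a Bagging-built classifier library, a pruning set of positive instances only, and selection of a sub-ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the MBM metric, so the sketch substitutes a stand-in score (the share of positive pruning instances that the majority vote labels positive), uses one-dimensional threshold stumps rather than C4.5 trees, and assumes greedy forward selection as the search strategy. All function names and the toy data are hypothetical.

```python
import random

def train_stump(sample):
    """Fit a 1-D threshold stump that predicts 1 when x >= threshold."""
    best = None
    for t in sorted({x for x, _ in sample}):
        correct = sum((x >= t) == bool(y) for x, y in sample)
        if best is None or correct > best[0]:
            best = (correct, t)
    return best[1]

def build_library(data, n_estimators, rng):
    """Bagging: train one stump per bootstrap sample of the training data."""
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_estimators)]

def vote(thresholds, x):
    """Majority vote of the stumps; an exact tie goes to the positive class."""
    pos = sum(x >= t for t in thresholds)
    return 1 if 2 * pos >= len(thresholds) else 0

def positive_score(thresholds, positives):
    """Stand-in for MBM (an assumption): share of positives voted positive."""
    return sum(vote(thresholds, x) for x in positives) / len(positives)

def prune(library, positives, size):
    """Greedy forward selection (an assumption) of a sub-ensemble."""
    selected, remaining = [], list(library)
    while remaining and len(selected) < size:
        best = max(remaining,
                   key=lambda t: positive_score(selected + [t], positives))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy imbalanced data: 90 negative and 10 positive one-dimensional instances.
rng = random.Random(0)
data = ([(rng.gauss(2.0, 1.0), 0) for _ in range(90)]
        + [(rng.gauss(4.0, 1.0), 1) for _ in range(10)])

library = build_library(data, n_estimators=25, rng=rng)
pruning_set = [x for x, y in data if y == 1]   # positive instances only
sub = prune(library, pruning_set, size=5)

print("full library score:", positive_score(library, pruning_set))
print("sub-ensemble score:", positive_score(sub, pruning_set))
```

Because the stand-in score looks only at positive instances, it rewards classifiers that over-predict the positive class. A metric such as the paper's MBM would presumably guard against that, which is why the score function is kept separate and easy to swap out.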
Source
《计算机工程》
CAS
CSCD
2014, Issue 6, pp. 157-161, 165 (6 pages)
Computer Engineering
Keywords
imbalanced data set
ensemble pruning
pruning set
assessment metrics
base classifier