Abstract
Class-imbalanced data are common in real-world applications, and traditional classification algorithms perform poorly on them. We therefore propose an ensemble classification algorithm based on decision-criterion optimization. Starting from the posterior probabilities output by a naive Bayes model, the algorithm uses an evaluation metric for imbalanced data as the objective function and optimizes the decision threshold (binary case) or the misclassification cost parameters (multi-class case) to obtain the best decision criterion. To improve generalization, we further propose an adaptive random subspace ensemble, which increases the diversity among base classifiers, avoids overfitting in both classifier learning and decision-criterion optimization, and automatically determines the optimal number of base classifiers. Experiments on a large number of UCI datasets show that, compared with other algorithms of the same kind, the proposed algorithm handles imbalanced data better in terms of both accuracy and efficiency.
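The binary-class step described in the abstract, choosing a decision threshold on the posterior probability so that an imbalance-aware metric is maximized, can be illustrated with a minimal sketch. The abstract does not say which metric or which search procedure the authors use, so the G-mean objective, the exhaustive search over observed posterior values, the function names `g_mean` and `best_threshold`, and the toy data below are all assumptions made for illustration only.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def best_threshold(posteriors, y_true, metric=g_mean):
    """Search the threshold on P(minority | x) that maximizes `metric`.

    Assumption: a plain exhaustive search over the observed posterior
    values stands in for whatever optimizer the paper actually uses.
    """
    # Start from the conventional 0.5 threshold as the baseline.
    best_t = 0.5
    best_score = metric(y_true, [int(p >= best_t) for p in posteriors])
    for t in sorted(set(posteriors)):
        score = metric(y_true, [int(p >= t) for p in posteriors])
        if score > best_score:  # strict: keep the first (lowest) maximizer
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical posteriors from a naive Bayes model on an imbalanced set:
# the two minority examples receive posteriors below 0.5.
posteriors = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45]
labels = [0, 0, 0, 0, 0, 1, 0, 1]

threshold, score = best_threshold(posteriors, labels)
print(threshold, round(score, 4))
```

With the default threshold of 0.5 every toy example is assigned to the majority class and the G-mean is zero; lowering the threshold recovers the minority examples at the cost of one false positive. In practice the threshold would be tuned on held-out data rather than on the training posteriors, which is the overfitting risk the ensemble in the abstract is meant to reduce.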
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
PKU Core Journal (北大核心)
2014, No. 5, pp. 961-966 (6 pages)
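Funding
Supported by the National Natural Science Foundation of China (61001047)
Supported by the Fundamental Research Funds for the Central Universities (N110618001)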
Keywords
imbalanced data classification
cost-sensitive learning
ensemble classification
random subspace method