
Multiclass Imbalanced Data Classification Based on Decision Criteria Optimization (基于决策准则优化的不均衡数据分类)

Cited by: 2
Abstract: Class-imbalanced data are widespread in the real world, and traditional classification algorithms perform poorly when the class distribution is skewed. We therefore propose an ensemble classification algorithm based on decision criteria optimization. Starting from the posterior probabilities output by a naive Bayes model, and using an imbalanced-data evaluation metric as the objective function, the method optimizes the decision threshold (binary case) or the misclassification cost parameters (multiclass case) to obtain the best classification decision criterion. To improve generalization, we also propose an adaptive random subspace ensemble algorithm, which increases the diversity among base classifiers, avoids overfitting in both classifier learning and decision criteria optimization, and automatically determines the optimal number of base classifiers. Experiments on a large number of UCI datasets show that, compared with other algorithms of the same kind, the proposed method handles imbalanced data better in terms of both accuracy and efficiency.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 5, pp. 961-966 (6 pages); indexed in CSCD and the Peking University Core Journals list
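Funding: Supported by the National Natural Science Foundation of China (61001047) and the Fundamental Research Funds for the Central Universities (N110618001)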
Keywords: imbalanced data classification; cost-sensitive learning; ensemble classification; random subspace method
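The binary-case idea in the abstract (tune the decision threshold applied to the posterior probabilities so that an imbalanced-data metric is maximized) can be illustrated with a minimal sketch. The abstract names neither the metric nor the optimizer, so the F1 score of the minority class and the exhaustive search over observed posterior values are assumptions made here for illustration. The names `f1_score` and `best_threshold` and the toy data are hypothetical, and the multiclass cost parameters and the adaptive random subspace ensemble are not covered.

```python
from typing import Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (minority) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    posteriors: Sequence[float], y_true: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 for the rule: predict 1 iff posterior >= threshold.

    Only the observed posterior values are tried as candidates, because the
    predictions can change only at those points. The default 0.5 is kept
    unless a candidate is strictly better.
    """
    best_t = 0.5
    best_f1 = f1_score(y_true, [int(p >= best_t) for p in posteriors])
    for t in sorted(set(posteriors)):
        score = f1_score(y_true, [int(p >= t) for p in posteriors])
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation data: P(minority | x) as some naive Bayes model
# might output it, with the true labels (1 = minority class).
posteriors = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

default_f1 = f1_score(labels, [int(p >= 0.5) for p in posteriors])
threshold, tuned_f1 = best_threshold(posteriors, labels)
print(f"default F1={default_f1:.3f}, tuned threshold={threshold}, tuned F1={tuned_f1:.3f}")
```

On this toy data the default threshold of 0.5 recovers only one of the three minority examples, while the tuned threshold of 0.35 recovers all three at the cost of one false positive. In practice the threshold would be selected on held-out data to limit the overfitting that the abstract warns about.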