Abstract
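Most research on imbalanced data sets focuses on either pure data-set reconstruction or pure cost-sensitive learning. Given that imbalanced class distributions and unequal misclassification costs often occur together, this paper proposes a cost-sensitive learning algorithm based on hybrid re-sampling whose objective is minimal misclassification cost. The algorithm integrates the two types of solution: it first reconstructs the sample class space so that the two classes of the original data set are roughly balanced, and then applies a cost-sensitive learning algorithm for classification. This improves classification accuracy on the minority class while effectively reducing the total misclassification cost. Experimental results confirm that the algorithm outperforms traditional algorithms on imbalanced-class problems.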
Most studies on imbalanced data set classification focus on either re-sampling or cost-sensitive learning alone; the fact that imbalanced class distributions and unequal misclassification costs often occur simultaneously is neglected. We propose a novel cost-sensitive learning (CSL) algorithm that combines re-sampling with CSL techniques to address the misclassification problem on imbalanced data sets. On one hand, the re-sampling technique produces balanced data sets by reconstructing both the majority and the minority class. On the other hand, classification is performed to minimize misclassification cost rather than to maximize accuracy, where the misclassification cost for the minority class is much higher than that for the majority class. A cost-sensitive learning procedure is then conducted for classification. The experimental results show that the proposed method improves classification accuracy and effectively decreases misclassification cost, and that the algorithm is superior to traditional algorithms in dealing with the imbalanced problem.
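The two-stage idea in the abstract (rebalance first, then classify by minimal cost) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the re-sampling scheme, so plain random under- and over-sampling is used here as a stand-in, and the function names `hybrid_resample`, `min_cost_label`, and `total_cost`, as well as the cost values, are hypothetical. The sketch assumes a two-class problem in which the majority class is at least as large as the minority class.

```python
import random


def hybrid_resample(majority, minority, seed=0):
    """Rebalance two classes to a common size (assumed scheme, not the paper's).

    The majority class is randomly undersampled and the minority class is
    randomly oversampled (with replacement) until both reach the mean of
    the two original class sizes.
    """
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    # Undersample the majority class without replacement.
    new_majority = rng.sample(list(majority), min(target, len(majority)))
    # Oversample the minority class by duplicating random members.
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    new_minority = list(minority) + extra
    return new_majority, new_minority


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is minority (label 1).
    cost_fn: cost of predicting majority for a true minority sample.
    cost_fp: cost of predicting minority for a true majority sample.
    """
    expected_cost_predict_majority = p_minority * cost_fn
    expected_cost_predict_minority = (1 - p_minority) * cost_fp
    return 1 if expected_cost_predict_majority > expected_cost_predict_minority else 0


def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum the misclassification cost over all predictions."""
    cost = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn  # missed minority sample
        elif t == 0 and p == 1:
            cost += cost_fp  # false alarm on majority sample
    return cost
```

With a minority misclassification cost ten times the majority one, a sample with an estimated minority probability of only 0.2 is still labeled as minority, because the expected cost of missing it (0.2 × 10) exceeds the expected cost of a false alarm (0.8 × 1). With equal costs the same sample is labeled as majority, which is the accuracy-maximizing decision.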
Source
《计算机工程与科学》
CSCD
PKU Core Journal
2011, Issue 9, pp. 130-135 (6 pages)
Computer Engineering & Science
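Funding
National Natural Science Foundation of China (61075063)
National High-Tech R&D Program of China (863 Program) (2009AA12Z117)
Natural Science Foundation of Hubei Province (2010CDB05201)
Young and Middle-Aged Researchers Project of the Hubei Provincial Department of Education (Q20112604)
Keywords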
classification
imbalanced dataset
hybrid re-sampling
cost-sensitive learning