Abstract
A novel adaptive cost-sensitive classification algorithm, named ACO (adaptive cost optimization), is proposed. A "hill-climbing" method searches a feasible misclassification cost space for the cost best suited to each resampled sub-dataset, and that cost is used to build the base classifier for the sub-dataset. This overcomes the drawback of fixed misclassification costs, which are neither well founded nor objective. The resampling technique makes it possible to train cost-insensitive classifiers when learning examples are insufficient. By relabeling the instances of the original data set through "voting", the ACO algorithm learns an adaptive ensemble classifier suited to class-imbalanced data sets. We empirically compared ACO with CSC, MetaCost and naive Bayes on some common class-imbalanced data sets and found that ACO significantly outperforms the others.
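The abstract does not give the search objective, the step schedule, or the base learner, so the following is only a minimal sketch of the general idea (hill-climbing over a misclassification cost on resampled subsets, then majority-vote relabeling), not the authors' ACO implementation. It assumes binary labels, a false-positive cost fixed at 1, the G-mean as the score to maximize, and positive-class probabilities that come from some already trained probabilistic model. The names `predict`, `g_mean`, `hill_climb_cost` and `relabel_by_vote` are hypothetical.

```python
import random


def predict(probs, cost_ratio):
    """Minimum-expected-cost rule: predict 1 when p * cost_ratio > (1 - p).

    cost_ratio is the false-negative cost; the false-positive cost is 1 (assumption).
    """
    return [1 if p * cost_ratio > (1.0 - p) else 0 for p in probs]


def g_mean(labels, preds):
    """Geometric mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(labels, preds) if y == 0 and h == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return 0.0  # score is undefined without both classes; treat as worst case
    return ((tp / pos) * (tn / neg)) ** 0.5


def hill_climb_cost(probs, labels, start=1.0, step=1.0, min_step=0.05):
    """Greedy local search over the false-negative cost.

    Moves to a neighbouring cost only if it strictly improves the score;
    otherwise halves the step until it drops below min_step.
    """
    best = start
    best_score = g_mean(labels, predict(probs, best))
    while step >= min_step:
        moved = False
        for cand in (best + step, best - step):
            if cand <= 0:
                continue  # costs must stay positive
            score = g_mean(labels, predict(probs, cand))
            if score > best_score:
                best, best_score, moved = cand, score, True
                break
        if not moved:
            step /= 2.0
    return best, best_score


def relabel_by_vote(probs, labels, n_rounds=11, seed=0):
    """Relabel every instance by majority vote over per-subset decisions.

    Each round draws a bootstrap resample, tunes a cost on it, and then
    votes on all original instances with that cost.
    """
    rng = random.Random(seed)
    n = len(labels)
    votes = [0] * n
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        cost, _ = hill_climb_cost([probs[i] for i in idx],
                                  [labels[i] for i in idx])
        for i, h in enumerate(predict(probs, cost)):
            votes[i] += h
    return [1 if 2 * v > n_rounds else 0 for v in votes]
```

In this sketch the relabeled data would then be passed to an ordinary learner to obtain the final classifier. A fuller version would retrain a base classifier on each resampled subset instead of reusing one fixed set of probabilities; that step is omitted here to keep the example self-contained.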
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core
2010, No. 10, pp. 5-8 (4 pages)
Journal of Huazhong University of Science and Technology (Natural Science Edition)
Funding
Supported by the International Science and Technology Cooperation Program of China (2009DFA12290)
Keywords
machine learning
cost-sensitive
misclassification cost
optimization
ensemble classifier