Classification algorithm based on undersampling and cost-sensitiveness for unbalanced data (cited by: 20)
Abstract To address the low prediction accuracy of traditional classifiers on the minority class in unbalanced datasets, this paper proposes USCBoost (UnderSamples and Cost-sensitive Boosting), a classification algorithm for unbalanced data based on undersampling and cost-sensitiveness. First, before the base classifier is trained in each AdaBoost (Adaptive Boosting) iteration, the majority class samples are sorted by weight in descending order, and a number of majority class samples roughly equal to the number of minority class samples is selected according to sample weight. The weights of the sampled majority class samples are then normalized, and these samples are combined with the minority class samples to form a temporary training set on which the base classifier is trained. Second, in the weight update stage, the minority class is assigned a higher misclassification cost, so that the weights of minority class samples increase faster and the weights of majority class samples increase more slowly. USCBoost was compared with AdaBoost, AdaCost (Cost-sensitive AdaBoosting), and RUSBoost (Random Under-Sampling Boosting) on 10 UCI datasets. The experimental results show that USCBoost achieved the highest F1-measure on 6 datasets and the highest G-mean on 9 datasets, indicating that the proposed algorithm has better classification performance on unbalanced data.
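The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact formulas, so the decision-stump base learner, the rescaling of the kept majority weights to the full majority weight mass, the error computed on the full training set, the exponential update with per-class cost factors, and the names `uscboost_fit`, `cost_minority`, and `cost_majority` (with their default values) are all assumptions. Labels are assumed to be +1 for the minority class and -1 for the majority class, with both classes present.

```python
import math

def fit_stump(X, y, w):
    """Weighted decision stump (feature, threshold, polarity) with the lowest weighted error."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if err < best_err:
                    best_err, best = err, (j, t, pol)
    return best

def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] >= t else -pol

def uscboost_fit(X, y, n_rounds=5, cost_minority=1.2, cost_majority=0.8):
    """Sketch of a USCBoost-style loop; the cost values are hypothetical."""
    n = len(y)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(n_rounds):
        # Undersampling: keep the highest-weight majority samples,
        # as many as there are minority samples.
        kept = sorted(majority, key=lambda i: w[i], reverse=True)[:len(minority)]
        # Assumed normalization: kept majority weights carry the whole majority mass.
        scale = sum(w[i] for i in majority) / sum(w[i] for i in kept)
        idx = kept + minority
        tmp_w = [w[i] * scale for i in kept] + [w[i] for i in minority]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], tmp_w)
        # Assumed: the weighted error is measured on the full training set.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, yi, pi in zip(w, y, preds) if yi != pi)
        if err >= 0.5:
            break
        err = max(err, 1e-10)  # avoid log of infinity when the stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Cost-sensitive update (assumed form): misclassified minority samples
        # gain weight faster than misclassified majority samples.
        for i in range(n):
            if preds[i] == y[i]:
                w[i] *= math.exp(-alpha)
            else:
                cost = cost_minority if y[i] == 1 else cost_majority
                w[i] *= math.exp(alpha * cost)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def uscboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

def f1_and_gmean(y_true, y_pred):
    """F1-measure and G-mean with the minority class (+1) as the positive class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == -1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == -1 and b == 1)
    tn = sum(1 for a, b in zip(y_true, y_pred) if a == -1 and b == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, math.sqrt(recall * tnr)

# Toy data: 8 majority samples (-1) and 2 minority samples (+1).
X = [[float(v)] for v in range(8)] + [[10.0], [11.0]]
y = [-1] * 8 + [1] * 2
model = uscboost_fit(X, y)
print(f1_and_gmean(y, [uscboost_predict(model, row) for row in X]))
```

G-mean is the geometric mean of the minority-class recall and the majority-class recall, so unlike plain accuracy it drops to zero when either class is entirely misclassified. That is why it is commonly reported alongside F1-measure for unbalanced data.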
Authors WANG Junhong (王俊红); YAN Jiarong (闫家荣) (School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2021, No. 1, pp. 48-52 (5 pages)
Funding National Natural Science Foundation of China (61772323); Natural Science Foundation of Shanxi Province (201701D121051).
Keywords unbalanced data; classification; cost-sensitiveness; AdaBoost algorithm; undersampling