Abstract
This paper applies four oversampling algorithms, combined with one sample filtering algorithm, to balance 12 datasets with differing degrees of class imbalance. XGBoost classifiers are then built on both the balanced data and the original imbalanced data, and the classification performance on each dataset is compared comprehensively. The results offer a reference for improving the performance of machine learning classifiers on class-imbalanced problems.
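The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the four oversampling algorithms or the filtering algorithm, so the SMOTE-style interpolation below is an assumption chosen only because it is a widely known oversampling technique. The function names `smote_like_oversample` and `balance`, the parameter `k`, and the toy data are all hypothetical. The sample filtering step and the XGBoost classifier are not shown.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (SMOTE-style sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumption: this is NOT confirmed to be one of the paper's algorithms.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def balance(majority, minority, k=3, seed=0):
    """Return the minority class padded with synthetic points to majority size."""
    n_new = max(len(majority) - len(minority), 0)
    return list(minority) + smote_like_oversample(minority, n_new, k, seed)


# Hypothetical toy data: 10 majority points, 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5)]
balanced_minority = balance(majority, minority)
print(len(majority), len(balanced_minority))
```

In a full experiment of the kind the abstract describes, the balanced data would then be passed to a classifier, and the same classifier would be trained on the untouched imbalanced data for comparison.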
Authors
LIU Yaming (刘亚明)
YANG Ceping (杨策平)
School of Science, Hubei University of Technology, Wuhan 430068, China
Source
Journal of Hubei University of Technology (《湖北工业大学学报》)
2018, No. 5, pp. 111-115 (5 pages)
Keywords
class imbalance
oversampling
sample filtering
machine learning