Abstract
Aims: To address class imbalance in credit scoring samples, this paper proposes a new ensemble classification model, SMOTE-AdaBoost-DT, which combines the synthetic minority oversampling technique (SMOTE) with the Adaptive Boosting (AdaBoost) algorithm using decision trees (DT) as base classifiers. Methods: First, SMOTE is used to generate synthetic minority-class samples and reduce the imbalance of the data. Second, AdaBoost with DT base classifiers is used to classify the data and predict credit outcomes. Finally, a credit dataset from the Kaggle platform is used for empirical validation. Results: With the area under the curve (AUC) and G-mean as evaluation metrics, the SMOTE-AdaBoost-DT model achieved a mean AUC of 89.19% and a mean G-mean of 89.09%, outperforming the decision tree, random forest, AdaBoost, and backpropagation neural network models, and its metrics had the smallest standard deviation. Conclusions: The proposed model improves both the accuracy of customer credit scoring and the stability of the model.
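The two less familiar ingredients named in the abstract, SMOTE oversampling and the G-mean metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `g_mean`, the toy data, the parameter defaults, and the convention that the minority class is labelled 1 are all assumptions made for illustration, and the AdaBoost-DT classifier itself is omitted.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    Assumes `minority` holds at least two numeric tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the chosen sample (itself excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (minority class = 1, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0  # undefined when a class is absent; report 0.0 here
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Hypothetical toy data: four minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote(minority, n_new=6, k=3)

# Sensitivity = 1/2 and specificity = 3/4, so G-mean = sqrt(0.375).
score = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(len(synthetic), round(score, 4))  # 6 0.6124
```

G-mean rewards a classifier only when it performs well on both classes, which is why it suits imbalanced credit data better than plain accuracy. In practice one would more likely rely on library implementations such as `SMOTE` from imbalanced-learn and `AdaBoostClassifier` from scikit-learn than on a hand-written version like this.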
Authors
ZHAO Jiali (赵佳丽)
XU Mingjiang (徐明江)
WU Zengyuan (吴增源)
ZHENG Suli (郑素丽)
College of Economics and Management, China Jiliang University, Hangzhou 310018, China; Hangzhou Qiandao Lake Development Group Co., Ltd., Hangzhou 311799, China
Source
Journal of China University of Metrology (《中国计量大学学报》)
2021, Issue 4, pp. 549-554 (6 pages)
Funding
National Natural Science Foundation of China (No. 71572187)
Natural Science Foundation of Zhejiang Province (No. LY20G010008)