Abstract
This paper studies the classification of imbalanced data. It presents solutions to the problem at three levels (data, algorithms, and evaluation metrics) and applies them to a bank marketing task: predicting whether a customer will buy a term deposit. Logistic regression, decision tree, SVM, random forest, and GBDT prediction models are built. Among them, the SVM model with the 'balanced' setting achieves the highest recall and identifies most of the target customers, i.e. those who choose to buy a term deposit. The GBDT model trained after SMOTE resampling gives the best overall performance across recall, F1 score, and ROC_AUC. Applied to bank marketing, the trained models can help banks identify target customer groups, carry out precision marketing, and raise the marketing success rate.
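The three levels named in the abstract (data, algorithm, evaluation) can be illustrated with a minimal, dependency-free Python sketch. It is not the author's code, and every function name in it is hypothetical. It assumes that the 'balanced' setting corresponds to scikit-learn's `class_weight='balanced'` rule, which weights class c by n_samples / (n_classes * n_c). SMOTE is shown in its basic form (interpolating between a minority sample and one of its k nearest minority neighbours), and recall, F1, and ROC AUC are computed from their definitions. The toy labels and scores are made up for illustration.

```python
import math
import random
from collections import Counter


def balanced_class_weights(y):
    """Algorithm level: w_c = n_samples / (n_classes * n_c), the rule assumed
    to sit behind a 'balanced' class weighting; rarer classes weigh more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def smote(minority, n_new, k=5, seed=0):
    """Data level: basic SMOTE. Each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def recall_precision_f1(y_true, y_pred, positive=1):
    """Evaluation level: recall = TP/(TP+FN), precision = TP/(TP+FP),
    F1 = harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive is scored above a
    random negative (ties count 1/2); O(P*N), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels: 9 customers who did not buy (0), 1 who did (1)
    y = [0] * 9 + [1]
    print(balanced_class_weights(y))
    # Predicting "no" for everyone is 90% accurate but finds no buyer
    print(recall_precision_f1(y, [0] * 10))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(smote([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], n_new=3, k=2))
```

The all-"no" predictor in the usage example shows why accuracy alone is misleading on imbalanced data: it scores 90% accuracy while its recall on the buying class is zero, which is the failure that recall, F1, and ROC AUC are meant to expose.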
Author
Ji Chenyu (季晨雨)
Beijing Institute of Satellite Information Engineering, Beijing 100000, China
Source
Shanxi Electronic Technology (《山西电子技术》)
2018, No. 5, pp. 55-57 and 69 (4 pages)
Keywords
classification
imbalanced data
bank marketing