
Research on Imbalanced Data Classification and Its Application in Bank Marketing
Abstract This paper studies the imbalanced data classification problem and reviews solutions at three levels: data, algorithms, and evaluation metrics. The solutions are applied to a bank marketing task, predicting whether a customer will purchase a term deposit, for which logistic regression, decision tree, SVM, random forest, and GBDT prediction models are built. Among them, the SVM model with the 'balanced' setting achieves the best recall and identifies most of the target customers, that is, the customers who choose to purchase a term deposit. The GBDT model trained after SMOTE resampling gives the best overall performance across recall, F1 score, and ROC_AUC. Applying the trained prediction models in bank marketing can help banks identify target customer groups, carry out precision marketing, and raise the marketing success rate.
Author Ji Chenyu (季晨雨), Beijing Institute of Satellite Information Engineering, Beijing 100000, China
Source Shanxi Electronic Technology (《山西电子技术》), 2018, No. 5, pp. 55-57, 69 (4 pages)
Keywords classification; imbalanced data; bank marketing
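
The abstract names SMOTE resampling at the data level and recall and F1 score at the evaluation level. The following is a minimal, standard-library-only Python sketch of both ideas, not the paper's implementation: the function names `smote` and `recall_f1`, the parameter `k`, and the toy data are assumptions made for illustration. The sketch creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, then computes recall and F1 for the positive class.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def recall_f1(y_true, y_pred, positive=1):
    """Recall and F1 score for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy data, made up for illustration only (not from the paper).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=4)
balanced_minority = minority + new_points

recall, f1 = recall_f1([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

Because each synthetic point lies on a segment between two real minority points, resampling of this kind adds minority samples without simply duplicating existing ones. Recall and F1 are computed on the minority class because plain accuracy can look high on imbalanced data even when few target customers are found.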
