Abstract
With the growing adoption of big data technology in practical applications, machine learning and deep learning algorithms are being actively explored in the financial risk control industry. This paper uses an open-source credit card dataset whose class distribution is extremely imbalanced. It compares how different sampling methods affect the performance of several classification algorithms on this imbalanced binary classification problem. It then applies the ensemble learning algorithm Stacking to combine multiple base classifiers, yielding a more robust classification model that effectively avoids overfitting.
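The abstract names two ingredients, resampling an imbalanced training set and Stacking, but does not say which sampling methods or base classifiers were used. Below is a minimal, dependency-free Python sketch of how the two can fit together. It is an illustration under stated assumptions, not the paper's implementation:

- Random oversampling stands in for the sampling methods the paper compares.
- `NearestCentroid`, `Stump`, `Logistic`, `stack_fit` and `stack_predict` are hypothetical toy components chosen only for illustration.
- The meta-learner is trained on out-of-fold base-learner scores. This is the core idea of Stacking: it judges each base classifier on data that classifier did not see during training.

```python
import random
from math import exp


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Hypothetical toy base learner: predicts the class whose mean vector is closer."""
    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c[1]))
        return 1.0 if d1 < d0 else 0.0


class Stump:
    """Hypothetical toy base learner: one-feature threshold with the fewest training errors."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                err = sum((x[j] > thr) != t for x, t in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, thr)
        _, self.j, self.thr = best
        return self

    def score(self, x):
        return 1.0 if x[self.j] > self.thr else 0.0


class Logistic:
    """Hypothetical meta-learner: logistic regression fitted by stochastic gradient descent."""
    def fit(self, Z, y, lr=0.5, epochs=200):
        self.w = [0.0] * (len(Z[0]) + 1)  # w[0] is the bias term
        for _ in range(epochs):
            for z, t in zip(Z, y):
                g = self.prob(z) - t  # gradient of log-loss w.r.t. the linear score
                self.w[0] -= lr * g
                for k, v in enumerate(z):
                    self.w[k + 1] -= lr * g * v
        return self

    def prob(self, z):
        s = self.w[0] + sum(w * v for w, v in zip(self.w[1:], z))
        s = max(-50.0, min(50.0, s))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + exp(-s))


def stack_fit(X, y, make_bases, k=5):
    """Stacking: train the meta-learner on out-of-fold base scores; resample training data only."""
    n = len(X)
    Z = [None] * n
    for f in range(k):
        held = set(range(f, n, k))  # interleaved fold f
        tr = [i for i in range(n) if i not in held]
        Xtr, ytr = random_oversample([X[i] for i in tr], [y[i] for i in tr])
        bases = [mk().fit(Xtr, ytr) for mk in make_bases]
        for i in held:
            Z[i] = [b.score(X[i]) for b in bases]  # out-of-fold meta-features
    meta = Logistic().fit(*random_oversample(Z, y))
    Xb, yb = random_oversample(X, y)
    return [mk().fit(Xb, yb) for mk in make_bases], meta  # base learners refit on all data


def stack_predict(model, x):
    bases, meta = model
    return int(meta.prob([b.score(x) for b in bases]) >= 0.5)


# Synthetic, heavily imbalanced toy data (95 negatives, 5 positives); not the paper's dataset.
rng = random.Random(1)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(95)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
y = [0] * 95 + [1] * 5
model = stack_fit(X, y, [NearestCentroid, Stump])
print(stack_predict(model, [3.0, 3.0]), stack_predict(model, [0.0, 0.0]))
```

Oversampling is applied only to the training portion of each fold and to the final refit, never to the held-out rows. If minority rows were duplicated before splitting, copies of the same transaction could land on both sides of a fold and inflate the out-of-fold scores. With a real credit card dataset, one would swap in stronger base learners and compare several sampling methods, as the paper does.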
Source
Hans Journal of Data Mining (《数据挖掘》), 2020, Issue 4, pp. 254-260 (7 pages)