Abstract
Using 2019 personal loan data from Lending Club, the largest peer-to-peer (P2P) lending platform in the United States, this study introduces deep learning into personal credit risk assessment and combines it with ensemble learning to build hybrid learning models with a sequential (serial) structure. Specifically, the hidden layers of a deep neural network serve as a "feature extractor" that transforms the original variables into higher-level abstract features. These features are then fed into four ensemble learning models for training: random forest, XGBoost, LightGBM and CatBoost. The results show that the four ensemble models differ little in performance; random forest performs best and LightGBM trains fastest. Compared with a deep neural network (DNN) and principal component analysis (PCA), a sparse autoencoder (SAE) is the most suitable feature extractor for ensemble learning, and the improvement it brings is especially marked for Boosting-type models.
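The two-stage (sequential) structure described above can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. Everything concrete here is an assumption for illustration: synthetic data from `make_classification` stands in for the Lending Club variables, the layer sizes `(32, 16)` and the helper `hidden_features` are hypothetical, and the extractor is a supervised DNN whose last hidden layer is reused. It is not the sparse autoencoder the abstract finds best, and random forest stands in for all four ensemble models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def hidden_features(mlp, X):
    """Hypothetical helper: forward X through every hidden layer of a fitted
    MLPClassifier (ReLU activations) and return the last hidden layer's output."""
    h = X
    # coefs_/intercepts_ hold one entry per layer transition; the final pair
    # maps to the output unit, so it is skipped to stop at the last hidden layer.
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h


# Synthetic stand-in for the loan table: 20 numeric variables, roughly 15% defaults.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Stage 1: train a DNN on the default label, then reuse its hidden layers
# as a feature extractor (raw variables -> 16 abstract features).
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=500, random_state=0).fit(X_tr, y_tr)
Z_tr, Z_te = hidden_features(mlp, X_tr), hidden_features(mlp, X_te)

# Stage 2: an ensemble model trained on the extracted features,
# compared with the same model trained on the raw variables.
rf_raw = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rf_dnn = RandomForestClassifier(n_estimators=200, random_state=0).fit(Z_tr, y_tr)
auc_raw = roc_auc_score(y_te, rf_raw.predict_proba(X_te)[:, 1])
auc_dnn = roc_auc_score(y_te, rf_dnn.predict_proba(Z_te)[:, 1])
print(f"RF on raw variables: AUC={auc_raw:.3f}")
print(f"RF on DNN features:  AUC={auc_dnn:.3f}")
```

Because XGBoost, LightGBM and CatBoost each ship a scikit-learn-style classifier with `fit` and `predict_proba`, any of them could replace `RandomForestClassifier` in stage 2 without changing stage 1. The AUC values from this toy data say nothing about the study's actual results.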
Authors
NIU Xiaojian; LING Fei (School of Economics, Fudan University, Shanghai 200433, China)
Source
Journal of Fudan University: Natural Science, 2021, No. 6, pp. 703-719 (17 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China, General Program (71873039, 71573051).
Keywords
personal credit
risk assessment
hybrid learning