Abstract
The rapid development of Internet finance has made P2P network lending an innovative financing channel for SMEs and individuals, so identifying the potential risks in online lending has become a research focus. However, P2P transaction data are usually severely imbalanced between overdue and non-overdue samples, which keeps the recognition rate for overdue loans low. To address this problem, the paper applies random undersampling, SMOTE and Bagging to balance the classes, and then evaluates the results with logistic regression (LR) and a support vector classifier (SVC). The empirical results show that, with recall as the evaluation criterion, Bagging balances the data better than random undersampling and SMOTE for P2P overdue loan recognition. In addition, LR shows no obvious over-fitting, so it is more suitable than SVC for P2P overdue loan recognition.
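The abstract names the balancing techniques but not their mechanics. Below is a minimal plain-Python sketch, not the authors' code, of random undersampling, SMOTE (each synthetic row is interpolated between a minority sample and one of its k nearest minority neighbours), one Bagging-style balancing variant, and recall as the evaluation criterion. The function names, the toy data, and the choice of balanced bagging (every minority row plus an equal-size bootstrap draw of majority rows per bag) are assumptions for illustration; the paper's exact settings are not stated here.

```python
import math
import random

def random_undersample(X, y, minority=1, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [X[i] for i in kept], [y[i] for i in kept]

def smote(X, y, minority=1, k=3, seed=0):
    """Append synthetic minority rows until the classes are balanced (SMOTE).

    Each synthetic row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    Needs at least two minority rows."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = max(n_majority - len(minority_rows), 0)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # Indices of the k nearest other minority rows.
        neighbours = sorted(
            (j for j in range(len(minority_rows)) if j != i),
            key=lambda j: math.dist(base, minority_rows[j]),
        )[:k]
        neighbour = minority_rows[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)

def balanced_bags(y, n_bags=5, minority=1, seed=0):
    """Row indices for an assumed balanced-bagging variant: each bag keeps all
    minority rows plus an equal-size bootstrap (with replacement) of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    return [minority_idx + rng.choices(majority_idx, k=len(minority_idx))
            for _ in range(n_bags)]

def recall(y_true, y_pred, positive=1):
    """Share of truly overdue loans (positive class) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical toy data: 8 non-overdue (0) and 2 overdue (1) loans, 2 features.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.3], [0.4, 0.6],
     [0.6, 0.2], [0.7, 0.5], [0.3, 0.7], [0.9, 0.9], [0.8, 1.0]]
y = [0] * 8 + [1] * 2

Xu, yu = random_undersample(X, y)
print(yu.count(0), yu.count(1))  # 2 2
Xs, ys = smote(X, y, k=1)
print(ys.count(0), ys.count(1))  # 8 8
print(recall([1, 1, 0, 1], [1, 0, 0, 1]))  # 2/3
```

In the balanced-bagging variant, one classifier (for example LR or SVC) would be fit on each bag and the predictions combined by majority vote; the sketch only builds the bags. Recall suits this task because it measures the share of overdue loans actually caught, which plain accuracy hides when non-overdue loans dominate.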
Authors
LIU Hua-ling (刘华玲)
LIN Bei (林蓓)
YUN Wen-jing (恽文婧)
DING Yu-jie (丁宇杰)
(School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China; School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2019, Issue S11, pp. 595-598, 608 (5 pages in total)
Funding
Shanghai Planning Project of Philosophy and Social Sciences (2018BJB023)
Major Project of the National Social Science Fund of China (16ZDA055)
Keywords
Class imbalance
Overdue loan recognition
Ensemble learning
Resampling