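Abstract
With the development of Internet technology, the number of users and the volume of data in online P2P lending grow day by day. Identifying abnormal loans and promoting the healthy development of platforms has long been a focus of public attention. To address this problem, this paper proposes a "hierarchical classification" method and analyzes the transaction data published by Lending Club level by level. At the first level, the density-based DBSCAN clustering algorithm is used to exclude a large number of normal users, which mitigates the imbalance between the positive and negative classes in the data. At the second level, a general classification algorithm is applied to identify the platform's abnormal loans. Numerical experiments show that, compared with other methods, applying the hierarchical classification method to online P2P lending identifies loans with abnormal repayment more effectively while preserving the overall performance of the classifier.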
With the development of information technology in recent years, financial service intermediaries have entered the Internet era. As the most popular innovative business model in Internet finance, online peer-to-peer (P2P) lending has attracted increasing attention from diverse sectors. Risk and safety are the main concerns in the online P2P lending industry. Apart from the risks posed by P2P platforms themselves, risks arise from delinquent loans. Borrowers of these loans do not make their repayments on time or even default on the loans, which leads to losses for lenders. Thus, it is essential to develop a model that detects these abnormal loans in order to protect lenders and platforms from risk. Based on secondary data from some P2P platforms, several existing academic studies have investigated the risk issue using statistical approaches (e.g., logistic regression) and data mining approaches (e.g., classification). However, in online P2P lending, the distribution of positive (abnormal loans) and negative (normal loans) samples is often imbalanced. Normal loans are the majority, while abnormal loans account for only a small percentage of loans. According to Lending Club data for the second quarter of 2016, only 12.55% of loans are abnormal. To address this problem, we propose a hierarchical classification method in this paper. At each level, the new model processes and analyzes the data with a different method, chosen according to the characteristics of the data set. At the first level, the unsupervised clustering method DBSCAN is used to filter out some negative samples (normal loans) so that the distribution of positive and negative samples becomes more balanced. At the second level, supervised classification methods, such as random forest and J48 decision tree, are used to classify the samples that remain after the first level.
Using the Lending Club data, experiments were conducted with several models to detect abnormal loans, including five traditional classification methods (i.e., J48 decision tree, logistic regression, NU support vector machine, KNN, and random forest) and five hybrid models (i.e., DBSCAN + J48, DBSCAN + random forest, DBSCAN + logistic regression, DBSCAN + KNN, and DBSCAN + NU support vector machine). In addition, under-sampling and over-sampling methods were compared in our experiments. The experimental results reveal that the hierarchical classification method increases recall and decreases the false negative rate more effectively than the traditional methods. In sum, effectively detecting abnormal loans that are not repaid on time is important for online P2P lending platforms. On the one hand, our study proposes a novel hierarchical classification method from an academic perspective, and this hybrid method detects abnormal loans more effectively. On the other hand, the findings have practical implications for P2P lending platforms: they can help regulate the loans that are flagged by the proposed method.
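The two-level idea described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the toy data, the `eps` and `min_pts` values, the helper names, and in particular the filtering rule (points that DBSCAN assigns to a dense cluster are presumed normal and removed, while noise points are passed on) are assumptions made for illustration, since the abstract does not state the exact rule. A small KNN stands in for the second-level classifier.

```python
from collections import Counter
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # core point: keep expanding
                queue.extend(nbrs)
    return labels

def first_level_filter(points, eps, min_pts):
    """Assumed rule: drop clustered points as presumed normal, keep noise."""
    cluster_ids = dbscan(points, eps, min_pts)
    return [i for i, c in enumerate(cluster_ids) if c == NOISE]

def knn_predict(train_x, train_y, x, k=3):
    """Second level (stand-in classifier): majority vote of k nearest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: a dense blob of normal loans (label 0) plus a few
# scattered loans, most of them abnormal (label 1).
points = [(0.1 * a, 0.1 * b) for a in range(5) for b in range(5)]
labels_true = [0] * len(points)
points += [(3, 3), (4, 1), (5, 5), (1, 4), (6, 2), (2, 6)]
labels_true += [1, 1, 1, 1, 0, 0]

kept = first_level_filter(points, eps=0.15, min_pts=4)
kept_x = [points[i] for i in kept]
kept_y = [labels_true[i] for i in kept]

share_before = sum(labels_true) / len(labels_true)
share_after = sum(kept_y) / len(kept_y)
print(f"abnormal share before: {share_before:.2f}, after: {share_after:.2f}")
print("second-level prediction:", knn_predict(kept_x, kept_y, (3.5, 2.5)))
```

In this toy setting the first level removes the dense blob, so the second-level classifier is trained on a much more balanced sample. On real data the filtering rule and the DBSCAN parameters would need to be chosen and validated against labeled loans.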
Authors
罗钦芳
丁国维
傅馨
蔡舜
陈熹
LUO Qin-fang, DING Guo-wei, FU Xin, CAI Shun, CHEN Xi (School of Management, Xiamen University, Xiamen 361005, China; School of Management, Zhejiang University, Hangzhou 310058, China)
Source
《管理工程学报》
CSSCI
CSCD
Peking University Core Journals
2017, No. 3, pp. 201-209 (9 pages)
Journal of Industrial Engineering and Engineering Management
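Funding
National Natural Science Foundation of China (71572166)
National Natural Science Foundation of China (71372057)
National Natural Science Foundation of China (71301133)
Xiamen University Humanities and Social Sciences "President's Fund - Innovation Team" Project (20720161044)
Ministry of Education Humanities and Social Sciences Fund Project (13YJC630033)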
Keywords
Online P2P lending
Anomaly detection
Data mining
Hierarchical classification