Identifying Abnormal P2P Online Loans Based on a "Multi-Level Classification" Method  Cited by: 8

Detecting anomaly loans on P2P lending platform: Based on hierarchical classification method
Abstract With the development of Internet technology, the number of users and the volume of data in P2P online lending are growing daily. Identifying abnormal loan listings and promoting the healthy development of platforms has long been a focus of public attention. To address this problem, this paper proposes a "multi-level classification" method and analyzes the transaction data published by Lending Club level by level. At the first level, the density-based DBSCAN clustering algorithm is used to exclude a large number of normal users, which reduces the imbalance between the positive and negative classes in the data; at the second level, a general classification algorithm is applied to identify the platform's abnormal loan listings. Numerical experiments show that, compared with other methods, applying the "multi-level classification" method to P2P online lending identifies loans with abnormal repayment more effectively while maintaining the classifier's overall performance. With the development of information technology in recent years, financial service intermediaries have entered the Internet era. As the most popular innovative business model of Internet finance, online peer-to-peer (P2P) lending has attracted increasing attention from diverse sectors. Risk and safety are the main concerns in the online P2P lending industry. Apart from the risks from P2P platforms themselves, risks arise from delinquent loans. Borrowers of these loans do not make their repayments on time and may even default on the loans, which leads to losses for lenders. Thus, it is essential to develop a model that detects these abnormal loans to protect lenders and platforms from risk. Based on secondary data from some P2P platforms, several existing academic studies have investigated the risk issue using statistical approaches (e.g., logistic regression) and data mining approaches (e.g., classification). However, in online P2P lending, the distribution of positive (abnormal loans) and negative (normal loans) samples is often imbalanced. Normal loans are the majority, while abnormal loans account for only a small percentage of loans. According to Lending Club data for the second quarter of 2016, only 12.55% of loans are abnormal. To address this problem, we propose a hierarchical classification method in this paper. At each level, the model processes and analyzes the data with a different method, chosen according to the characteristics of the data set. In the first level, the unsupervised clustering method DBSCAN is used to filter out some negative samples (normal loans) so that the distribution of positive and negative samples becomes more balanced.
In the second level, supervised classification methods, such as random forest and J48 decision tree, are used to classify the samples that pass through the first level. Using the Lending Club data, experiments were conducted with several models to detect abnormal loans, including five traditional classification methods (i.e., J48 decision tree, logistic regression, NU support vector machine, KNN, and random forest) and five hybrid models (i.e., DBSCAN + J48, DBSCAN + random forest, DBSCAN + logistic regression, DBSCAN + KNN, and DBSCAN + NU support vector machine). In addition, under-sampling and over-sampling methods were compared in our experiments. The experimental results reveal that the hierarchical classification method can increase recall and decrease false negative rates more effectively than the traditional methods. To sum up, in the online P2P lending field, effectively detecting abnormal loans that are not repaid on time is important for P2P platforms. On the one hand, our study proposes a novel hierarchical classification method from an academic perspective. This new hybrid method can detect abnormal loans more effectively. On the other hand, the findings of our study have practical implications for P2P lending platforms. The findings can help regulate the targeted loans that are detected by the proposed method.
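The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact rule used to discard normal loans after DBSCAN, so the sketch assumes that clusters consisting almost entirely of normal training loans mark "normal regions", and that any sample within `eps` of such a region is labeled normal at the first level. KNN stands in for the second-level classifier (the paper also uses J48, random forest, and others). All function names, the `purity` and `k` parameters, and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns a cluster id (>= 0) per point, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def two_level_predict(train_X, train_y, test_X, eps, min_pts, purity, k):
    """Level 1 filters samples in 'normal regions'; level 2 runs KNN on the rest.

    Labels: 0 = normal loan, 1 = abnormal loan. The purity rule is an assumption.
    """
    labels = dbscan(train_X, eps, min_pts)
    members = {}
    for idx, c in enumerate(labels):
        if c >= 0:
            members.setdefault(c, []).append(idx)
    # Clusters whose share of normal loans reaches the purity threshold.
    normal_clusters = {
        c for c, idxs in members.items()
        if sum(train_y[i] == 0 for i in idxs) / len(idxs) >= purity
    }
    filtered = [i for i, c in enumerate(labels) if c in normal_clusters]
    kept = [i for i, c in enumerate(labels) if c not in normal_clusters]
    preds = []
    for x in test_X:
        # Level 1: close to a filtered-out normal training sample -> normal.
        if not kept or any(dist(x, train_X[i]) <= eps for i in filtered):
            preds.append(0)
            continue
        # Level 2: majority vote among the k nearest remaining training samples.
        nearest = sorted(kept, key=lambda i: dist(x, train_X[i]))[:k]
        preds.append(Counter(train_y[i] for i in nearest).most_common(1)[0][0])
    return preds


def recall_and_fnr(y_true, y_pred):
    """Recall and false negative rate for the abnormal class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    return recall, 1.0 - recall if tp + fn else 0.0


# Made-up toy data: a dense block of normal loans plus a few scattered loans.
train_X = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
train_X += [(3.0, 3.0), (3.2, 3.1), (2.9, 3.2), (5.0, 5.0)]
train_y = [0] * 25 + [1, 1, 1, 0]
test_X, test_y = [(0.2, 0.2), (3.1, 3.1)], [0, 1]
pred = two_level_predict(train_X, train_y, test_X,
                         eps=0.15, min_pts=4, purity=0.95, k=3)
print(pred, recall_and_fnr(test_y, pred))
```

Because the first level removes only samples from dense, nearly pure normal regions, the second-level classifier is trained on a subset in which abnormal loans make up a larger share, which is the rebalancing effect the abstract describes. Recall and the false negative rate are the two metrics the abstract reports as improved.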
Authors LUO Qin-fang (罗钦芳), DING Guo-wei (丁国维), FU Xin (傅馨), CAI Shun (蔡舜), CHEN Xi (陈熹) (School of Management, Xiamen University, Xiamen 361005, China; School of Management, Zhejiang University, Hangzhou 310058, China)
Source Journal of Industrial Engineering and Engineering Management (《管理工程学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2017, No. 3, pp. 201-209 (9 pages)
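Funding National Natural Science Foundation of China (71572166, 71372057, 71301133); Xiamen University Humanities and Social Sciences "President's Fund - Innovation Team" project (20720161044); Ministry of Education Humanities and Social Sciences Fund project (13YJC630033)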
Keywords Online P2P lending; Anomaly detection; Data mining; Hierarchical classification