Abstract
With the continued development of Internet finance, identifying borrower default risk in peer-to-peer (P2P) online lending has become a key concern for financial institutions. As Internet finance rectification measures have taken effect, the number of loan defaults has steadily declined, so historical default data for P2P lending are increasingly scarce and default early-warning analysis on imbalanced data has become particularly important. Building on the BSL sampling algorithm for imbalanced samples, this paper uses K-means clustering to reduce the time complexity of sampling and compares random forest against other machine learning classification algorithms in experiments. It also incorporates text analysis of loan descriptions and loan titles. The result is a random-forest-based default early-warning model for P2P online lending that identifies default risk in imbalanced P2P lending data. The model achieves high efficiency and a high recognition rate while meeting the practical need for incremental learning, and it offers some regulatory guidance for P2P online lending platforms.
Source
《金融》
2020, No. 5, pp. 455-464 (10 pages)
Finance