Abstract
With the rapid development of the mobile internet, peer-to-peer (P2P) online lending has become a fast-growing financial market that has attracted a large number of users. Helping lending platforms identify the potential default risk of loans has become a major challenge. However, the multi-source, heterogeneous nature of borrower information makes it difficult for the traditional logistic regression model to meet the needs of lending platforms, and the imbalance of the sample data makes it difficult for models such as logistic regression and linear discriminant analysis to perform well. In view of these characteristics, this paper proposes a latent factor interaction model based on neural networks. First, a latent Dirichlet allocation (LDA) topic model extracts the textual information of loan requests and borrowers, while an embedding operation handles discrete (categorical) and non-discrete information; together, these two operations effectively address the challenges posed by multi-source, heterogeneous data. Second, a compressed interaction network (CIN) and a deep neural network (DNN) transform the results generated in the first step. Finally, a balance coefficient is used to address the problems caused by imbalanced samples. The model's performance is systematically evaluated on a large-scale real-world data set, and the experimental results demonstrate the effectiveness of the proposed solution.
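The abstract does not give the form of the balance coefficient used in the final step. As an illustration only, one common way to offset class imbalance in default prediction is to weight the two classes differently in the binary cross-entropy loss. The sketch below assumes that form; the function name `balanced_bce`, the parameter `alpha`, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def balanced_bce(y_true, p_pred, alpha=0.75, eps=1e-12):
    """Mean binary cross-entropy with a class-balancing coefficient.

    Assumed form (not confirmed by the paper): `alpha` weights the
    positive class (here: defaulted loans) and `1 - alpha` weights the
    negative class (repaid loans).
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -alpha * math.log(p)
        else:
            total += -(1.0 - alpha) * math.log(1.0 - p)
    return total / len(y_true)


# Toy example: one default (label 1) among three repaid loans (label 0),
# with the model predicting a low default probability for every loan.
labels = [1, 0, 0, 0]
probs = [0.2, 0.2, 0.2, 0.2]

unweighted = balanced_bce(labels, probs, alpha=0.5)
weighted = balanced_bce(labels, probs, alpha=0.9)
print(f"alpha=0.5: {unweighted:.4f}  alpha=0.9: {weighted:.4f}")
```

With `alpha` above 0.5, a missed default contributes more to the loss than a misjudged repaid loan, which pushes a model trained on this loss to pay more attention to the minority class. A suitable value of `alpha` would have to be tuned on validation data.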
Authors
明依东
肖迎元
MING Yidong; XIAO Yingyuan (School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China)
Source
《天津理工大学学报》 (Journal of Tianjin University of Technology)
2022, No. 2, pp. 1-7 (7 pages)
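Funding
National Natural Science Foundation of China (61170174)
Major Program of the National Natural Science Foundation of China (91646117)
Natural Science Foundation of Tianjin (17JCYBJC15200, 16JCTPJC53600)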
Keywords
latent factor
feature interaction
default risk
neural network