Abstract
Traditional credit scoring methods rely mainly on statistical classification: they can predict whether a borrower will default, but not when the default will occur. The cure rate model, a mixture of a binary classification submodel and a survival submodel, predicts both whether a default will occur and when, and so provides more information than traditional binary classification. Moreover, with the development of big data, data sources have multiplied, and multiple datasets can be collected for the same or similar tasks. Motivated by this, this paper proposes an integrative cure rate model for multi-source data that models multiple datasets and estimates their parameters simultaneously. A composite penalty function performs bi-level variable selection, selecting important groups as well as important members within those groups, and the model promotes agreement in the signs of the two submodels' regression coefficients to improve interpretability. Numerical simulations show that the proposed method has clear advantages in both variable selection and parameter estimation. Finally, the proposed method is applied to predicting the default time of credit loans, where it performs well.
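As a rough illustration of the mixture structure described in the abstract, the sketch below computes a borrower's probability of not yet having defaulted by time t. It assumes a logistic incidence submodel and an exponential proportional-hazards latency submodel; these functional forms, the function names, the `baseline_rate` parameter, and all coefficient values are hypothetical choices made for illustration, not the paper's specification. The composite penalty and the multi-source integration are not shown.

```python
import math

def default_probability(z, gamma):
    """Incidence submodel (assumed logistic): probability the borrower ever defaults."""
    eta = sum(g * v for g, v in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))

def latency_survival(t, x, beta, baseline_rate=0.1):
    """Latency submodel (assumed exponential proportional hazards):
    probability a borrower who will eventually default has not defaulted by time t."""
    hazard = baseline_rate * math.exp(sum(b * v for b, v in zip(beta, x)))
    return math.exp(-hazard * t)

def population_survival(t, x, z, beta, gamma):
    """Mixture cure model: never-defaulters plus eventual defaulters still surviving at t."""
    pi = default_probability(z, gamma)
    return (1.0 - pi) + pi * latency_survival(t, x, beta)

# Hypothetical coefficients with matching signs in both submodels:
# a larger covariate raises both the chance of default and the default hazard.
gamma = [0.8]
beta = [0.5]
for t in (0, 6, 12, 24):
    print(t, round(population_survival(t, [1.0], [1.0], beta, gamma), 4))
```

As t grows, the curve levels off at the non-default ("cured") fraction 1 - pi rather than falling to zero, which is what separates a cure rate model from an ordinary survival model.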
Authors
Fan Xinyan (范新妍)
Fang Kuangnan (方匡南)
Zheng Chenlu (郑陈璐)
Zhang Zhiyuan (张志远)
Source
Statistical Research (《统计研究》)
CSSCI
Peking University Core Journal (北大核心)
2021, Issue 2, pp. 99-113 (15 pages)
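Funding
Ministry of Education Humanities and Social Sciences Research Youth Fund Project "Research on Consumer Finance Risk Control Methods and Applications Based on Semi-supervised Learning" (20YJC910004)
National Statistical Science Research Major Project "Unsupervised Learning Methods for Multi-source Data Fusion and Their Applications" (2019LD02)
National Natural Science Foundation of China General Program "High-dimensional Classification Methods Based on Multi-source Information Fusion and Their Application in Credit Scoring" (72071169)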
Keywords
Multi-source Data
Integrative Cure Rate Model
Default Point Prediction
Credit Scoring