Abstract
In credit scoring, labeled samples are difficult and costly to obtain. To address this problem, this paper proposes a new credit scoring model based on semi-supervised support vector machines. The model introduces new parameters for the unlabeled samples, so it does not require the missing-at-random assumption, which broadens its applicability. A semi-supervised term is also added to the loss function. This term encourages similarity between the coefficients of the labeled and unlabeled samples, which lets the model incorporate information from the unlabeled samples and improves estimation. In addition, Group LASSO is used for variable selection, exploiting the group structure among the variables to screen out the important ones. Numerical simulations and a real-data application to credit card default prediction demonstrate the feasibility of the proposed method and its strong performance in variable selection, coefficient estimation and classification prediction.
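The abstract does not give the model's formulas. The sketch below is a hedged illustration of the Group LASSO component only. It fits a linear SVM with a Group LASSO penalty by proximal gradient descent, using block soft-thresholding so that whole groups of coefficients can become exactly zero. Several choices are assumptions that do not come from the paper:

- The squared hinge loss replaces the standard hinge loss so that the loss is differentiable.
- Each group is weighted by √|g|.
- The step size is fixed and must stay below 1/L, where L is the Lipschitz constant of the loss gradient.
- The function names `group_soft_threshold`, `fit_group_lasso_svm` and `predict` are hypothetical.

The sketch leaves out the paper's unlabeled-sample parameters and its semi-supervised similarity term, because the abstract does not state their exact form.

```python
import math


def group_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= thresh:
        return [0.0] * len(v)  # the whole group is set exactly to zero
    scale = 1.0 - thresh / norm
    return [scale * x for x in v]


def fit_group_lasso_svm(X, y, groups, lam, step=0.05, iters=500):
    """Hypothetical sketch: linear SVM, squared hinge loss, Group LASSO penalty.

    Objective: (1/n) * sum_i max(0, 1 - y_i (x_i . w + b))^2
               + lam * sum_g sqrt(|g|) * ||w_g||_2
    X: list of feature rows; y: labels in {-1, +1};
    groups: list of column-index lists that partition the columns.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            margin = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin > 0:  # only samples inside the margin contribute
                coef = -2.0 * margin * yi / n
                for j in range(p):
                    grad_w[j] += coef * xi[j]
                grad_b += coef
        # Gradient step on the smooth loss; the intercept b is not penalized.
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
        # Proximal step: shrink each group as a block.
        for g in groups:
            shrunk = group_soft_threshold(
                [w[j] for j in g], step * lam * math.sqrt(len(g)))
            for j, v in zip(g, shrunk):
                w[j] = v
    return w, b


def predict(w, b, X):
    """Classify by the sign of the linear decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


# Toy data: columns 0-1 form an informative group, columns 2-3 a noise group.
noise = [(1.0, 0.5), (-1.0, -0.5), (0.5, -1.0), (-0.5, 1.0)]
X = ([[2.0, 1.0, a, c] for a, c in noise]
     + [[-2.0, -1.0, a, c] for a, c in noise])
y = [1] * 4 + [-1] * 4
w, b = fit_group_lasso_svm(X, y, groups=[[0, 1], [2, 3]], lam=0.1)
```

On this toy data the noise group (columns 2 and 3) is driven exactly to zero, while the informative group stays active. This is the group-level screening behaviour that motivates Group LASSO over an ordinary LASSO, which would zero coefficients one at a time.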
Authors
Chen Song (陈耸); Yu Xiuyun (于秀运); Qiu Yongqin (邱涌钦); Fang Kuangnan (方匡南)
(Micro-Finance College, Taizhou University, Taizhou 318000, China; School of Economics, Xiamen University, Xiamen 361005, China)
Source
Chinese Journal of Management Science (《中国管理科学》)
Indexed in: CSSCI; CSCD; Peking University Core Journals (北大核心)
2024, No. 3, pp. 1-8 (8 pages)
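Funding
National Natural Science Foundation of China, General Program (72071169)
Ministry of Education Humanities and Social Sciences Research Youth Fund Project (20YJC910004)
Fundamental Research Funds for the Central Universities (20720231060)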
Keywords
semi-supervised classification
support vector machines
variable selection
credit scoring