
Research on Internet Personal Credit Assessment Based on Imbalanced Samples (cited by: 21)

Internet Personal Credit Assessment Research Based on the Perspective of Unbalanced Sample
Abstract: The rapid development of internet finance and consumer credit in China has created a large demand for internet personal credit reporting. To address the imbalance in internet credit reporting data, the study balances the data with random over-sampling, random under-sampling, and SMOTE, then builds decision tree, support vector machine, and random forest classifiers to assess internet personal credit, using F-measure and AUC to evaluate the models. The results show that personal credit assessment is feasible in the internet big data setting; that over-sampling improves the classification performance of the assessment models fairly well; and that users with good credit ratings share some general traits: age between 18 and 30, a wage level above 2,000 yuan per month, page views mostly concentrated between 10 and 20, and relatively early loan application times. Building on an exploration of variable effectiveness in internet personal credit assessment, the study refutes the claim that "the more variables used, the more accurate the result" and avoids variables that involve users' private information.
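The three balancing strategies named in the abstract can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the toy feature rows, the neighbour count `k`, and the choice to balance the classes to exactly equal size are all assumptions made for the example.

```python
import random


def random_over_sample(minority, majority, rng):
    """Duplicate minority rows (with replacement) until both classes match."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_under_sample(minority, majority, rng):
    """Keep a random subset of majority rows the size of the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote(minority, n_new, k, rng):
    """Create n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy, made-up rows: 3 minority samples versus 9 majority samples.
rng = random.Random(0)
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
majority = [[float(i), float(i % 4)] for i in range(6, 15)]

over_min, over_maj = random_over_sample(minority, majority, rng)
under_min, under_maj = random_under_sample(minority, majority, rng)
smote_min = minority + smote(minority, len(majority) - len(minority), k=2, rng=rng)

print(len(over_min), len(under_maj), len(smote_min))  # 9 3 9
```

Over-sampling and SMOTE both grow the minority class to the majority size, but only SMOTE produces rows that were not in the original data; under-sampling instead discards majority rows. In practice the resampling would be applied to the training split only, so that the evaluation data keeps its original class ratio.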
Source: Journal of Statistics and Information (《统计与信息论坛》), indexed in CSSCI and the Peking University Core Journals list, 2017, No. 2, pp. 84-90 (7 pages)
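Funding: National Statistical Science Key Research Project "Fine-Grained Mining of the Floating Population Based on Mobile Communication Big Data" (2015433); Shanxi Province Innovative Talent Support Program for Higher Education Institutions (晋教科〔2016〕3号)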
Keywords: internet credit reporting; imbalanced data; resampling; random forest
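The record states that the models were evaluated by F-measure and AUC rather than plain accuracy. The sketch below shows why that matters on imbalanced data. It is an illustration only: the labels and scores are made up, the F-measure is taken with equal weight on precision and recall (F1), which is an assumption since the record does not give the weighting, and AUC is computed through its pairwise-ranking definition.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Share of (positive, negative) pairs in which the positive sample gets
    the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up example: 2 positive (minority) and 6 negative (majority) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
always_negative = [0] * len(y_true)

# A classifier that never flags the minority class still reaches 75% accuracy
# here, while its F-measure is 0.
print(accuracy(y_true, always_negative), f_measure(y_true, always_negative))
print(accuracy(y_true, y_pred), f_measure(y_true, y_pred), auc(y_true, scores))
```

Because F-measure depends on a decision threshold and AUC does not, reporting both gives a view of the chosen operating point and of the overall ranking quality.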
