基于收缩近邻方法的征信缺失数据插补研究 (cited by: 5)

Research on Method of Credit Missing Data Imputation Based on Compress and Proximity
Abstract: Massive credit data, with large sample sizes and high dimensionality, make missing data imputation computationally expensive. To reduce this cost, this paper proposes a compress-and-proximity imputation method that completes imputation in three stages. In the first stage, inclusion probabilities are computed from the missing proportions of samples and variables, and the data are compressed through unequal probability sampling. In the second stage, the samples closest to each incomplete sample, measured by inter-sample distance, are selected to form the training set. In the third stage, a random forest model is built to impute the missing values iteratively. Simulation studies on the Australian credit scoring dataset and on credit scoring datasets from Chinese banks show that the method substantially reduces the computational load while maintaining a certain level of imputation accuracy.
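The first two stages described in the abstract can be illustrated with a small sketch. The abstract does not give the inclusion-probability formula or the distance measure, so everything specific here is an assumption made for illustration: the weighting in `inclusion_weights`, the partial Euclidean distance over jointly observed variables, the function names, and the toy data. The third stage (iterative random forest imputation) is not shown and would normally rely on a random forest library.

```python
import math
import random

def col_missing_ratios(data):
    """Fraction of missing (None) entries in each variable."""
    n = len(data)
    return [sum(row[j] is None for row in data) / n for j in range(len(data[0]))]

def inclusion_weights(data):
    """ASSUMED weighting (not from the paper): a sample scores higher when it
    observes more variables, with extra credit for often-missing variables."""
    col = col_missing_ratios(data)
    total = sum(1.0 + c for c in col)
    return [sum(1.0 + c for v, c in zip(row, col) if v is not None) / total
            for row in data]

def compress(data, k, seed=0):
    """Stage 1: unequal-probability sampling without replacement, using
    random keys u ** (1 / w) so that larger weights are favoured."""
    rng = random.Random(seed)
    keyed = []
    for i, w in enumerate(inclusion_weights(data)):
        if w > 0:  # fully missing rows have weight 0 and are never drawn
            keyed.append((rng.random() ** (1.0 / w), i))
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:k])

def partial_distance(a, b):
    """ASSUMED distance: Euclidean over jointly observed variables, rescaled
    to the full dimension; infinite when no variable is shared."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum(shared) * len(a) / len(shared))

def nearest_training_set(data, candidates, n_neighbors):
    """Stage 2: for every incomplete sample, keep its closest candidates."""
    chosen = set()
    for i, row in enumerate(data):
        if all(v is not None for v in row):
            continue  # complete samples need no imputation
        dists = [(partial_distance(row, data[c]), c)
                 for c in candidates if c != i]
        dists = sorted(d for d in dists if d[0] < math.inf)
        chosen.update(c for _, c in dists[:n_neighbors])
    return sorted(chosen)

# Hypothetical toy data: None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [9.0, 8.0, 7.0],
    [1.2, 2.1, 2.9],
    [None, None, None],
    [8.8, 8.1, None],
]
sample = compress(data, k=4, seed=1)                        # stage 1
train = nearest_training_set(data, sample, n_neighbors=1)   # stage 2
```

Under these assumptions the random forest in the third stage would be fitted only on the rows indexed by `train`, which is where the reduction in computation would come from.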
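Source: Mathematics in Practice and Theory (《数学的实践与认识》), Peking University Core Journal, 2017, No. 8, pp. 147-153 (7 pages)
Funding: Major Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education (15JJD910002)
Keywords: credit data; missing data imputation; sample distance; random forest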
