Application of MICE in R for imputing incomplete multivariate data
Cited by: 2
Abstract

Objective: To briefly introduce the use of MICE (Multivariate Imputation by Chained Equations) in the R environment for imputing missing data, and to evaluate its imputation performance.

Methods: Using real data, we describe the imputation workflow and compare the imputation results of MICE with those of common missing-data methods (deletion, mean (mode) imputation, and regression imputation).

Results: At a missing rate of 10%, MICE and the common methods showed no obvious differences in their estimates; for every method, the relative error of the regression coefficient estimates for the three types of variables was about 10%. As the missing rate increased (20%, 40%), the relative error of the regression coefficient estimates increased for all methods, but for MICE it remained at roughly 10% to 20% for the three types of variables. MICE outperformed the other methods and gave stable results, regression imputation ranked second, and deletion and mean (mode) imputation performed worse. When the missing rate reached 50%, the estimation error for all three types of variables was already large, and none of the methods imputed well.

Conclusion: MICE is easier to use than other multiple imputation software. Compared with common missing-data methods, it makes fuller use of the information in records with missing values and reflects the true survey situation more accurately, so it is worth promoting in practical work.
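The evaluation criterion named in the abstract, the relative error of a regression coefficient estimate, can be sketched on simulated data. The following is a minimal illustration in Python, not the authors' R procedure: the simulated model, the function names `ols_slope`, `relative_error` and `compare`, and the choice to delete outcome values completely at random are all assumptions made for the example. It covers only deletion and mean imputation; MICE itself is not implemented here, and the numbers it prints are unrelated to the results reported in the paper.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def relative_error(estimate, reference):
    """Absolute difference from the reference, as a fraction of the reference."""
    return abs(estimate - reference) / abs(reference)


def compare(missing_rate, n=1000, seed=1):
    """Relative error of the slope after deletion and after mean imputation."""
    rng = random.Random(seed)
    # Hypothetical complete data: y = 1 + 2x + noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    b_full = ols_slope(x, y)  # reference: estimate from the complete data

    # Delete outcome values completely at random (assumed mechanism).
    y_obs = [None if rng.random() < missing_rate else yi for yi in y]

    # Deletion: keep only the records whose outcome is observed.
    kept = [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]
    b_del = ols_slope([xi for xi, _ in kept], [yi for _, yi in kept])

    # Mean imputation: replace each missing outcome with the observed mean.
    y_mean = fmean([yi for yi in y_obs if yi is not None])
    y_filled = [y_mean if yi is None else yi for yi in y_obs]
    b_mean = ols_slope(x, y_filled)

    return {
        "deletion": relative_error(b_del, b_full),
        "mean": relative_error(b_mean, b_full),
    }


if __name__ == "__main__":
    for rate in (0.1, 0.2, 0.4, 0.5):
        print(rate, compare(rate))
```

In this toy setup the filled-in outcomes are constant, so mean imputation flattens the fitted line more as the missing rate grows. The reference slope is the one estimated from the complete simulated data, which is only possible because the missing values were removed artificially.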
Source: Chinese Journal of Hospital Statistics (《中国医院统计》), 2011, No. 4, pp. 309-312 (4 pages)
Keywords: MICE; multiple imputation; missing data; multivariate analysis