
A Study on Effects of Multivariate Imputation by Chained Equation Based on R Software
Cited by: 6
Abstract

Objective: To evaluate the performance of multiple imputation by MICE (multivariate imputation by chained equations) under different missing-data mechanisms and missing rates, and to identify the situations in which the method is suitable.

Methods: Starting from the complete dataset of a cross-sectional survey, R was used to generate incomplete datasets with different missing rates and missing mechanisms. The standardized bias of the analysis results after listwise deletion was compared with that after MICE multiple imputation. For categorical (binomial) variables, the average misclassification rate after multiple imputation was calculated separately.

Results: MICE multiple imputation performed well in the three missing-at-random scenarios with univariate missing rates of 10%, 20% and 30%. In the other simulated scenarios it showed no clear advantage over listwise deletion. For categorical variables, the average misclassification rate after MICE imputation exceeded 60% in every scenario.

Conclusion: For data that are missing at random with a univariate missing rate of no more than 30%, MICE multiple imputation is recommended. For categorical variables, however, the individual imputed values should not be used directly as if they were raw data.
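
The simulation design in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it is written in Python rather than R, it uses toy data in place of the survey dataset, and it covers only listwise deletion, not MICE itself (in R, the deleted values would instead be filled in with the `mice` package). The abstract does not give the formula for standardized bias, so the sketch assumes a common definition, 100 × (mean estimate − true value) / SD of the estimates across replicates. All function names and the 1.5/0.5 deletion multipliers are hypothetical choices made for the example.

```python
import random
import statistics


def make_missing(rows, rate, mechanism, rng):
    """Return a copy of rows (x, y) with some y values replaced by None.

    MCAR: every y has the same deletion probability `rate`.
    MAR:  the probability depends only on the always-observed x.
    MNAR: the probability depends on the value of y itself.
    The 1.5x / 0.5x split around the median is an arbitrary illustrative choice
    that keeps the overall missing rate close to `rate`.
    """
    x_med = statistics.median(x for x, _ in rows)
    y_med = statistics.median(y for _, y in rows)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = min(1.0, 1.5 * rate) if x > x_med else 0.5 * rate
        elif mechanism == "MNAR":
            p = min(1.0, 1.5 * rate) if y > y_med else 0.5 * rate
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out


def standardized_bias(estimates, true_value):
    """Assumed definition: 100 * (mean estimate - truth) / SD of the estimates."""
    return 100.0 * (statistics.mean(estimates) - true_value) / statistics.stdev(estimates)


def misclassification_rate(true_labels, imputed_labels):
    """Share of imputed categorical values that differ from the deleted true values."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


def simulate(mechanism, rate, n=500, reps=200, seed=1):
    """Standardized bias of the complete-case mean of y under one scenario."""
    rng = random.Random(seed)
    # Toy stand-in for the complete dataset: y is correlated with x.
    complete = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        complete.append((x, 0.8 * x + rng.gauss(0, 0.6)))
    true_mean = statistics.mean(y for _, y in complete)

    estimates = []
    for _ in range(reps):
        holed = make_missing(complete, rate, mechanism, rng)
        kept = [y for _, y in holed if y is not None]  # listwise deletion
        estimates.append(statistics.mean(kept))
    return standardized_bias(estimates, true_mean)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        for rate in (0.1, 0.2, 0.3):
            print(mechanism, rate, round(simulate(mechanism, rate), 1))
```

In this toy setup the complete-case mean stays close to the truth under MCAR and drifts away from it under MAR and MNAR, which is the kind of contrast the standardized bias is meant to expose. `misclassification_rate` would be applied to the categorical values that were deleted and then imputed, comparing each imputed value with the original one.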
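Source: Chinese Journal of Health Statistics (《中国卫生统计》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 4, pp. 580-584 (5 pages)
Funding: Shandong Province Science and Technology Development Plan (No. 2014GGH218019)
Keywords: MICE; missing data; simulation; multiple imputation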
