Multiple Imputation of Missing Data: A Simulation Study on a Binary Response


Abstract: A growing number of programs for multiple imputation of missing values are becoming available in statistical software. Two algorithms are most commonly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3% to 63%) was set as missing completely at random and subsequently imputed using multiple imputation by chained equations. In a logistic regression model, four coefficients were examined: non-zero and zero main effects, and non-zero and zero interaction effects. Estimates of all main and interaction effects were unbiased. There was considerable variance in the estimates, increasing with the proportion of missing data and decreasing with sample size. Imputation of missing data by chained equations is a useful tool for small to moderate proportions of missing data. The method has its limits, however: in small samples, there are considerable random errors for all effects.
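The "missing completely at random" step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function name `make_mcar`, the toy Gaussian data, the 30% deletion rate and the choice to delete cells from every variable are all assumptions made for illustration. Only the dataset size (50 cases, four variables) is taken from the abstract.

```python
import random


def make_mcar(rows, proportion, seed=0):
    """Return a copy of `rows` with roughly `proportion` of the cells set to None.

    Every cell is deleted independently with the same probability, regardless
    of its own value or any other value in the data. That independence is what
    makes the mechanism "missing completely at random" (MCAR).
    """
    rng = random.Random(seed)
    return [
        [None if rng.random() < proportion else value for value in row]
        for row in rows
    ]


# Hypothetical toy dataset: 50 cases, 4 variables (sizes taken from the abstract).
data_rng = random.Random(1)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]

# Assumed deletion rate of 30%, chosen from within the 3% to 63% range.
masked = make_mcar(data, proportion=0.30)
n_missing = sum(value is None for row in masked for value in row)
print(f"{n_missing} of {50 * 4} cells set to missing")
```

The masked dataset would then be passed to a chained-equations imputation routine; that step is left out here because the abstract does not say which software or settings were used.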

Authors: Jochen Hardt, Max Herke, Tamara Brian, Wilfried Laubach

Affiliation: Medizinische Psychologie und Medizinische Soziologie

Source: Open Journal of Statistics (catalogued in Chinese as "Journal of Statistics (English)"), 2013, Issue 5, pp. 370-378, 9 pages in total

Funding: Supported by the Stiftung Rheinland-Pfalz für Innovation (959).

Keywords: Multiple Imputation; Chained Equation; Large Proportion Missing; Main Effect; Interaction Effect

Classification code: R73 [Medicine and Health: Oncology]

