Abstract: In his classic 1987 book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as RE = (1 + γ/m)^{-1/2}, where m is the number of imputations, leading to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the apparently sufficient m deduced from the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging from the order of 10^{-6} to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had the dominating effect on γ, overshadowing the effect of δ. The results suggest that it is impossible to predict γ using δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
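To make the RE argument concrete, the following minimal sketch evaluates the formula quoted in the abstract, RE = (1 + γ/m)^{-1/2}, over a small grid. The function name `relative_efficiency` and the grid are illustrative assumptions, not part of the paper. The γ values 1e-6 and 0.01 mark the range the abstract reports, while 0.3 and 0.5 are hypothetical larger values included only for contrast.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """RE = (1 + gamma/m)^(-1/2), the form quoted in the abstract.

    gamma: fraction of missing information, in [0, 1]
    m:     number of imputations, a positive integer
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (1.0 + gamma / m) ** -0.5


if __name__ == "__main__":
    # 1e-6 and 0.01 bracket the range reported in the abstract;
    # 0.3 and 0.5 are hypothetical values added for contrast.
    gammas = (1e-6, 0.01, 0.3, 0.5)
    ms = (1, 3, 5, 20, 100)

    print("gamma".ljust(10) + "".join(f"m={m}".rjust(12) for m in ms))
    for g in gammas:
        cells = "".join(f"{relative_efficiency(g, m):12.6f}" for m in ms)
        print(f"{g:<10g}{cells}")
```

With γ as small as the values reported in the abstract, RE is already very close to 1 at m = 1, so the formula barely distinguishes between choices of m. Even at the hypothetical γ = 0.5, m = 5 gives an RE of about 0.953. This is consistent with the abstract's point that a γ-based RE will tend to declare a small m sufficient.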
Abstract: We consider multivariate small area estimation under nonignorable, not missing at random (NMAR) nonresponse. We assume a response model that accounts for the different patterns of the observed outcomes (which values are observed and which ones are missing), and estimate the response probabilities by application of the Missing Information Principle (MIP). By this principle, we first derive the likelihood score equations for the case where the missing outcomes are actually observed, and then integrate out the unobserved outcomes from the score equations with respect to the distribution holding for the missing data. The latter distribution is defined by the distribution fitted to the observed data for the respondents and the response model. The integrated score equations are then solved with respect to the unknown parameters indexing the response model. Once the response probabilities have been estimated, we impute the missing outcomes from their appropriate distribution, yielding a complete data set with no missing values, which is used for predicting the target area means. A parametric bootstrap procedure is developed for assessing the mean squared errors (MSE) of the resulting predictors. We illustrate the approach by a small simulation study.
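The MIP steps described above can be written out for a simplified case. The sketch below assumes a single outcome y_i with covariates x_i, a response indicator R_i, and a response model p(y, x; θ) = Pr(R = 1 | y, x; θ). This notation is an assumption made for illustration. The paper itself treats the multivariate case with several response patterns, which is not reproduced here.

```latex
% Score for the response model if the missing outcomes were observed:
S_{\mathrm{full}}(\theta)
  = \sum_{i:\,R_i=1} \frac{\partial \log p(y_i, x_i;\theta)}{\partial\theta}
  + \sum_{i:\,R_i=0} \frac{\partial \log\{1 - p(y_i, x_i;\theta)\}}{\partial\theta}

% MIP: integrate the unobserved outcomes out of the score,
% using the distribution holding for the nonrespondents:
S_{\mathrm{obs}}(\theta)
  = \sum_{i:\,R_i=1} \frac{\partial \log p(y_i, x_i;\theta)}{\partial\theta}
  + \sum_{i:\,R_i=0} E\!\left[
      \frac{\partial \log\{1 - p(y, x_i;\theta)\}}{\partial\theta}
      \,\middle|\, x_i, R_i = 0 \right]

% The nonrespondent distribution follows from Bayes' rule, using only the
% respondent distribution f(y | x, R=1) and the response model:
f(y \mid x, R=0)
  = \frac{\{p^{-1}(y, x;\theta) - 1\}\, f(y \mid x, R=1)}
         {E\!\left[\, p^{-1}(y, x;\theta) - 1 \,\middle|\, x, R=1 \right]}
```

Setting S_obs(θ) = 0 and solving for θ corresponds to the step in which the integrated score equations are solved with respect to the parameters indexing the response model.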