Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data

Cited by: 1
Abstract Objective: To evaluate the performance of multiple imputation based on expectation maximization with bootstrapping (EMB) for missing quantitative variables in cross-sectional health examination data, and to provide evidence for choosing an appropriate multiple imputation method for such data. Methods: Using measured cross-sectional data from a population health examination, we applied EMB multiple imputation with the Amelia II package in R 3.5.0 to the records of 1,634 employees who underwent routine physical examination at the Health Checkup Center of Xijing Hospital, Xi'an, Shaanxi Province, from January to December 2013. Results: Under three missing-at-random scenarios with single-variable missing rates of <10%, 20%, and 70%, EMB multiple imputation gave smaller estimation errors than listwise deletion in every case. On the same data, imputation quality varied with the number of imputations, and m = 10 was the most suitable number for this data set. Probability density curves before and after imputation showed that the imputed values agreed well with the observed values at m = 10. The over-fitting diagnostic plots further showed that, at m = 10, the 90% confidence intervals for most observations of each variable contained the best-fit line and were narrow. The multivariate logistic regression models built on the two analysis data sets, one produced by listwise deletion and the other by EMB multiple imputation, contained different variables. Conclusion: For quantitative variables missing at random at different rates, EMB multiple imputation performs better than listwise deletion, and the optimal number of imputations differs between data sets with different missingness.
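The abstract names the EMB procedure but does not show it. As a rough illustration only, the sketch below walks through the three EMB steps (bootstrap the incomplete rows, fit the model by EM on each bootstrap sample, then draw the missing values from the fitted conditional distribution) for a toy bivariate-normal case in which `x` is always observed and `y` may be missing. It is not the Amelia II implementation used in the study; the function names `em_bivariate`, `emb_impute`, and `pooled_mean_y` are hypothetical, and the bivariate-normal model is an assumption made to keep the example short.

```python
import random


def em_bivariate(rows, iters=500, tol=1e-10):
    """EM estimates (mx, my, sxx, sxy, syy) for a bivariate normal.

    rows: list of (x, y) pairs where x is always observed and y may be None.
    """
    n = len(rows)
    obs_y = [y for _, y in rows if y is not None]
    if len(obs_y) < 2:
        raise ValueError("need at least two observed y values")
    mx = sum(x for x, _ in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows) / n
    # Start from complete-case moments of y and zero covariance.
    my = sum(obs_y) / len(obs_y)
    syy = sum((y - my) ** 2 for y in obs_y) / len(obs_y)
    sxy = 0.0
    for _ in range(iters):
        beta = sxy / sxx
        resid_var = syy - beta * sxy
        s_y = s_yy = s_xy = 0.0
        for x, y in rows:
            if y is None:
                # E-step: expected y and y^2 given x under current parameters.
                ey = my + beta * (x - mx)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            s_y += ey
            s_yy += eyy
            s_xy += x * ey
        # M-step: update the moments from the expected sufficient statistics.
        new_my = s_y / n
        new_syy = s_yy / n - new_my ** 2
        new_sxy = s_xy / n - mx * new_my
        change = abs(new_my - my) + abs(new_syy - syy) + abs(new_sxy - sxy)
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            break
    return mx, my, sxx, sxy, syy


def emb_impute(rows, m=10, seed=0):
    """Return m completed copies of rows using bootstrap + EM + random draws."""
    rng = random.Random(seed)
    n = len(rows)
    completed = []
    for _ in range(m):
        # 1) Bootstrap: resample rows with replacement, missing values included.
        boot = [rows[rng.randrange(n)] for _ in range(n)]
        # 2) EM on the bootstrap sample gives one plausible parameter set.
        mx, my, sxx, sxy, syy = em_bivariate(boot)
        beta = sxy / sxx
        sd = max(syy - beta * sxy, 0.0) ** 0.5
        # 3) Fill each missing y in the ORIGINAL data with a conditional draw.
        completed.append([
            (x, y if y is not None else rng.gauss(my + beta * (x - mx), sd))
            for x, y in rows
        ])
    return completed


def pooled_mean_y(completed):
    """Pooled point estimate: average of the per-data-set means of y."""
    means = [sum(y for _, y in d) / len(d) for d in completed]
    return sum(means) / len(means)
```

Calling `emb_impute(rows, m=10)` returns ten completed data sets; observed values are left untouched and only the `None` entries differ between copies. Drawing the parameters from bootstrap samples rather than using a single EM fit is what lets the spread across the ten copies reflect estimation uncertainty.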
Authors: SHI Fu-yan (石福艳); MA Jie (马洁); HUANG Lu (黄璐); XU Xiao-shan (许小珊); SUN Na (孙娜); MENG Wei-jing (孟维静); WANG Su-zhen (王素珍); YANG Li-ping (杨丽平) (School of Public Health and Management, Weifang Medical University, Weifang, Shandong Province 201053, China)
Source: Chinese Journal of Public Health (《中国公共卫生》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2019, Issue 11, pp. 1536-1539 (4 pages)
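Funding: National Natural Science Foundation of China (81473071); Shaanxi Province Science and Technology Coordination and Innovation Project (2016KTZDSF02–07–01); Shandong Province Science and Technology Development Program (2015WS0067); Weifang Medical University Doctoral Start-up Fund (2017BSQD51)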
Keywords: health examination; missing data; expectation maximization with bootstrapping (EMB); multiple imputation
