Comparison of Results from Different Missing-Value Handling Methods Under Different Missingness Scenarios (cited by: 3)

Missing Data Replacement Methods in Different Scenarios
Abstract Objective: To compare, using inpatient medical record data on patients with head and neck neoplasms admitted to Sichuan Cancer Hospital, how three approaches to handling missing values, the complete-case method, expectation-maximization (EM), and Markov chain Monte Carlo (MCMC), affect the estimated regression coefficient r of standardized length of stay on the standardized logarithm of hospital expenditure under different missing-data scenarios. Methods: R 3.4.1 was used to generate and process the simulated datasets. Monte Carlo simulation produced missing datasets for each scenario by setting the missing proportion and the missing mechanism. The complete-case, EM, and MCMC methods were each used to estimate r in the simulated datasets, and the estimates were compared with the coefficient r_c from the original complete dataset in terms of accuracy (how far each method's r lies from r_c) and precision (the variability s of each method's r). Results: The relative performance of the three methods differed across scenarios. Under the missing completely at random (MCAR) and missing at random (MAR, 1∶2) mechanisms, all three methods gave estimates of r within the acceptable range (r_c ± 0.5 s_c) when the missing proportion was below 30%. Under the MAR (2∶1) mechanism, all three methods stayed within the acceptable range at every missing proportion. In every scenario, the EM method gave the smallest standard error s of r, which was also the closest to the standard error s_c of r_c. Conclusion: The choice of a missing-value handling method should take into account both the proportion of missing data and the missing mechanism.
Authors 邱建青 (QIU Jian-qing); 周雨秋 (ZHOU Yu-qiu); 岳廷妍 (YUE Ting-yan); 裴姣 (PEI Jiao); 税春燕 (SHUI Chun-yan); 李晓松 (LI Xiao-song); 张韬 (ZHANG Tao) (Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Chengdu 610041, China; Sichuan Cancer Hospital Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610041, China)
Source 《四川大学学报(医学版)》 Journal of Sichuan University (Medical Sciences), 2018, No. 3, pp. 430-435 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
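Funding Supported by the National Natural Science Foundation of China, Young Scientists Fund (No. 81602935), and the Sichuan University Young Teachers' Research Start-up Fund (No. 2016SCU11006).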
Keywords Missing data; Missing mechanism; Missing proportion; Expectation-maximization (EM); Markov chain Monte Carlo (MCMC)
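The simulation design summarized in the abstract (delete values under a chosen mechanism and proportion, re-estimate the coefficient, then check it against r_c ± 0.5 s_c) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: it is written in Python rather than the R 3.4.1 the authors used, it draws synthetic standardized data instead of the hospital records, it covers only the complete-case method (not EM or MCMC), and its MAR rule, controlled by the hypothetical `weight_high` parameter, is an illustrative choice rather than the paper's 1∶2 or 2∶1 definition. All function names are invented for this example.

```python
import math
import random
import statistics


def ols_slope_and_se(xs, ys):
    """Return the OLS slope of y on x and its standard error."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return b, math.sqrt(rss / (n - 2) / sxx)


def missing_mask(xs, proportion, mechanism, rng, weight_high=3.0):
    """Flag rows whose outcome is deleted (True = missing).

    MCAR: every row has the same deletion probability.
    MAR:  hypothetical rule (not the paper's), rows with x above the
          median are `weight_high` times as likely to be deleted.
    """
    if mechanism == "MCAR":
        weights = [1.0] * len(xs)
    elif mechanism == "MAR":
        median_x = statistics.median(xs)
        weights = [weight_high if x > median_x else 1.0 for x in xs]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    mean_w = statistics.fmean(weights)
    # Scale so the expected share of deleted rows equals `proportion`.
    return [rng.random() < min(1.0, proportion * w / mean_w) for w in weights]


def simulate(mechanism, proportion, n=500, reps=200, true_slope=0.6, seed=1):
    """Compare complete-case slopes with the complete-data slope r_c."""
    rng = random.Random(seed)
    # Synthetic standardized predictor and outcome (assumed, not real data).
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_sd = math.sqrt(1.0 - true_slope ** 2)
    ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    r_c, s_c = ols_slope_and_se(xs, ys)

    estimates = []
    for _ in range(reps):
        mask = missing_mask(xs, proportion, mechanism, rng)
        kept_x = [x for x, m in zip(xs, mask) if not m]
        kept_y = [y for y, m in zip(ys, mask) if not m]
        estimates.append(ols_slope_and_se(kept_x, kept_y)[0])

    mean_r = statistics.fmean(estimates)
    return {
        "r_c": r_c,
        "s_c": s_c,
        "mean_r": mean_r,                       # accuracy: compare with r_c
        "sd_r": statistics.stdev(estimates),    # precision: spread of r
        "acceptable": abs(mean_r - r_c) <= 0.5 * s_c,
    }


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR"):
        for proportion in (0.1, 0.3, 0.5):
            print(mechanism, proportion, simulate(mechanism, proportion))
```

Running the script prints one summary per scenario. Swapping the complete-case step for an EM or multiple-imputation routine would extend the sketch toward the three-way comparison the abstract reports.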