
Comparison of missing data handling methods in cognitive diagnosis: Zero replacement, multiple imputation and maximum likelihood estimation
Abstract Missing data occur frequently in testing, and cognitive diagnostic assessment is no exception; missing data can bias diagnostic results. First, a simulation study compared commonly used missing data handling methods under a range of experimental conditions. The results showed that: (1) missing data reduce estimation accuracy; as the number of examinees and items decreases, the missing rate increases, and item quality declines, the PCCR of all methods decreases while the absolute value of Bias and the RMSE increase; (2) for item parameter estimation, EM performed best, followed by MI, whereas FIML and ZR were unstable; (3) for estimating examinees' knowledge states, EM and FIML performed best, whereas MI and ZR were unstable. Second, the performance of the methods was further explored with PISA 2015 empirical data. Taking the simulation and empirical results together, EM or FIML is recommended for handling missing data.

Missing data are common in research, and cognitive diagnostic assessment (CDA) is no exception. Some studies have shown that both the presence of missing values and the choice of missing data handling method affect the results of CDA. It is therefore necessary to pay more attention to this problem in CDA and to choose appropriate methods to deal with it. Although the problem has been explored in CDA before, previous studies did not consider multiple imputation (MI) and full information maximum likelihood (FIML), which are widely used in the field of missing data analysis. Moreover, previous studies neglected comparisons based on empirical data and on saturated models such as the GDINA model. The main purposes of this study are therefore to introduce MI and FIML into CDA, to make a comprehensive comparison of different missing data handling methods, and to put forward suggestions for handling missing data in practice.

The simulation study considered six factors: (1) sample size: 200, 400, and 1000 participants; (2) test length: 15 and 30 items; (3) item quality: high, medium, and low; (4) missing data mechanism: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR); (5) missing rate: 10%, 20%, and 30%; (6) missing data handling method: zero replacement (ZR), MI-CART, MI-PMM, MI-LOGREG.BOOT, the Expectation-Maximization algorithm (EM), and FIML. The GDINA model was used, and the analysis was carried out with the GDINA package in R. Second, the PISA 2015 computer-based mathematics data were used to compare the practical value of the methods.

The simulation study showed that: (1) missing data reduce estimation accuracy; the absolute value of Bias and the RMSE increased, and the PCCR of all methods decreased, as sample size, test length, and item quality decreased and as the missing rate increased; (2) when estimating item parameters, EM performed best, followed by MI, whereas FIML and ZR were unstable; (3) when estimating participants' knowledge states (KS), EM and FIML performed best under MAR or MCAR, and EM, FIML, and ZR performed best under MNAR. The empirical study further supported the simulation results: (1) EM, FIML, and MI-PMM each performed best on one or more empirical indicators; (2) the empirical results closely resembled the simulation results under the MNAR mechanism; (3) EM performed well on all indicators, ZR and FIML were slightly worse than EM, and these were followed by MI-PMM, MI-LOGREG.BOOT, and MI-CART.

Based on these results, the following suggestions are offered: (1) EM and FIML should be the first choice; if researchers do not need a completed data set, FIML can be given priority; (2) when the missing data mechanism is MAR or MCAR and the test is short, researchers should avoid using ZR. Finally, the paper outlines directions for future research: (1) polytomously scored items should also be studied; (2) the effectiveness of these methods should be tested in longitudinal research; (3) more information-matrix methods for computing standard errors could be compared in the presence of missing data; (4) future research could examine the missing data mechanisms underlying real data.
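The simulation design summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' code (the study used the GDINA package in R); it only shows, under simple assumptions, how the three missing data mechanisms can be imposed on a 0/1 response matrix, how zero replacement works, and how Bias, RMSE and PCCR (read here as the pattern-wise correct classification rate) could be computed. The function names, the 1.5/0.5 probability multipliers, and the use of item 0 as the fully observed variable driving MAR are all hypothetical choices made for illustration, and the realized missing rate will only approximate the nominal rate.

```python
import math
import random


def make_missing(responses, rate, mechanism, rng):
    """Copy a 0/1 response matrix, replacing some entries with None (missing)."""
    masked = []
    for row in responses:
        new_row = []
        for j, x in enumerate(row):
            if mechanism == "MCAR":
                p = rate  # every entry is equally likely to be missing
            elif mechanism == "MAR":
                # Depends only on item 0, which is itself kept fully observed.
                p = 0.0 if j == 0 else rate * (1.5 if row[0] == 0 else 0.5)
            elif mechanism == "MNAR":
                # Depends on the value that may go missing: wrong answers
                # are skipped more often than correct ones.
                p = rate * (1.5 if x == 0 else 0.5)
            else:
                raise ValueError(f"unknown mechanism: {mechanism}")
            new_row.append(None if rng.random() < p else x)
        masked.append(new_row)
    return masked


def zero_replace(responses):
    """Zero replacement (ZR): score every missing response as incorrect."""
    return [[0 if x is None else x for x in row] for row in responses]


def bias(estimates, truths):
    """Mean signed difference between estimated and true parameters."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)


def rmse(estimates, truths):
    """Root mean squared error between estimated and true parameters."""
    squared = sum((e - t) ** 2 for e, t in zip(estimates, truths))
    return math.sqrt(squared / len(truths))


def pccr(estimated_patterns, true_patterns):
    """Share of examinees whose whole attribute pattern is recovered."""
    hits = sum(list(e) == list(t)
               for e, t in zip(estimated_patterns, true_patterns))
    return hits / len(true_patterns)


if __name__ == "__main__":
    rng = random.Random(2022)
    # Toy data: 200 examinees, 15 items, purely random responses.
    data = [[rng.randint(0, 1) for _ in range(15)] for _ in range(200)]
    for mech in ("MCAR", "MAR", "MNAR"):
        masked = make_missing(data, 0.2, mech, rng)
        n_missing = sum(x is None for row in masked for x in row)
        print(mech, "realized missing rate:", round(n_missing / (200 * 15), 3))
```

In a full replication, the masked matrix (or its zero-replaced or imputed version) would be passed to a cognitive diagnosis model, and the metrics would be applied to the resulting item parameter estimates and attribute patterns; that estimation step is outside the scope of this sketch.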
Authors SONG Zhilin (宋枝璘); GUO Lei (郭磊); ZHENG Tianpeng (郑天鹏) (Faculty of Psychology, Southwest University, Chongqing 400715, China; Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Chongqing 400715, China; Collaborative Innovation Center of Assessment for Basic Education Quality (CICA-BEQ) at Beijing Normal University, Beijing 100088, China)
Source Acta Psychologica Sinica (《心理学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2022, No. 4, pp. 426-440, I0002-I0005 (19 pages in total)
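Funding Supported by the Young Scientists Fund of the National Natural Science Foundation of China (31900793); the Major Achievement Cultivation Project of the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University (2019-06-023-BZPK01); and the Fundamental Research Funds for the Central Universities (SWU2109222).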
Keywords cognitive diagnosis; GDINA model; missing data; multiple imputation; maximum likelihood estimation