Comparison of missing data handling methods in cognitive diagnosis: Zero replacement, multiple imputation and maximum likelihood estimation

Cited by: 7


Abstract

Missing data occur frequently in testing, and cognitive diagnostic assessment is no exception; missing data bias the diagnostic results. First, a simulation study compared commonly used missing data handling methods under a variety of experimental conditions. The results showed that: (1) missing data reduce estimation accuracy; as the sample size and the number of items decreased, the missing rate increased, and item quality declined, the PCCR of all methods decreased, while the absolute value of Bias and the RMSE increased; (2) when estimating item parameters, EM performed best, followed by MI, while FIML and ZR were unstable; (3) when estimating examinees' knowledge states, EM and FIML performed best, while MI and ZR were unstable. Second, the performance of the different methods was further explored with PISA 2015 empirical data. Taking the simulation and empirical results together, EM or FIML is recommended for handling missing data.

The problem of missing data is common in research, and cognitive diagnostic assessment (CDA) is no exception. Some studies have revealed that both the presence of missing values and the choice of missing data handling method affect the results of CDA. Therefore, it is necessary to pay more attention to this problem in CDA and to choose appropriate methods to deal with it. Although the problem has been explored in CDA before, previous studies did not consider multiple imputation (MI) and full information maximum likelihood (FIML), which are widely used in the field of missing data analysis. Moreover, previous studies neglected comparisons using empirical data and saturated models such as the GDINA model. In summary, the main purposes of this study are to introduce MI and FIML into CDA, to make a comprehensive comparison of different missing data handling methods, and to put forward suggestions for handling missing data in practice.

The simulation study considered six factors: (1) sample size: 200, 400, and 1000 participants; (2) test length: 15 and 30 items; (3) item quality: high, medium, and low; (4) missing data mechanism: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR); (5) missing rate: 10%, 20%, and 30%; (6) missing data handling method: zero replacement (ZR), MI-CART, MI-PMM, MI-LOGREG.BOOT, the Expectation-Maximization algorithm (EM), and FIML. The GDINA model was used, and the analysis was carried out with the GDINA package in R. Second, the PISA 2015 computer-based mathematics data were used to compare the practical value of the methods.

The results of the simulation study revealed that: (1) Missing data result in a decrease in estimation accuracy. The absolute value of Bias and the RMSE both increased, and the PCCR of all methods decreased, as the sample size, test length, and item quality decreased and the missing rate increased. (2) When estimating item parameters, EM performed best, followed by MI, while FIML and ZR were unstable. (3) When estimating the knowledge states (KS) of participants, EM and FIML performed best when the missing data mechanism was MAR or MCAR. When the missing data mechanism was MNAR, EM, FIML, and ZR performed best.

The empirical study further supported the simulation results. It showed that: (1) across the empirical indicators, EM, FIML, and MI-PMM each performed best on one or more indicators; (2) the results of the empirical study were very similar to those of the simulation study under the MNAR mechanism; (3) EM performed well on all indicators, ZR and FIML were slightly worse than EM, and these were followed by MI-PMM, MI-LOGREG.BOOT, and MI-CART.

Based on these results, the following suggestions are offered: (1) EM and FIML should be the first choice; if researchers do not need a completed data set, FIML can be given priority. (2) When the missing data mechanism is MAR or MCAR and the test is short, researchers should avoid using ZR to deal with missing data.

Finally, the paper closes with prospects for future research: (1) multilevel scoring should also be studied; (2) the effectiveness of these methods should be tested in longitudinal research; (3) the performance of additional information-matrix methods for calculating standard errors when handling missing data could be compared further; (4) future research could focus on the missing data mechanisms present in real data.
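To make the simulation design above more concrete, the following is a minimal Python sketch of how the three missing data mechanisms, zero replacement, and the three evaluation indices might be set up. It is not the authors' code: the study itself used the GDINA package in R, and every function name here (`make_missing`, `zero_replace`, `bias`, `rmse`, `pccr`) is hypothetical. The weighting scheme used for MAR and MNAR (a doubled missingness probability for one covariate group or for incorrect responses) is an illustrative assumption, and PCCR is taken here to mean the proportion of examinees whose whole attribute pattern is classified correctly.

```python
import numpy as np


def make_missing(responses, rate, mechanism, rng, covariate=None):
    """Return a float copy of a persons-by-items 0/1 matrix with NaN in missing cells.

    The overall expected missing rate is `rate` for every mechanism.
    """
    responses = np.asarray(responses)
    if mechanism == "MCAR":
        # Every cell is equally likely to be missing.
        weights = np.ones(responses.shape)
    elif mechanism == "MAR":
        # Missingness depends only on a fully observed 0/1 covariate
        # (assumption: group 1 is twice as likely to omit an item).
        group = np.asarray(covariate, dtype=float)
        weights = (1.0 + group)[:, None] * np.ones(responses.shape)
    elif mechanism == "MNAR":
        # Missingness depends on the value that goes missing
        # (assumption: incorrect answers are twice as likely to be omitted).
        weights = np.where(responses == 0, 2.0, 1.0)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    prob = np.clip(rate * weights / weights.mean(), 0.0, 1.0)
    observed = responses.astype(float)
    observed[rng.random(responses.shape) < prob] = np.nan
    return observed


def zero_replace(observed):
    """Zero replacement (ZR): score every missing response as incorrect."""
    return np.where(np.isnan(observed), 0.0, observed)


def bias(estimates, truth):
    """Mean signed difference between estimated and true parameters."""
    return float(np.mean(np.asarray(estimates) - np.asarray(truth)))


def rmse(estimates, truth):
    """Root mean squared error between estimated and true parameters."""
    diff = np.asarray(estimates) - np.asarray(truth)
    return float(np.sqrt(np.mean(diff ** 2)))


def pccr(est_patterns, true_patterns):
    """Share of examinees whose full attribute pattern is recovered."""
    same = np.asarray(est_patterns) == np.asarray(true_patterns)
    return float(np.mean(np.all(same, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(2022)
    data = rng.integers(0, 2, size=(400, 15))   # 400 examinees, 15 items
    group = rng.integers(0, 2, size=400)        # hypothetical observed covariate
    for mech in ("MCAR", "MAR", "MNAR"):
        obs = make_missing(data, 0.20, mech, rng, covariate=group)
        print(mech, "observed missing rate:", np.isnan(obs).mean())
```

The sketch covers only data generation, ZR, and scoring; fitting the GDINA model and running MI, EM, or FIML would require a dedicated estimation package and is outside its scope.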

Authors: SONG Zhilin (宋枝璘); GUO Lei (郭磊); ZHENG Tianpeng (郑天鹏) (Faculty of Psychology, Southwest University, Chongqing 400715, China; Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Chongqing 400715, China; Collaborative Innovation Center of Assessment for Basic Education Quality (CICA-BEQ) at Beijing Normal University, Beijing 100088, China)

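Affiliations: Faculty of Psychology, Southwest University; Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality; Collaborative Innovation Center of Assessment for Basic Education Quality at Beijing Normal University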

Source: Acta Psychologica Sinica (《心理学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2022, Issue 4, pp. 426-440, I0002-I0005 (19 pages in total)

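Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (31900793); the Major Achievement Cultivation Project of the Collaborative Innovation Center of Assessment for Basic Education Quality at Beijing Normal University (2019-06-023-BZPK01); and the Fundamental Research Funds for the Central Universities (SWU2109222).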

Keywords: cognitive diagnosis; GDINA model; missing data; multiple imputation; maximum likelihood estimation

Classification number: B841 [Philosophy and Religion: Basic Psychology]

