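Methods for Testing Measurement Equivalence of Categorical Data and Their Comparison: Testing Between-Group Differences in Item Threshold (Difficulty) Parameters (Cited by: 3)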

Testing Measurement Equivalence of Categorical Items' Threshold/Difficulty Parameters:A Comparison of CCFA and (M)IRT Approaches

Abstract: Measurement equivalence of an instrument is a prerequisite for multiple-group comparisons. Methods for testing it fall into two main classes: multiple-group comparison based on CFA and DIF testing based on IRT. This article compares the CCFA-based DIFFTEST method with the IRT-based IRT-LR method for unidimensional tests, and DIFFTEST with the MIRT-based chi-square test (referred to below as MIRT-MG) for multidimensional tests. A simulation study compared the power and Type I error of these methods, taking into account total sample size, the balance of sample sizes across groups, test length, the size of the threshold difference, and the correlation between dimensions. The results showed: (1) For unidimensional tests, IRT-LR is a stricter test than DIFFTEST. For multidimensional tests, MIRT-MG detects item threshold differences more readily than DIFFTEST when the test is long and the dimensions are highly correlated, whereas DIFFTEST has slightly higher power than MIRT-MG when the test is short and the dimensions are weakly correlated. (2) The power of DIFFTEST, IRT-LR, and MIRT-MG increases with the size of the threshold difference; when the difference is medium or large, all three methods effectively detect threshold non-equivalence. (3) The power of all three methods increases with total sample size; with total sample size held constant, power is higher when the two groups are balanced than when they are unbalanced. (4) With the number of non-invariant items held constant, the power of DIFFTEST decreases as the test gets longer, whereas the power of IRT-LR and MIRT-MG increases. (5) The mean Type I error rate of DIFFTEST is close to the nominal value of 0.05, whereas the mean Type I error rates of IRT-LR and MIRT-MG are far below 0.05.

Multiple-group confirmatory factor analysis and differential item functioning (DIF) analysis based on unidimensional or multidimensional item response theory are the two most commonly used methods for assessing the measurement equivalence of categorical items. Unlike traditional linear factor analysis, multiple-group categorical confirmatory factor analysis (CCFA) can appropriately model categorical measures with a threshold structure, which is comparable to the difficulty parameters in (multidimensional) IRT [(M)IRT]. In this study, we used the Monte Carlo method to compare multiple-group CCFA and (M)IRT in terms of their power to detect violations of measurement invariance (i.e., DIF). Moreover, given the limiting assumptions of the traditional unidimensional IRT model, this study extended the DIF test method to the (M)IRT model. Simulation studies under both unidimensional and multidimensional conditions were conducted to compare the DIFFTEST method, the IRT-LR method (for unidimensional scales), and the MIRT-MG method (for multidimensional scales) with respect to their power to detect a lack of invariance across groups.
Results indicated that all three methods, DIFFTEST, IRT-LR, and MIRT-MG, showed reasonable power to identify measurement non-equivalence when the threshold difference was large. For unidimensional scales, the IRT-LR test demonstrated superior power to DIFFTEST. For multidimensional scales, however, the results were not completely consistent across conditions. The power of MIRT-MG was higher than that of DIFFTEST when the test was long and the correlation between dimensions was high. In contrast, the power of DIFFTEST was higher than that of MIRT-MG when the test was short and the correlations between dimensions were low. For a fixed number of non-invariant items, the power of the DIFFTEST method decreased as test length increased, whereas the power of the IRT-LR and MIRT-MG methods increased. The number of respondents per group (sample size) was found to be one of the most important factors affecting the performance of the three approaches. The power of the DIFFTEST, IRT-LR, and MIRT-MG methods increased as sample size increased. For a fixed total number of observations, the power of all three methods was higher under the balanced design, in which the two groups were equal in size, than under the unbalanced design, in which they were unequal. For the DIFFTEST method, the Type I error rate was close to the nominal rate of 5%, while the IRT-LR and MIRT-MG methods produced much lower Type I error rates.
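The power and Type I error figures discussed in the abstract are, in a Monte Carlo study, simply rejection rates across replications. The sketch below illustrates that bookkeeping only and is not the authors' code. The helper names `rejection_rate` and `simulate_p_values` are hypothetical, and a toy two-sided z-test stands in for the DIFFTEST, IRT-LR, and MIRT-MG statistics, which would in practice come from fitting models in dedicated software.

```python
import random
from statistics import NormalDist


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replications in which the test rejects at level alpha."""
    if not p_values:
        raise ValueError("need at least one replication")
    return sum(p < alpha for p in p_values) / len(p_values)


def simulate_p_values(effect, n_reps, rng):
    """Toy stand-in for a DIF test (an assumption, not the paper's method).

    Each replication draws a z statistic from N(effect, 1) and returns its
    two-sided p-value. effect = 0 mimics a condition with no threshold
    difference; effect > 0 mimics a condition with a true difference.
    """
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_reps):
        z = rng.gauss(effect, 1.0)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


rng = random.Random(2012)  # fixed seed so the run is reproducible

# Rejection rate when no difference exists: empirical Type I error.
type1_error = rejection_rate(simulate_p_values(0.0, 5000, rng))

# Rejection rate when a difference exists: empirical power.
power = rejection_rate(simulate_p_values(3.0, 5000, rng))

print(f"Type I error: {type1_error:.3f}, power: {power:.3f}")
```

A method whose rejection rate in the no-difference condition falls well below the nominal 0.05, as the abstract reports for IRT-LR and MIRT-MG, is conservative, so its power figures should be read with that in mind.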

Authors: Liu Hongyun, Li Chong, Zhang Pingping, Luo Fang

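Affiliations: School of Psychology, Beijing Normal University; Learning and Development Center, Beijing New Oriental School; State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University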

Source: Acta Psychologica Sinica (心理学报), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2012, Issue 8, pp. 1124-1136 (13 pages)

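Funding: Supported by the National Natural Science Foundation of China (31100759); a Ministry of Education Key Project under the National Education Sciences "Twelfth Five-Year" Plan (GFA111001); and the Ministry of Education Humanities and Social Sciences Fund (11YJC190016)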

Keywords: categorical data; confirmatory factor analysis; differential item functioning; (multidimensional) item response theory; measurement equivalence

Classification code: B841 [Philosophy and Religion: Basic Psychology]
