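Multidimensional Testlet Response Model: An Applied Extension of the Multidimensional Random Coefficients Multinomial Logistic Model (cited by: 4)

Multidimensional Rasch Testlet Model: An Extension and Generalization of MRCMLM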

Abstract: This paper extends the multidimensional random coefficients multinomial logistic model (MRCMLM) to the multidimensional testlet setting, yielding a multidimensional testlet response model (MTRM) suited to multidimensional target abilities and multidimensional testlet effects; the model is highly flexible and widely applicable. Two simulation studies and one applied study examine the MTRM's parameter estimation accuracy and applicability, as well as its differences from the two-tier model. The results show that: (1) the correlation between ability dimensions and the number of item score categories are important factors affecting parameter estimation; (2) the MTRM estimates item parameters more accurately and more stably than the two-tier model, and estimates the size of testlet effects more accurately; (3) the MTRM fits better when within-item multidimensional testlets are taken into account, offers a wider choice of model structures for test analysis, and has notable applied value.

Testlets have been widely used in educational assessment. It has been shown that ignoring testlet effects when analyzing response data often results in inaccurate estimates of reliability coefficients and latent trait standard errors, increased bias in item parameter estimates, inaccurate test equating, and failure to detect DIF. As such, there is increasing interest among researchers in using testlet models instead of standard item response models. Different types of testlet models have been proposed to partial out the influence of testlet factors from the estimation of latent proficiency. However, most previous models address testlet effects only when (1) a single latent trait is measured and (2) each item belongs to only one testlet (between-item multidimensionality). As an alternative, the two-tier model can be used to deal with multidimensional latent traits. However, the two-tier model is usually used within the framework of confirmatory factor analysis. This research extends the multidimensional random coefficients multinomial logistic model (MRCMLM) to the multidimensional testlet response model (MTRM), with the aim of taking within-item multidimensional testlets and multiple abilities into consideration within the IRT framework. With different model constraints, the MTRM can be used to model a variety of multidimensional test structures. Two studies based on simulated data and one empirical study based on large-scale math assessment data are reported. In simulation study 1, we considered different correlations among trait dimensions and compared the MRCMLM, which ignores testlet effects, with the MTRM in terms of estimation accuracy.
In simulation study 2, the MTRM was compared with a two-tier model for polytomous data in terms of item and person parameter estimation accuracy. In the third study, which analyzed real large-scale math test results, three-dimensional proficiencies in math were modeled and estimated. In total, seven testlets were identified. Some items loaded on more than one testlet, indicating within-item multidimensional testlet effects. Model fit and estimation were compared across three models (the MRCMLM, MTRM-1 with only uncrossed testlets considered, and MTRM-2 with all seven testlets considered). All analyses were conducted in ConQuest, using Monte Carlo estimation. Estimation accuracy in the simulation studies was evaluated using bias, RMSE, and correlation coefficients between the true and estimated values. Results of simulation 1 indicated that the MTRM produced more accurate item difficulty estimates for items within testlets than the MRCMLM, while both models reached accurate results for independent items. The recovery of item difficulties in the MTRM was also less influenced by the correlations among the latent traits. In addition, as the correlation coefficients between abilities decreased, the ability and item difficulty estimates were more biased if testlet effects were not modeled. In simulation 2, both the MTRM and the two-tier model accurately estimated item and person parameters. When testlet effects were present, estimates of both item and person parameters were more stable in the MTRM than in the two-tier model, indicating that the MTRM is robust to complex test structures and extreme response patterns. Results of the empirical data analysis showed that the MTRM with all seven testlets considered fit the data best.
Applying the MTRM reduces misestimation of the reliability and standard error for each primary trait, even for moderate testlet effects and high correlations between ability dimensions. The present study proposes the multidimensional testlet model, supplementing previous testlet models by taking both within-item multidimensional testlets and multiple abilities into account. A new integrated model, the MTRM, was developed based on the MRCMLM. By identifying an appropriate ability-judge (score) matrix and testlet-judge (design) matrix, this model can be applied to a variety of educational tests in which complex testlets are embedded and multidimensional proficiencies are estimated. A promising attribute of this model is that its parameters can be estimated easily with the software ConQuest. We suggest that in many assessment contexts, ignoring testlet effects can add ambiguity to the interpretation of test scores, so testlet models should be fitted to such data.
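The abstract states that estimation accuracy in the simulation studies was evaluated with bias, RMSE, and the correlation between true and estimated values. The following is a minimal sketch of how such recovery criteria are commonly computed. It uses the usual textbook definitions, which are an assumption here because the abstract does not give formulas; the function name `recovery_metrics` and the example values are hypothetical and not taken from the study.

```python
import numpy as np


def recovery_metrics(true_values, estimates):
    """Return bias, RMSE, and Pearson correlation for parameter recovery.

    Assumed definitions (not stated in the source abstract):
      bias = mean(estimate - true)
      rmse = sqrt(mean((estimate - true)^2))
      corr = Pearson correlation between true and estimated values
    """
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    errors = estimates - true_values
    return {
        "bias": float(errors.mean()),
        "rmse": float(np.sqrt((errors ** 2).mean())),
        "corr": float(np.corrcoef(true_values, estimates)[0, 1]),
    }


# Hypothetical generating and estimated item difficulties, for illustration only.
true_b = [-1.0, -0.5, 0.0, 0.5, 1.0]
est_b = [-0.9, -0.6, 0.1, 0.4, 1.2]
metrics = recovery_metrics(true_b, est_b)
```

In a simulation design these quantities would typically be computed per replication and then averaged over replications, separately for item and person parameters.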

Authors: Wei Dan (魏丹), Liu Hongyun (刘红云), Zhang Danhui (张丹慧)

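Affiliations: Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University; School of Psychology, Beijing Normal University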

Source: Acta Psychologica Sinica (《心理学报》), 2017, No. 12, pp. 1604-1614 (11 pages). Indexed in CSSCI, CSCD, and the Peking University Core Journals list.

Funding: Supported by a Ministry of Education Youth Project (EBA130370) of the 2013 National Education Sciences "Twelfth Five-Year Plan".

Keywords: multidimensional ability; multidimensional testlet; two-tier model; MRCMLM; estimation accuracy

Classification number: B841 [Philosophy and Religion: Basic Psychology]
