The Necessity and Influencing Factors of the Bayesian Testlet Random-Effects Model (cited by: 16)

When Should We Use Testlet Model? A Comparison Study of Bayesian Testlet Random-Effects Model and Standard 2-PL Bayesian Model

Abstract: Testlet models can correct the parameter-estimation bias that arises in traditional IRT models when the assumption of local independence between items is violated. To explore the range of applicability of the testlet random-effects model, a Monte Carlo simulation study was conducted in which the 2-PL Bayesian testlet random-effects model (BTRM) and the 2-PL Bayesian model (BM) were each fitted to the data, considering the influence of testlet effect, testlet size, number of items, and the proportion of locally independent items. The results showed that: (1) BTRM was unaffected by testlet effect and testlet size, whereas the parameter-estimation error of BM increased as testlet effect and testlet size increased. (2) BTRM is fairly general, and using it reduces estimation error when the testlet effect is large, the testlets are long, and the number of items is large; however, when the number of items is small, the ability estimates from both models have large errors. (3) When the proportion of locally independent items is large, the parameter estimates from the two models differ little.

A testlet comprises a group of multiple-choice items based on a common stimulus. When a testlet is used, traditional item response models may not be appropriate because the assumption of local independence (LI) is violated. A variety of new models have been proposed to analyze response data from testlets. Among them, the Bayesian random-effects model proposed by Bradlow, Wainer and Wang (1999) is one of the most promising. However, in many situations it is not clear to practitioners whether traditional IRT methods should still be used instead of a newly proposed testlet model. The objective of the current study is to investigate the effects of model selection in various situations.

In Simulation 1, response data sets were generated under three simulation factors: testlet variance (0, 0.5, 1, 2), testlet size (2, 5, 10), and test length (20, 40, 60). For each simulation condition, the test structure was determined by fixing the number of examinees at I = 2000 and the percentage of testlet items in a test at 50%. Under each condition, 30 replications were generated. Both the two-parameter Bayesian testlet random-effects model and the standard two-parameter Bayesian model were fitted to every data set using the MCMC method. The computer program SCORIGHT was used to conduct all analyses across the different conditions.
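The data-generation step of Simulation 1 can be illustrated with a minimal sketch. It assumes a logistic 2-PL testlet form, P(y_ij = 1) = logistic(a_j(θ_i − b_j − γ_{i,d(j)})), with person-by-testlet effects γ drawn from N(0, testlet variance). The abstract does not state the generating distributions of the item and ability parameters, so the choices below (standard normal θ and b, lognormal a) and the function name `simulate_testlet_2pl` are assumptions made only for illustration; this is not the authors' code and it does not reproduce SCORIGHT.

```python
import numpy as np

def simulate_testlet_2pl(n_persons=2000, n_items=20, testlet_size=5,
                         testlet_var=1.0, testlet_prop=0.5, seed=0):
    """Generate 0/1 responses under a logistic 2-PL testlet model (illustrative)."""
    rng = np.random.default_rng(seed)

    # Half of the items (by default) belong to testlets; the rest are independent.
    n_testlet_items = int(round(n_items * testlet_prop))
    n_testlets = n_testlet_items // testlet_size

    # testlet_id[j] = 0 for an independent item, 1..n_testlets otherwise.
    testlet_id = np.zeros(n_items, dtype=int)
    for d in range(n_testlets):
        testlet_id[d * testlet_size:(d + 1) * testlet_size] = d + 1

    # Assumed generating distributions (not given in the abstract).
    theta = rng.normal(0.0, 1.0, n_persons)   # abilities
    a = rng.lognormal(0.0, 0.3, n_items)      # discriminations, positive
    b = rng.normal(0.0, 1.0, n_items)         # difficulties

    # Person-by-testlet random effects; column 0 stays zero for independent items.
    gamma = np.zeros((n_persons, n_testlets + 1))
    gamma[:, 1:] = rng.normal(0.0, np.sqrt(testlet_var), (n_persons, n_testlets))

    # Linear predictor a_j * (theta_i - b_j - gamma_{i, d(j)}) and response draw.
    eta = a * (theta[:, None] - b - gamma[:, testlet_id])
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = (rng.random(prob.shape) < prob).astype(int)
    return y, theta, a, b, testlet_id
```

Setting `testlet_var=0` gives data in which all items are locally independent, which corresponds to the zero-variance condition of the design.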
The two models were compared on seven criteria, including bias, mean absolute error, root mean square error, correlation between estimated and true values, 95% posterior interval width, and 95% coverage probability. These indices were computed for each parameter separately. Simulation 2 compared the two models under two factors: the proportion of independent items (1/3, 1/2, 2/3) and test length (20, 30, 40, 60). The data generation, analysis procedure, and criteria mirrored those of Simulation 1.

The results showed that: (1) The accuracy of estimation of all parameters under the 2-PL Bayesian testlet random-effects model remained stable across levels of testlet effect and testlet size. However, the estimation errors of all parameters under the 2-PL Bayesian model increased dramatically as the testlet effect and testlet size became larger. Moreover, the error for every parameter under the Bayesian testlet random-effects model was always smaller than under the 2-PL Bayesian model. Choosing the 2-PL Bayesian testlet random-effects model was especially necessary when testlet variance and testlet size were large. (2) Although the accuracy of item-parameter estimation in the Bayesian testlet random-effects model was not affected by test length, the accuracy of the ability parameter was. In addition, as the test got shorter, the errors of all parameters under the 2-PL Bayesian model increased dramatically. Overall, under short-test conditions, the Bayesian testlet random-effects model did not work well even when the testlet effect was large; and if the items were all independent, using the Bayesian testlet random-effects model resulted in much worse ability estimates than the 2-PL Bayesian model. (3) When the proportion of independent items was large and the test was longer than 20 items, the estimates from the two models did not differ significantly. In conclusion, the 2-PL Bayesian testlet random-effects model is more general.
Using the more complex testlet model when all items are independent leads to almost the same accuracy of parameter estimation as using the 2-PL Bayesian model. It is better to choose the 2-PL Bayesian testlet random-effects model when testlet variance, testlet size, and test length are large. However, when the test is short, even the Bayesian testlet random-effects model cannot provide accurate parameter estimates in the presence of local dependence. It is therefore important to make sure the test comprises enough items before applying a testlet model. We also give some suggestions for practitioners. During test construction, items should preferably be independent; if that is not possible, shorter testlets and a larger proportion of independent items should be used. During test analysis, local dependence should be checked first. If the evidence shows a dependence structure, an appropriate model should be chosen to avoid estimation errors.
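The recovery criteria named in the abstract can be sketched as below for a single parameter type (for example, all item difficulties in one replication). The abstract states seven criteria but names only six, so this sketch covers just those six. The function name `recovery_criteria` and its inputs (posterior point estimates plus lower and upper bounds of 95% posterior intervals) are hypothetical, and the exact formulas used in the study may differ.

```python
import numpy as np

def recovery_criteria(true, est, lower, upper):
    """Summarize how well estimates recover known true values (illustrative)."""
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    err = est - true
    return {
        "bias": err.mean(),                          # mean signed error
        "mae": np.abs(err).mean(),                   # mean absolute error
        "rmse": np.sqrt((err ** 2).mean()),          # root mean square error
        "corr": np.corrcoef(true, est)[0, 1],        # estimate-truth correlation
        "interval_width": (upper - lower).mean(),    # mean 95% interval width
        "coverage": ((lower <= true) & (true <= upper)).mean(),  # share of intervals containing the truth
    }
```

In a simulation study these values would typically be computed per replication and then averaged over the 30 replications of each condition.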

Authors: Liu Yue (刘玥), Liu Hongyun (刘红云)

Affiliation: School of Psychology, Beijing Normal University

Source: Acta Psychologica Sinica (《心理学报》; indexed in CSSCI, CSCD, PKU Core), 2012, Issue 2, pp. 263-275 (13 pages)

Keywords: testlet; 2-PL Bayesian testlet random-effects model; 2-PL Bayesian model; MCMC method

Classification code: B841 [Philosophy and Religion: Basic Psychology]

References (14)

1. Bradlow, E. T., Wainer, H., & Wang, X. H. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153-168.
2. Chen, C. T., & Wang, W. C. (2007). Effects of ignoring item interaction on item parameter estimation and detection of interacting items. Applied Psychological Measurement, 31(5), 388-411.
3. DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145-168.
4. Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.
5. Lee, G., Dunbar, S. B., & Frisbie, D. A. (2001). The relative appropriateness of eight measurement models for analyzing scores from tests composed of testlets. Educational and Psychological Measurement, 61(6), 958-975.
6. Li, Y. M., Bolt, D. M., & Fu, J. B. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3-21.
7. Lord, F. M., Novick, M. R., & Birnbaum, A. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
8. Rijmen, F. (2009). Three multidimensional models for testlet-based tests: Formal relations and an empirical comparison (Tech. Rep. No. RR-09-37). Educational Testing Service.
9. Wainer, H., Bradlow, E. T., & Wang, X. H. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
10. Wainer, H., & Wang, X. H. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203-220.
