Two new online calibration methods for computerized adaptive testing (cited by: 7)

Abstract: With the development of computerized adaptive testing (CAT), many new issues and challenges have arisen. For example, as a test is administered continuously, new items must be written, calibrated, and added to the item bank periodically to replace flawed, obsolete, and overexposed items. The new items must be calibrated precisely because calibration precision directly affects the accuracy of ability estimation. Online calibration has been widely used to calibrate new items on the fly in CAT because it offers several advantages over traditional offline calibration. Method A (Stocking, 1988), the simplest and most straightforward online calibration method, has an obvious theoretical limitation: it treats the estimated abilities as true values and ignores the measurement error in ability estimation. To overcome this weakness, we combined Method A with a full functional maximum likelihood estimator (FFMLE) and with an estimator that makes use of the consequences of sufficiency (ECSE; Stefanski & Carroll, 1985), respectively, to correct for the estimation error in ability; the resulting methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study compared the two new methods with three other methods: the original Method A [denoted Method A (Original)], the original Method A with the examinees' true abilities plugged in [Method A (True)], and the "multiple EM cycles" method (MEM).
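The difference between Method A and an error-corrected variant can be sketched in a few lines. The Python below is a minimal sketch, not the author's implementation. It assumes the 2PL in slope-intercept form without a scaling constant D and fits one new item by Newton-Raphson. When it is given nonzero standard errors, it applies a sufficiency-based correction in the spirit of Stefanski and Carroll. That correction assumes each ability estimate equals the true ability plus normal error with a known standard error, so that delta = theta_hat + y * se**2 * a is sufficient for the true ability. With all standard errors set to zero, the code reduces to ordinary logistic regression on the plugged-in estimates, which is the Method A idea of treating the estimate as the true ability. The names `sigmoid`, `simulate_responses`, and `calibrate_2pl` are hypothetical, and the correction is not claimed to match the paper's exact ECSE-Method A formulation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_responses(theta, a, b, rng=None):
    # Hypothetical helper: draw 0/1 responses to one item under the 2PL (no D).
    if rng is None:
        rng = random.Random()
    return [1 if rng.random() < sigmoid(a * (t - b)) else 0 for t in theta]


def calibrate_2pl(theta_hat, se, y, iters=50, tol=1e-8):
    """Estimate (a, b) of one new 2PL item (hypothetical sketch).

    theta_hat: ability estimates of the examinees who took the new item
    se:        assumed standard errors of those estimates (all 0 => Method A)
    y:         0/1 responses to the new item

    Solves sum_i (y_i - p_i) * (1, delta_i) = 0, where
    delta_i = theta_hat_i + y_i * se_i**2 * a and
    logit p_i = c + a * delta_i - a**2 * se_i**2 / 2 (assumes normal error).
    """
    a, c = 1.0, 0.0  # starting values (assumption)
    for _ in range(iters):
        g1 = g2 = 0.0
        j11 = j12 = j21 = j22 = 0.0
        for t, s, yi in zip(theta_hat, se, y):
            s2 = s * s
            delta = t + yi * s2 * a
            p = sigmoid(c + a * delta - a * a * s2 / 2.0)
            w = p * (1.0 - p)
            h = t + a * s2 * (2 * yi - 1)  # derivative of the logit w.r.t. a
            r = yi - p
            g1 += r
            g2 += r * delta
            # Jacobian of (g1, g2) with respect to (c, a).
            j11 -= w
            j12 -= w * h
            j21 -= w * delta
            j22 += -w * h * delta + r * yi * s2
        det = j11 * j22 - j12 * j21
        # Newton step: solve J * (dc, da) = -(g1, g2).
        dc = (-g1 * j22 + g2 * j12) / det
        da = (g1 * j21 - g2 * j11) / det
        c += dc
        a += da
        if abs(dc) + abs(da) < tol:
            break
    return a, -c / a  # discrimination a and difficulty b = -c / a
```

A quick check is to simulate estimates as theta plus normal noise. The uncorrected fit (standard errors of zero) then tends to underestimate a because of attenuation, while the corrected fit should land closer to the generating value.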
These five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three sample sizes (1000, 2000, and 3000) and three CAT test lengths (10, 20, and 30 items), assuming that new items were randomly assigned to examinees. Under the two-parameter logistic (2PL) model, the true abilities of the three groups of examinees were drawn randomly from the standard normal distribution N(0, 1). In all conditions, the CAT item bank consisted of 1000 operational items whose item-parameter vectors were drawn randomly from a multivariate normal distribution MVN(μ, Σ), following the procedures of Chen and Xin (2014). Twenty new items were generated in the same way as the operational items, and the simulation and calibration of the new items was replicated 100 times. Items were selected by the maximum Fisher information method, examinees' abilities were estimated by EAP combined with MLE, and each CAT was terminated by a fixed-length rule. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved calibration precision over Method A (Original) in almost all conditions, and the improvement was largest when the test was short (10 items). Furthermore, the two new methods performed very close to the best-performing MEM for short and medium test lengths (10 and 20 items), whereas ECSE-Method A performed best of all methods when the test was relatively long (30 items). Finally, larger samples yielded more precise item-parameter recovery for all online calibration methods.
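The item-selection and ability-estimation steps of the simulated CAT can be sketched as follows. This is a hedged illustration, not the study's code. It assumes the 2PL without a scaling constant, a N(0, 1) prior approximated by 61 equally spaced quadrature points on [-4, 4] for EAP, and a fixed-length stopping rule. The abstract does not say how EAP and MLE were combined, so only EAP is shown. All function names (`p2pl`, `item_information`, `select_mfi`, `eap`, `run_cat`) are hypothetical.

```python
import math
import random


def p2pl(theta, a, b):
    # 2PL probability of a correct response (no scaling constant D; assumption).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    # Fisher information of a 2PL item at theta: a^2 * P * (1 - P).
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_mfi(theta_hat, bank, administered):
    # Maximum Fisher information: pick the unused item most informative at theta_hat.
    best, best_info = None, -1.0
    for j, (a, b) in enumerate(bank):
        if j in administered:
            continue
        info = item_information(theta_hat, a, b)
        if info > best_info:
            best, best_info = j, info
    return best


def eap(responses, bank, n_points=61, lo=-4.0, hi=4.0):
    # EAP ability estimate under a N(0, 1) prior on an equally spaced grid.
    # responses: list of (item_index, 0/1 score).
    num = den = 0.0
    step = (hi - lo) / (n_points - 1)
    for k in range(n_points):
        q = lo + k * step
        w = math.exp(-0.5 * q * q)  # unnormalized N(0, 1) density
        for j, u in responses:
            p = p2pl(q, *bank[j])
            w *= p if u == 1 else 1.0 - p
        num += q * w
        den += w
    return num / den


def run_cat(theta_true, bank, test_length, rng=None):
    # Fixed-length CAT: select by MFI, simulate a response, update EAP.
    if rng is None:
        rng = random.Random()
    theta_hat, responses, used = 0.0, [], set()
    for _ in range(test_length):
        j = select_mfi(theta_hat, bank, used)
        used.add(j)
        u = 1 if rng.random() < p2pl(theta_true, *bank[j]) else 0
        responses.append((j, u))
        theta_hat = eap(responses, bank)
    return theta_hat, responses
```

Calling `run_cat(theta_true, bank, 10)` on a list of `(a, b)` pairs mirrors the shortest test-length condition; the responses it records are what an online calibration method would later pair with the final ability estimate.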
Although the simulation results are very promising, several directions merit future research, such as variable-length CAT and more complex CAT conditions (e.g., item exposure control, content balancing, and item review).

Author: Ping Chen

Affiliation: Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University

Source: Acta Psychologica Sinica, 2016, No. 9, pp. 1184-1198 (15 pages). Indexed in CSSCI, CSCD, and the Peking University Core Journals list.

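Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (31300862), the Specialized Research Fund for the Doctoral Program of Higher Education, new-teacher category (20130003120002), and an open project of the Key Laboratory of Applied Statistics of the Ministry of Education, Northeast Normal University (KLAS 130028614).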

Keywords: full functional maximum likelihood estimator; computerized adaptive testing; item response theory; online calibration; construction of item bank

CLC number: B841 [Philosophy and Religion / Basic Psychology]

