
Two new online calibration methods for computerized adaptive testing

Cited by: 7
Abstract: Online calibration is widely used to calibrate new items in computerized adaptive testing (CAT) because of its many advantages. Method A is the most direct and algorithmically simplest CAT online calibration method, but it has an obvious theoretical flaw: it treats ability estimates as true abilities during calibration. This study incorporates the error-correction ideas of the full functional maximum likelihood estimator (FFMLE) and the estimator that uses the consequences of sufficiency (ECSE) into Method A (the new methods are denoted FFMLE-Method A and ECSE-Method A), correcting for ability estimation error on theoretical grounds and thereby overcoming the calibration flaw of Method A. The simulation results show that: (1) under most experimental conditions, the two new methods improve overall calibration precision over Method A, with the largest improvement for the short test of length 10; (2) when the CAT is short or of medium length (10 or 20 items), the two new methods perform very close to the best-performing MEM, and when the test is long (30 items), ECSE-Method A performs best overall and outperforms MEM; (3) the larger the sample size, the higher the calibration precision of every method.

With the development of computerized adaptive testing (CAT), many new issues and challenges have been raised. For example, as a test is continuously administered, new items must be written, calibrated, and added to the item bank periodically to replace flawed, obsolete, and overexposed items. The new items have to be calibrated precisely because calibration precision directly affects the accuracy of ability estimation. Online calibration has been widely used to calibrate new items on the fly in CAT, since it offers several advantages over the traditional offline calibration approach. As the simplest and most straightforward online calibration method, Method A (Stocking, 1988) has an obvious theoretical limitation: it treats the estimated abilities as true values and ignores the measurement error in ability estimation. To overcome this weakness, we combined a full functional maximum likelihood estimator (FFMLE) and an estimator that makes use of the consequences of sufficiency (ECSE) (Stefanski & Carroll, 1985) with Method A, respectively, to correct for the estimation error of ability. The new methods are referred to as FFMLE-Method A and ECSE-Method A. A simulation study was conducted to compare the two new methods with three other methods: the original Method A [denoted Method A (Original)], the original Method A with the examinees' true abilities plugged in [Method A (True)], and the "multiple EM cycles" method (MEM).

These five methods were evaluated in terms of item-parameter recovery and calibration efficiency under three sample sizes (1000, 2000, and 3000) and three CAT test lengths (10, 20, and 30), assuming the new items are randomly assigned to examinees. Under the two-parameter logistic model, the true abilities for the three groups of examinees were randomly drawn from the standard normal distribution [N (0,1)]. For all conditions, 1000 operational items were simulated to constitute the CAT item bank, with item parameter vectors randomly generated from a multivariate normal distribution MVN (u, S) following the procedures of Chen and Xin (2014). Twenty new items were generated in the same way as the operational items, and the process of simulating and calibrating the new items was replicated 100 times. The maximum Fisher information method was used for item selection, and the EAP method combined with the MLE method was used to estimate the examinees' abilities. A fixed-length rule was used to stop the CAT. The results showed that the two new approaches, FFMLE-Method A and ECSE-Method A, improved calibration precision over Method A (Original) in almost all conditions, and the improvement was largest when the test was short (e.g., 10 items). Furthermore, the performance of the two new methods was very close to that of the best-performing MEM for short and medium test lengths (i.e., 10 and 20), whereas ECSE-Method A performed best among all methods when the test was longer (i.e., 30). In addition, a larger sample size resulted in more precise item-parameter recovery for all online calibration methods.

Although the simulation results are very promising, several directions for future research merit investigation, such as variable-length CAT and more complex CAT conditions (e.g., item exposure control, content balancing, and allowing item review).
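The limitation of Method A described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single new item under the two-parameter logistic model, fits its parameters by maximum likelihood with abilities held fixed (which reduces to a logistic regression), and mimics ability estimation error with additive normal noise of standard deviation 0.5. The item parameters, the noise model, and the names `p_2pl` and `method_a_calibrate` are all illustrative assumptions; real CAT ability estimates have a more complex error structure.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def method_a_calibrate(theta_fixed, responses, n_iter=50, tol=1e-8):
    """Method A style calibration of one item (illustrative sketch).

    Abilities are treated as known constants. Under the 2PL model,
    logit P = a * theta - a * b, so the ML fit is a logistic regression
    of the responses on theta, solved here by Newton-Raphson.
    """
    X = np.column_stack([np.ones_like(theta_fixed), theta_fixed])
    beta = np.zeros(2)  # [intercept, slope]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (responses - p)                 # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    a = beta[1]
    b = -beta[0] / beta[1]
    return a, b

# Illustrative setup (all values are assumptions, not taken from the study).
rng = np.random.default_rng(0)
n = 3000
theta_true = rng.standard_normal(n)                  # abilities ~ N(0, 1)
a_true, b_true = 1.2, 0.3                            # hypothetical new item
y = (rng.random(n) < p_2pl(theta_true, a_true, b_true)).astype(float)

# Assumed error model: ability estimate = true ability + N(0, 0.5^2) noise.
theta_hat = theta_true + rng.normal(0.0, 0.5, n)

a_with_true, b_with_true = method_a_calibrate(theta_true, y)  # like Method A (True)
a_with_hat, b_with_hat = method_a_calibrate(theta_hat, y)     # like Method A (Original)

print("true abilities:      a=%.3f b=%.3f" % (a_with_true, b_with_true))
print("estimated abilities: a=%.3f b=%.3f" % (a_with_hat, b_with_hat))
```

Under this assumed error model, the discrimination estimate obtained from the noisy abilities is pulled toward zero relative to the estimate obtained from the true abilities. That bias is the kind of distortion the error-correction methods in the abstract are meant to address.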
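Author: Chen Ping (陈平)
Source: Acta Psychologica Sinica (《心理学报》), indexed in CSSCI, CSCD, and the Peking University Core Journals list, 2016, No. 9, pp. 1184-1198 (15 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (31300862); Specialized Research Fund for the Doctoral Program of Higher Education, New Teachers category (20130003120002); Open Project of the Key Laboratory of Applied Statistics of the Ministry of Education, Northeast Normal University (KLAS 130028614)
Keywords: full functional maximum likelihood estimator; computerized adaptive testing; item response theory; online calibration; construction of item bank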
