Research Progress in Computerized Multistage Adaptive Testing (计算机化多阶段自适应测验研究述评)

Cited by: 3

Abstract: Computerized multistage adaptive testing (MST) is a computer-based test format that uses sets of items as the unit of administration and tests and scores examinees adaptively across multiple stages. Recent research comparing test formats has found that it has clear advantages over both computerized adaptive testing (CAT) and paper-and-pencil (P&P) testing. Compared with P&P testing, it offers parameter invariance and more precise ability estimation. Compared with CAT, it lets developers control item characteristics and lets examinees review items. How to reduce measurement error and make MST more convenient and effective to apply is the direction for future research.

Extended abstract: Computerized multistage adaptive testing (MST) is a computer-based test format in which sets of items are administered and scored as a unit. These item sets, called modules or testlets, are short linear tests, each supplying a specified share of the test information in order to reduce measurement error. The items in a module may be built around one or more common stems, such as a passage or a diagram, or they may be unrelated to one another. In MST, adaptation occurs at the level of the item set: the next module is selected according to the examinee's cumulative performance on the preceding items. MST therefore adapts less often than item-level computerized adaptive testing (CAT) but more often than conventional paper-and-pencil (P&P) testing. It combines the components of conventional P&P testing with the adaptive character of CAT, so that the strengths of each format offset the weaknesses of the other. MST can thus be viewed as a compromise between the two formats.

The first question for test developers is how to build an MST. The number of stages, the number of modules in each stage, and the number of items in each module must all be decided before the test is assembled, as must the target statistics and the qualitative specifications. The methods for scoring, routing, and assembling the test are equally vital.

After the test has been assembled but before it is administered, developers can check the items for non-statistical properties, including content balance, item ordering and the potential for context effects, cognitive level, item format, answer key position, word count, and any other characteristic of interest or concern in developing the modules. MST can help satisfy the item response theory (IRT) assumptions of local independence and unidimensionality across modules: items that share a stem, and would therefore violate the local independence assumption, are treated together as a polytomous item. In this way developers can configure every module optimally. While taking the test, examinees can preview and review the items within a module and change answers they believe are wrong, so they too can make the best use of each module.

MST thus offers an opportunity to improve the quality of examinations. It has already been used in large-scale assessments such as the Uniform CPA Examination and the Graduate Record Examination (GRE). Research comparing test formats indicates that MST has clear advantages over both conventional P&P testing and CAT. Compared with conventional P&P testing, its advantages include parameter invariance, time savings, timely feedback, and more accurate ability estimation. Compared with CAT, its advantages include control over non-statistical properties and item exposure, and the opportunity for examinees to review items. The direction for future research is how to minimize measurement error so that MST can be applied more conveniently and effectively.
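The module-level routing described in the abstract can be illustrated with a minimal sketch. The abstract does not specify a routing rule or an IRT model, so everything concrete here is an assumption: a three-stage panel, routing by cumulative number-correct against cut scores, and the two-parameter logistic (2PL) model for module information. The names `Item`, `module_information`, and `run_panel`, and all cut scores, are hypothetical.

```python
import bisect
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    a: float  # discrimination (2PL model, an assumption of this sketch)
    b: float  # difficulty


def p_correct(theta: float, item: Item) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def module_information(theta: float, module: list) -> float:
    """Fisher information a module supplies at ability theta (sum over its items)."""
    total = 0.0
    for item in module:
        p = p_correct(theta, item)
        total += item.a ** 2 * p * (1.0 - p)
    return total


def run_panel(responses_by_stage, cuts_by_stage, labels=("easy", "medium", "hard")):
    """Return the path of modules taken, routing on cumulative number-correct.

    responses_by_stage: one list of 0/1 item scores per routed stage.
    cuts_by_stage: for each routed stage, ascending cut scores; a cumulative
    score at or below cuts[i] routes to labels[i], above the last cut to labels[-1].
    """
    path = ["routing"]  # stage 1 is a single routing module
    n_correct = 0
    for responses, cuts in zip(responses_by_stage, cuts_by_stage):
        n_correct += sum(responses)  # adaptation uses all previous items
        path.append(labels[bisect.bisect_left(cuts, n_correct)])
    return path


# Hypothetical examinee: 4 of 5 correct in stage 1, then 3 of 5 in stage 2.
path = run_panel([[1, 1, 0, 1, 1], [1, 0, 1, 1, 0]], [(2, 3), (4, 7)])
print(path)
```

Only two routing decisions are made for the whole test, which is the sense in which MST adapts less often than item-level CAT. Operational programs may instead route on a provisional ability estimate or on maximum module information; number-correct routing is used here only because it is the simplest rule to read.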

Authors: Wang Yutong (王钰彤), Luo Zhaosheng (罗照盛), Wang Rui (王睿)

Affiliation: School of Psychology, Jiangxi Normal University (江西师范大学心理学院)

Source: Journal of Psychological Science (《心理科学》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2015, No. 2, pp. 452-456 (5 pages)

Keywords: computerized multistage adaptive testing (MST), paper-and-pencil test (P&P), computerized adaptive test (CAT), stage, module

Classification: B841 [Philosophy and Religion: Basic Psychology]
