
The Block Item Pocket Method to Allow Item Review in CAT
Abstract: Allowing item review can promote the practical application of computerized adaptive testing (CAT). Provided that the precision of ability estimation and test fairness are not compromised, allowing item review in CAT can relieve examinees' test anxiety and reduce measurement error caused by irrelevant factors. The block item pocket method combines the successive block method with the item pocket method: it not only allows item review in CAT but also remedies the shortcomings of the item pocket method. The results show that (1) under a reasonable response strategy, the block item pocket method achieves better estimation precision than the item pocket method at low ability levels; (2) when facing a Wainer-like response strategy, the block item pocket method outperforms the item pocket method at all ability levels; and (3) as the number of blocks increases, the ability estimation precision of the block item pocket method approaches the no-review baseline.

Most computerized adaptive tests (CAT) do not allow examinees to review items, because review can drastically decrease measurement precision and open the door to additional cheating strategies (Wainer, 1993; Wise, 1996). Allowing item review is essential to make CAT comparable with traditional tests, and it also matters in practice. Item review enables examinees to correct mistakes due to carelessness, which can further improve the precision of ability estimation. Withholding this option may have negative consequences for overall performance, such as tension or anxiety, especially in high-stakes examinations (Vispoel, Hendrickson, & Bleiler, 2000). It is therefore worth examining whether allowing item review can alleviate the problems mentioned above (Wise, 1996; Vispoel, 2000, 2005). Several methods have been proposed, including the successive block method (Stocking, 1997) and the item pocket (IP) method (Han, 2013). However, both methods are limited in some ways. Stocking's method does not allow examinees to skip items and requires a large number of blocks, and the frequent decisions about whether to move to the next block may bring extra adverse effects. Han's method avoids the limitations of Stocking's, but it requires an appropriate IP size and may produce large bias when the IP size is large. The present study proposed the block item pocket (BIP) method, which sets fewer but larger blocks with a proper total IP size. The BIP method keeps the advantages of Stocking's and Han's methods while overcoming their disadvantages.

Two simulation studies, one for each of two response strategies, were conducted to evaluate the validity of the BIP method. Item parameters were randomly drawn from uniform distributions, b ~ U(-3, 3) and a ~ U(0, 2). Each examinee took a fixed-length CAT of 30 items, with the first item selected at an initial ability value drawn from θ0 ~ U(-0.5, 0.5). Items were selected with the Maximum Fisher Information method. Interim and final scores were estimated by MLE in most conditions; when fewer than five responses were available, or when all answers were correct or all were wrong, EAP was used instead. Each study contained five conditions: non-review, the IP method (a single block), and BIP methods with 2, 3, and 6 blocks. BIAS, MAE, and RMSE served as evaluation criteria. Results indicated that (1) the BIP method had better estimation precision than the IP method at low ability levels under the normal strategy; (2) when dealing with the Wainer-like strategy, the BIP method was far more precise than the IP method at all ability levels; and (3) as the number of blocks increased, estimation precision moved closer to the non-review condition. Advantages of the new method and future directions are discussed.
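The simulation engine described above (a fixed-length 30-item CAT under the 2PL model, item parameters a ~ U(0, 2) and b ~ U(-3, 3), a random starting ability in U(-0.5, 0.5), Maximum Fisher Information item selection, and MLE scoring with an EAP fallback for short or all-correct/all-wrong patterns) can be sketched as follows. This is a minimal illustration, not the authors' code: the bank size, the Newton-Raphson settings, the quadrature grid, and the three-block marker are assumptions, and the BIP review mechanics themselves are only indicated by a placeholder comment.

# Minimal sketch (not the authors' implementation) of the CAT simulation in the abstract:
# 2PL item bank with a ~ U(0, 2) and b ~ U(-3, 3), 30-item fixed-length test, Maximum
# Fisher Information item selection, MLE scoring with an EAP fallback. BANK_SIZE, the
# Newton-Raphson settings, the quadrature grid, and the 3-block marker are assumptions.
import numpy as np

rng = np.random.default_rng(0)
TEST_LENGTH = 30
BANK_SIZE = 500                        # assumed; the abstract does not report a bank size
a = rng.uniform(0.0, 2.0, BANK_SIZE)   # discrimination parameters
b = rng.uniform(-3.0, 3.0, BANK_SIZE)  # difficulty parameters

def p_correct(theta, a_j, b_j):
    # 2PL probability of a correct response
    return 1.0 / (1.0 + np.exp(-a_j * (theta - b_j)))

def fisher_info(theta, a_j, b_j):
    # Fisher information of a 2PL item: a^2 * P * (1 - P)
    p = p_correct(theta, a_j, b_j)
    return a_j ** 2 * p * (1.0 - p)

def select_mfi(theta, administered):
    # Maximum Fisher Information selection over unadministered items
    info = fisher_info(theta, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

def eap(responses, items, grid=np.linspace(-4.0, 4.0, 81)):
    # EAP with a standard normal prior (used for short or all-correct/all-wrong patterns)
    prior = np.exp(-0.5 * grid ** 2)
    like = np.ones_like(grid)
    for u, j in zip(responses, items):
        p = p_correct(grid, a[j], b[j])
        like *= p ** u * (1.0 - p) ** (1 - u)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

def estimate(responses, items, n_iter=20):
    # MLE via Newton-Raphson, falling back to EAP as described in the abstract
    if len(responses) < 5 or len(set(responses)) < 2:
        return eap(responses, items)
    theta, aa, u = 0.0, a[items], np.array(responses)
    bb = b[items]
    for _ in range(n_iter):
        p = p_correct(theta, aa, bb)
        grad = np.sum(aa * (u - p))
        hess = -np.sum(aa ** 2 * p * (1.0 - p))
        theta = float(np.clip(theta - grad / hess, -4.0, 4.0))
    return theta

def run_cat(true_theta, n_blocks=3):
    # One administration; item review is NOT modelled, block boundaries are only marked
    block_size = TEST_LENGTH // n_blocks
    theta_hat = rng.uniform(-0.5, 0.5)  # random starting ability, as in the abstract
    items, responses = [], []
    for k in range(TEST_LENGTH):
        j = select_mfi(theta_hat, items)
        u = int(rng.random() < p_correct(true_theta, a[j], b[j]))
        items.append(j)
        responses.append(u)
        theta_hat = estimate(responses, items)
        if (k + 1) % block_size == 0:
            pass  # a BIP implementation would resolve pocketed/revised items here
    return theta_hat

if __name__ == "__main__":
    true_theta = 1.0
    est = np.array([run_cat(true_theta) for _ in range(100)])
    print("BIAS:", est.mean() - true_theta)
    print("RMSE:", np.sqrt(np.mean((est - true_theta) ** 2)))

The actual BIP mechanics (pocketing items, within-block revision, the Wainer-like response strategy) and the full evaluation statistics would need to be layered on top of this skeleton; they are specified in the paper itself rather than in the abstract.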
Source: Acta Psychologica Sinica (《心理学报》; CSSCI, CSCD, Peking University Core Journal), 2015, No. 9, pp. 1188-1198 (11 pages).
Funding: National Natural Science Foundation of China General Program (31371047); National Natural Science Foundation of China Young Scientists Fund (31300862); Specialized Research Fund for the Doctoral Program of Higher Education (20130003120002); Fundamental Research Funds for the Central Universities (2013YB26).
Keywords: computerized adaptive testing; item review; item pocket method; answer change; block item pocket method

References (25)

  • 1 Benjamin, L. T., Cavell, T. A., & Shallenberger, W. R. (1987). Staying with initial answers on objective tests: Is it a myth? In M. E. Ware & R. J. Millard (Eds.), Handbook on student development: Advising, career development, and field placement (pp. 45-53). Hillsdale, NJ: Lawrence Erlbaum.
  • 2 Bowles, R., & Pommerich, M. (2001, April). An examination of item review on a CAT using the specific information item selection algorithm. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle, WA.
  • 3 Chang, H. H., & Ying, Z. L. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441-450.
  • 4 Chen, P., & Ding, S. L. (2008). Computerized adaptive testing that allows reviewing and changing answers. Acta Psychologica Sinica, 40(6), 737-747.
  • 5 Chen, P., Zhang, J. H., & Xin, T. (2013). The application of online calibration techniques in computerized adaptive testing. Advances in Psychological Science, 21(10), 1883-1892.
  • 6 Han, K. T. (2013). Item pocket method to allow response review and change in computerized adaptive testing. Applied Psychological Measurement, 37(4), 259-275.
  • 7 Kingsbury, G. G. (1996). Item review and adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
  • 8 Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48(2), 233-245.
  • 9 Lunz, M. E., Bergstrom, B. A., & Wright, B. D. (1992). The effect of review on student ability and test efficiency for computerized adaptive tests. Applied Psychological Measurement, 16(1), 33-40.
  • 10 McMorris, R. F. (1991). Why do young students change answers on tests? ERIC Document Reproduction Service No. ED 342803.

Secondary References (43)

  • 1 Chen, P., Ding, S. L., Lin, H. J., & Zhou, J. (2006). Item selection strategies for computerized adaptive testing based on the graded response model. Acta Psychologica Sinica, 38(3), 461-467.
  • 2 Gershon, R., & Bergstrom, B. (1995, April). Does cheating on CAT pay: Not. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
  • 3 Olea, J., Revuelta, J., Ximenez, M. C., et al. (2000). Psychometric and psychological effects of review on computerized fixed and adaptive tests. Psicologica, 21, 157-173.
  • 4 Wise, S. L. (1996, April). A critical analysis of the arguments for and against item review in computerized adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
  • 5 Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15-20.
  • 6 Stocking, M. L. (1997). Revising item responses in computerized adaptive tests: A comparison of three models. Applied Psychological Measurement, 21(2), 129-142.
  • 7 Papanastasiou, E. C. (2002, April). A 'rearrangement procedure' for scoring adaptive tests with review options. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
  • 8 Vispoel, W. P., Rocklin, T. R., Wang, T., et al. (1999). Can examinees use a review option to obtain positively biased ability estimates on a computerized adaptive test? Journal of Educational Measurement, 36(2), 141-157.
  • 9 Revuelta, J., Ximenez, M. C., & Olea, J. (2003). Psychometric and psychological effects of item selection and review on computerized testing. Educational and Psychological Measurement, 63(5), 791-808.
  • 10 Lord, F. M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247-264.
