Comparison of Item Selection Strategies for Computerized Adaptive Testing Based on the GPCM (cited by: 21)

Item Selection Strategies for Computerized Adaptive Testing with the Generalized Partial Credit Model

Abstract: Item selection strategy is an important topic in computerized adaptive testing (CAT) research, and its quality bears directly on the reliability, validity, and security of a test. Most CAT research and applications are built on dichotomous (0-1) scoring models, and studies of item selection strategies for polytomously scored CAT are rarely reported. Although CAT research based on the GRM has been carried out in China, no research on CAT based on the GPCM has yet been reported. Using computer simulation, this study compared four item selection strategies for CAT under the Generalized Partial Credit Model (GPCM) in a variety of conditions. The results show that when examinee ability is normally distributed, the effectiveness of an item selection strategy depends strongly on the distribution of the item step parameters: (1) when the step parameters are normally distributed, the strategy that matches ability to the item step parameters performs best; (2) when the step parameters are uniformly distributed, the strategy that matches ability to the mean of the item step parameters performs best.

The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. Item selection strategy (ISS) is an important part of CAT research, and its quality directly affects the reliability, validity, and security of the test. Much CAT research and application is based on dichotomously scored models, yet a polytomously scored model clearly obtains more information from examinees than a dichotomous one, so CAT based on polytomously scored models deserves further study. The Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are both polytomously scored models, but they differ. In the GRM, the grade difficulties increase monotonically with the grade; the GPCM instead models the process of answering an item as a sequence of steps. Each GPCM item has several step parameters that obey no fixed ordering: a later step cannot be attempted until the earlier step has been completed, yet a later step parameter may be lower than an earlier one. Considerable research has been conducted on CAT with the GRM, but in China there are few reports on CAT with the GPCM. Using computer simulation programs, this study compared four types of ISS for CAT under the GPCM in various conditions.
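To make the step structure concrete, here is a minimal Python sketch of the GPCM in its usual form, P(X = k | θ) = exp(Σ_{v=1..k} Da(θ - b_v)) / Σ_{c=0..m} exp(Σ_{v=1..c} Da(θ - b_v)), where the empty sum for k = 0 is taken as 0. It assumes the scaling constant D = 1, which the abstract does not state. The two selection functions are one hypothetical reading of the step-matching strategies named in the abstract (matching the ability estimate to a randomly chosen step parameter, or to the mean step parameter); the abstract does not give the exact rules, and every function name here is illustrative rather than the authors' program.

```python
import math
import random


def gpcm_probs(theta, a, steps, D=1.0):
    """P(X = k | theta) for k = 0..m under the GPCM.

    steps = [b_1, ..., b_m]; no ordering is imposed on them.
    D is the scaling constant (assumed 1.0; 1.7 is a common alternative).
    """
    z = [0.0]  # z_0 = 0 (empty sum)
    for b in steps:
        z.append(z[-1] + D * a * (theta - b))  # z_k = sum_{v<=k} D*a*(theta - b_v)
    zmax = max(z)  # subtract the max before exp() for numerical stability
    expz = [math.exp(v - zmax) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def gpcm_info(theta, a, steps, D=1.0):
    """Item information: (D*a)^2 * Var(X | theta)."""
    p = gpcm_probs(theta, a, steps, D)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return (D * a) ** 2 * (second - mean ** 2)


def sample_response(theta, a, steps, rng, D=1.0):
    """Draw a simulated score 0..m from the GPCM category probabilities."""
    u = rng.random()
    cum = 0.0
    for k, pk in enumerate(gpcm_probs(theta, a, steps, D)):
        cum += pk
        if u < cum:
            return k
    return len(steps)  # guard against floating-point round-off


def select_by_mean_step(theta_hat, pool, available):
    """Hypothetical rule: the available item whose mean step parameter
    is closest to theta_hat. pool holds (a, steps) tuples."""
    return min(available,
               key=lambda i: abs(theta_hat - sum(pool[i][1]) / len(pool[i][1])))


def select_by_random_step(theta_hat, pool, available, rng):
    """Hypothetical rule: draw one step index k at random, then pick the
    available item whose k-th step parameter is closest to theta_hat.
    Assumes every item has the same number of steps (five in this study)."""
    k = rng.randrange(len(pool[available[0]][1]))
    return min(available, key=lambda i: abs(theta_hat - pool[i][1][k]))


if __name__ == "__main__":
    # Step parameters deliberately out of order, which the GPCM allows.
    pool = [(1.2, [0.5, -0.3, 1.1, -1.0, 0.8]),
            (0.8, [-1.5, -0.2, 0.4, 0.9, 1.6])]
    rng = random.Random(2008)
    theta = 0.0
    a, steps = pool[0]
    print([round(p, 3) for p in gpcm_probs(theta, a, steps)])
    print(round(gpcm_info(theta, a, steps), 3))
    print(sample_response(theta, a, steps, rng))
    print(select_by_mean_step(theta, pool, [0, 1]))
    print(select_by_random_step(theta, pool, [0, 1], rng))
```

Nothing in these functions requires the step parameters to be sorted, matching the point above that a later step parameter may be lower than an earlier one.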
The strategies are implemented in four item pools of 1000 items each. Every item has five step parameters, and the discrimination and step parameters follow one of four distribution conditions: (1) b ~ N(0, 1), ln a ~ N(0, 1); (2) b ~ N(0, 1), a ~ U(0.2, 2.5); (3) b ~ U(-3, 3), ln a ~ N(0, 1); (4) b ~ U(-3, 3), a ~ U(0.2, 2.5). Item parameters are generated by Monte Carlo simulation. For each ISS, responses are generated under the GPCM for a sample of 3000 simulees whose trait levels, θ ~ N(0, 1), are likewise generated by Monte Carlo simulation, and each simulee's ability is estimated from the responses obtained as the test proceeds. The process is then repeated after each of the four item pools has been sorted by the discrimination parameter to implement the a-stratified design. In all, 32 simulated CATs are administered (four pools × four ISS × unstratified or a-stratified pool), and the output is evaluated on measurement precision, stability of the ISS, evenness of item usage, average number of items used per examinee, the χ² statistic, efficiency, and item overlap. Tables 1 and 2 report these evaluation indices for the four ISS, both without stratification and with the a-stratified design, together with a weighted sum of the indices.

The tables show that all ability estimates are highly accurate and differ little across conditions. Comparing the weighted sums shows that the distribution of the item step parameters strongly influences which ISS should be chosen: when examinee trait levels are normally distributed, the performance of an ISS is closely tied to the step-parameter distribution.
(1) If the items' step parameters follow a normal distribution, the ISS that matches the trait level to a randomly selected step parameter performs much better than the others. (2) If the items' step parameters follow a uniform distribution, the ISS that matches the trait level to the item's average step parameter performs much better than the others.
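The simulation design described above can be outlined as a minimal Python sketch: it generates one 1000-item pool per distribution condition, sorts the pool into a-strata, and computes two of the listed evaluation indices from per-item exposure counts. Several details are assumptions because the abstract does not define them: the number of strata (4), the exposure χ² in the form used by Chang and Ying, Σ_j (r_j - L/N)² / (L/N), computed here with the mean exposure rate in place of L/N, and an overlap formula that assumes a fixed test length. All names are hypothetical.

```python
import math
import random

# The four pool conditions described above: (step-parameter law, discrimination law).
CONDITIONS = {
    "bN_lnaN": ("normal", "lognormal"),   # b ~ N(0,1), ln a ~ N(0,1)
    "bN_aU": ("normal", "uniform"),       # b ~ N(0,1), a ~ U(0.2, 2.5)
    "bU_lnaN": ("uniform", "lognormal"),  # b ~ U(-3,3), ln a ~ N(0,1)
    "bU_aU": ("uniform", "uniform"),      # b ~ U(-3,3), a ~ U(0.2, 2.5)
}


def generate_pool(condition, n_items=1000, n_steps=5, seed=0):
    """Monte Carlo item pool as a list of (a, [b_1, ..., b_m]) tuples."""
    b_law, a_law = CONDITIONS[condition]
    rng = random.Random(seed)
    pool = []
    for _ in range(n_items):
        if a_law == "lognormal":
            a = math.exp(rng.gauss(0.0, 1.0))
        else:
            a = rng.uniform(0.2, 2.5)
        if b_law == "normal":
            steps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        else:
            steps = [rng.uniform(-3.0, 3.0) for _ in range(n_steps)]
        pool.append((a, steps))
    return pool


def a_stratify(pool, n_strata=4):
    """Sort item indices by a (ascending) and cut them into equal-size strata.

    In the usual a-stratified design, low-a strata are used early in the
    test and high-a strata late.
    """
    order = sorted(range(len(pool)), key=lambda i: pool[i][0])
    size = math.ceil(len(order) / n_strata)
    return [order[k:k + size] for k in range(0, len(order), size)]


def exposure_chi2(counts, n_examinees):
    """Exposure chi-square: sum_j (r_j - r_bar)^2 / r_bar, r_j = counts[j] / n_examinees."""
    rates = [c / n_examinees for c in counts]
    r_bar = sum(rates) / len(rates)  # equals L / pool size for a fixed length L
    return sum((r - r_bar) ** 2 for r in rates) / r_bar


def overlap_rate(counts, n_examinees, test_length):
    """Mean share of items common to two examinees' tests (fixed length assumed).

    An item seen by c examinees is shared by c*(c-1)/2 pairs of examinees.
    """
    shared = sum(c * (c - 1) / 2 for c in counts)
    pairs = n_examinees * (n_examinees - 1) / 2
    return shared / (pairs * test_length)


if __name__ == "__main__":
    for name in CONDITIONS:
        pool = generate_pool(name, seed=2008)
        strata = a_stratify(pool)
        print(name, len(pool), [len(s) for s in strata])
```

Computing both indices from per-item exposure counts avoids storing every examinee's item sequence; the pairwise identity behind overlap_rate holds exactly only when all tests have the same length.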

Authors: Liu Zhen, Ding Shuliang, Lin Haijing

Affiliation: School of Information Engineering, Jiangxi Normal University

Source: Acta Psychologica Sinica (心理学报), 2008, No. 5, pp. 618-625 (8 pages); indexed in CSSCI, CSCD, and the Peking University core journal list

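Funding: National Natural Science Foundation of China (60263005); Key Research Project of the Science and Technology Department of Jiangxi Province; Science and Technology Project of the Education Department of Jiangxi Province; Ministry of Health project (JM20060070, KY200704); Specialized Research Fund for the Doctoral Program of Higher Education (8020070414001)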

Keywords: IRT; polytomously scored model; GPCM; a-stratified design; item selection strategy

CLC number: B841 [Philosophy and Religion: Basic Psychology]
