Factors Influencing Item Response Theory Observed Score Kernel Equating (cited by: 2)

Effects of Several Factors on IRT Observed Score Kernel Equating

Abstract: This study examines how bandwidth selection method, sample size, test length, equating design, and data simulation method affect item response theory (IRT) observed score kernel equating. Research data were generated with two data simulation methods, and local and global evaluation criteria were computed. In the random groups design, the bandwidth selection methods performed similarly, and sample size and test length had very little effect. In the non-equivalent groups design, the penalty method and Silverman's rule of thumb performed best; adding items lowered both percent relative error and random error, while a larger sample increased percent relative error but reduced random error. The data simulation method can influence the evaluation of equating. Future work should focus on the systematic evaluation of equating.

Owing to its presmoothing and continuization of score distributions, kernel equating has been tested and shown to be equivalent to or better than other equating methods, especially traditional ones, in equating accuracy and stability. IRT observed score kernel equating integrates kernel equating with IRT observed score equating. Few studies have systematically evaluated its performance. This study therefore investigated how bandwidth selection method, sample size, test length, equating design, and data simulation method influence that performance. To ensure ecological validity, data from a large-scale assessment served as the sampling pool. To avoid simulation preference, both the IRT data simulation method and the pseudo-tests and pseudo-groups simulation method were used in the random Equivalent Groups (EG) design and the Non-Equivalent groups with Anchor Test (NEAT) design. The bandwidth selection methods were the Penalty method, Silverman's rule of thumb, and the Double smoothing method. Sample sizes were 1000, 2000, and 5000, and tests of 30 and 45 items were considered. Finally, local and universal criteria were computed: the local criteria were Percent Relative Error (PRE) and Standard Error of Equating (SEE), and the universal criteria were Averaged Percent Relative Error (APRE) and Averaged Standard Error of Equating (ASEE).

In EG, for the local criteria, PRE increased with the order of the central moment, meaning that the difference between the distributions before and after equating grew. Nonetheless, because PRE is the initial difference multiplied by 100, the bandwidth selection methods yielded similar results. PRE was significantly reduced by increasing sample size and lengthening the test, especially the latter. As with PRE, the bandwidth selection methods did not differ in SEE. A larger sample size produced less random error, whereas test length had the opposite effect. Furthermore, the SEE curves were "high at left but low at right" for the pseudo-tests and pseudo-groups method, and "low at left but high at right" for the IRT simulation method. For the universal criteria, APRE was small for all bandwidth selection methods. The effects of sample size and test length matched those observed for the local criteria. ASEE did not differ significantly between the two data simulation methods.

In NEAT, for the local criteria, PRE again increased with the order of the central moment. The Penalty method and Silverman's rule of thumb gave coinciding results, which were superior to those of the Double smoothing method. This advantage was more evident when the test was shorter. As in EG, lengthening the test significantly reduced PRE, but increasing sample size did not. Notably, PRE for the Double smoothing method was most affected by sample size when the test had 30 items and the IRT simulation method was used, which indicates some interaction among these factors. For SEE, the bandwidth selection methods yielded similar results, with discrepancies only at extreme scores. Increasing sample size and lengthening the test both reduced random error. Meanwhile, the distribution of SEE was more stable for the pseudo-tests and pseudo-groups method than for the IRT method. For the universal criteria, the trends in APRE and ASEE matched those in the local criteria.

In summary, the bandwidth selection methods performed similarly in EG, but the Penalty method and Silverman's rule of thumb prevailed in NEAT. Bandwidth selection, sample size, and test length jointly affected IRT observed score equating. A preference effect of the data simulation method was observed, which suggests that researchers comparing equating methods should use multiple simulation methods and designs before drawing final conclusions. Further study should focus more on the systematic evaluation of equating.
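
The abstract names Gaussian-kernel continuization, Silverman's rule of thumb, and PRE without giving formulas. The sketch below illustrates how these pieces are commonly defined in the general kernel equating literature; it is not taken from this paper. Assumptions: the bandwidth formula h = 9*sigma / sqrt(100*n^(2/5) - 81), PRE computed on raw moments of the equated scores, and two hypothetical binomial score distributions standing in for real data. The presmoothing and IRT steps of the study are omitted, and all function names are illustrative.

```python
import math
import numpy as np

# Standard normal CDF via erfc (accurate in the lower tail); plain NumPy, no SciPy.
_norm_cdf = np.vectorize(lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0)))


def silverman_bandwidth(scores, probs, n):
    """Assumed rule-of-thumb bandwidth: h = 9*sigma / sqrt(100*n**(2/5) - 81)."""
    mu = np.sum(scores * probs)
    sigma = math.sqrt(np.sum(probs * (scores - mu) ** 2))
    return 9.0 * sigma / math.sqrt(100.0 * n ** 0.4 - 81.0)


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization that preserves the mean and variance."""
    mu = np.sum(scores * probs)
    var = np.sum(probs * (scores - mu) ** 2)
    a = math.sqrt(var / (var + h ** 2))  # shrinkage factor
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - a * scores[None, :] - (1.0 - a) * mu) / (a * h)
    return _norm_cdf(z) @ probs


def kernel_equate(x_scores, r, y_scores, s, h_x, h_y, tol=1e-8):
    """e_Y(x) = G_h^{-1}(F_h(x)); G_h is inverted by vectorized bisection."""
    target = continuized_cdf(x_scores, x_scores, r, h_x)
    lo = np.full_like(target, y_scores.min() - 10.0 * (h_y + 1.0))
    hi = np.full_like(target, y_scores.max() + 10.0 * (h_y + 1.0))
    while np.max(hi - lo) > tol:
        mid = 0.5 * (lo + hi)
        below = continuized_cdf(mid, y_scores, s, h_y) < target
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def percent_relative_error(equated, r, y_scores, s, max_order=5):
    """PRE(p) = 100 * (mu_p(e_Y(X)) - mu_p(Y)) / mu_p(Y), raw moments (assumed)."""
    orders = range(1, max_order + 1)
    mom_eq = np.array([np.sum(r * equated ** p) for p in orders])
    mom_y = np.array([np.sum(s * y_scores ** p) for p in orders])
    return 100.0 * (mom_eq - mom_y) / mom_y


def binomial_pmf(n_items, p):
    """Hypothetical score distribution for a test of n_items items."""
    k = np.arange(n_items + 1)
    coef = np.array([math.comb(n_items, int(i)) for i in k], dtype=float)
    return coef * p ** k * (1.0 - p) ** (n_items - k)


# Toy EG-style example: 30-item forms X and Y, 2000 examinees each (made-up data).
scores = np.arange(31, dtype=float)
r = binomial_pmf(30, 0.55)  # score probabilities on form X
s = binomial_pmf(30, 0.60)  # score probabilities on form Y
h_x = silverman_bandwidth(scores, r, n=2000)
h_y = silverman_bandwidth(scores, s, n=2000)
equated = kernel_equate(scores, r, scores, s, h_x, h_y)
pre = percent_relative_error(equated, r, scores, s)
print("bandwidths:", h_x, h_y)
print("PRE for moments 1-5:", pre)
```

In this toy setup the magnitude of PRE tends to grow with the moment order, which is the pattern the abstract reports; the numbers themselves say nothing about the study's results.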

Authors: Wang Shaojie; Zhang Minqiang; Huang Feifei; Huang Lifang; Yuan Qiting (School of Psychology, South China Normal University, Guangzhou, 510631)

Affiliation: School of Psychology, South China Normal University

Source: Journal of Psychological Science (《心理科学》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2022, No. 4, pp. 988-997 (10 pages)

Keywords: IRT observed score kernel equating; bandwidth selection methods; equating design; data simulation methods

Classification code: B841 [Philosophy and Religion: Basic Psychology]
