Effects of Several Factors on IRT Observed Score Kernel Equating

Cited by: 2
Abstract

This study examined how bandwidth selection method, sample size, number of items, equating design, and data simulation method affect item response theory (IRT) observed score kernel equating. Data were generated with two data simulation methods, and local and global evaluation criteria were computed. In the random groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had little effect. In the non-equivalent groups design, the penalty method and Silverman's rule of thumb performed best; increasing the number of items reduced both percent relative error and random error, whereas increasing sample size increased percent relative error and reduced random error. The data simulation method can affect the evaluation of equating. Future work should focus on the systematic evaluation of equating.

Extended abstract

Owing to its advantages of presmoothing and continuization of score distributions, kernel equating has been shown to be equivalent or superior to other equating methods, especially traditional ones, in terms of equating accuracy and stability. IRT observed score kernel equating is formed by integrating kernel equating with IRT observed score equating. Few studies have evaluated its performance systematically. This study therefore investigated how bandwidth selection method, sample size, test length, equating design, and data simulation method influence its performance.

To ensure ecological validity, data from a large-scale assessment were used as the sampling pool. To avoid simulation preference, both the IRT data simulation method and the pseudo-tests and pseudo-groups simulation method were used, under the random Equivalent Groups (EG) design and the Non-Equivalent groups with Anchor Test (NEAT) design. The bandwidth selection methods were the Penalty method, Silverman's rule of thumb, and the Double smoothing method. Sample sizes were 1000, 2000, and 5000, and tests of 30 and 45 items were considered. Finally, local and universal criteria were computed: the local criteria were Percent Relative Error (PRE) and Standard Error of Equating (SEE), and the universal criteria were Averaged Percent Relative Error (APRE) and Averaged Standard Error of Equating (ASEE).

In EG, regarding local criteria, PRE increased with the order of the central moment, which also meant that the difference between the distributions before and after equating grew. Nonetheless, considering that PRE is obtained by multiplying the initial difference by 100, the bandwidth selection methods yielded similar results. PRE was significantly reduced by increasing sample size and by lengthening tests, especially the latter. As with PRE, the bandwidth selection methods did not differ in SEE. Larger sample size produced less random error, whereas test length had the opposite effect. Furthermore, SEE curves were "high at left but low at right" for the pseudo-tests and pseudo-groups method and "low at left but high at right" for the IRT simulation method. As for universal criteria, APRE was small for all bandwidth selection methods, and the effects of sample size and test length were the same as for the local criteria. ASEE did not differ significantly between the two data simulation methods.

In NEAT, regarding local criteria, PRE again increased with the order of the central moment. The results of the Penalty method and Silverman's rule of thumb coincided and were superior to those of the Double smoothing method, and this trend was more evident when the test was shorter. As in EG, PRE was significantly reduced by lengthening tests, but not by increasing sample size. Notably, PRE for the Double smoothing method was most influenced by sample size when the test had 30 items and the IRT simulation method was used, which indicated interactions among these factors. For SEE, the bandwidth selection methods yielded similar results, differing only at extreme scores. Increasing sample size and lengthening the test both reduced random error. Meanwhile, the distribution of SEE was more stable for the pseudo-tests and pseudo-groups method than for the IRT method. As for universal criteria, the trends for APRE and ASEE matched those for the local criteria.

To summarize, the bandwidth selection methods performed similarly in EG, but the Penalty method and Silverman's rule of thumb prevailed in NEAT. Bandwidth selection, sample size, and test length jointly affected IRT observed score equating. A preference of data simulation methods was observed, which suggests that researchers should use multiple simulation methods and designs before drawing final conclusions when comparing equating methods. Further study should focus more on the systematic evaluation of equating.
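
The PRE criterion mentioned in the abstract can be illustrated with a small sketch. It assumes the definition commonly used in the kernel equating literature, PRE(p) = 100 × (μ_p(e_Y(X)) − μ_p(Y)) / μ_p(Y), where μ_p is the p-th moment about zero; the abstract does not state the exact formula used in the paper, so this is an assumption. The function names and the toy score distributions below are hypothetical and are not taken from the study.

```python
def raw_moment(scores, probs, p):
    """p-th moment about zero of a discrete score distribution."""
    return sum((x ** p) * w for x, w in zip(scores, probs))


def percent_relative_error(equated_x, r, y, s, max_moment=5):
    """PRE(p) for p = 1..max_moment (assumed definition, see lead-in).

    equated_x: form X scores after equating to the scale of form Y
    r:         score probabilities of form X
    y, s:      score points and score probabilities of form Y
    """
    pre = []
    for p in range(1, max_moment + 1):
        mu_eq = raw_moment(equated_x, r, p)  # moment of the equated X scores
        mu_y = raw_moment(y, s, p)           # moment of the target Y scores
        pre.append(100.0 * (mu_eq - mu_y) / mu_y)
    return pre


# Toy data (hypothetical): equated scores sit 0.1 above the Y score points.
y = [0, 1, 2]
s = [0.25, 0.5, 0.25]
equated_x = [0.1, 1.1, 2.1]
r = [0.25, 0.5, 0.25]

pre = percent_relative_error(equated_x, r, y, s, max_moment=3)
for order, value in enumerate(pre, start=1):
    print(f"PRE({order}) = {value:.2f}")
```

In this toy case the PRE grows with the order of the moment, which matches the pattern the abstract reports. A perfect equating, in which the equated X distribution reproduces the Y distribution exactly, gives a PRE of zero at every order.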
Authors: Wang Shaojie (王少杰), Zhang Minqiang (张敏强), Huang Feifei (黄菲菲), Huang Lifang (黄丽芳), Yuan Qiting (袁琪婷); School of Psychology, South China Normal University, Guangzhou, 510631
Source: Journal of Psychological Science (《心理科学》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2022, Issue 4, pp. 988-997 (10 pages)
Keywords: IRT observed score kernel equating; bandwidth selection methods; equating design; data simulation methods