Abstract
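This study examined how bandwidth selection method, sample size, number of items, equating design, and data simulation method affect IRT observed score kernel equating. Research data were generated with two data simulation methods, and both local and global evaluation criteria were computed. In the random groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had little effect. In the non-equivalent groups design, the penalty method and Silverman's rule of thumb performed best; increasing the number of items reduced both the percent relative error and the random error, whereas increasing the sample size increased the percent relative error but reduced the random error. The data simulation method can affect the evaluation of equating. Future research should focus on the systematic evaluation of equating.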
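The bandwidth selection methods compared above all choose the bandwidth h of a Gaussian-kernel continuization of a discrete score distribution. The following is a minimal sketch, assuming the standard kernel-equating continuization, the rule-of-thumb variant h = 0.9 · sd · n^(-1/5) for Silverman's rule, and only the first (fit) term PEN1 of the penalty method; the double smoothing method is not shown. All function names are hypothetical, and the score probabilities are passed in directly (in IRT observed score kernel equating they would be model-implied).

```python
import math


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _params(scores, probs, h):
    """Mean mu, variance var, and shrink factor a = sqrt(var / (var + h^2)).

    Shrinking by a keeps the continuized mean and variance equal to
    those of the discrete score distribution.
    """
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(scores, probs))
    return mu, var, math.sqrt(var / (var + h * h))


def kernel_cdf(x, scores, probs, h):
    """Continuized CDF F_h(x) = sum_j r_j * Phi(R_j(x))."""
    mu, _, a = _params(scores, probs, h)
    return sum(p * _Phi((x - a * s - (1 - a) * mu) / (a * h))
               for s, p in zip(scores, probs))


def kernel_pdf(x, scores, probs, h):
    """Continuized density f_h(x), the derivative of kernel_cdf."""
    mu, _, a = _params(scores, probs, h)
    return sum(p * _phi((x - a * s - (1 - a) * mu) / (a * h)) / (a * h)
               for s, p in zip(scores, probs))


def silverman_h(scores, probs, n):
    """Assumed rule-of-thumb variant: h = 0.9 * sd * n ** (-1/5)."""
    _, var, _ = _params(scores, probs, 1.0)
    return 0.9 * math.sqrt(var) * n ** (-0.2)


def pen1_h(scores, probs, grid=None):
    """Grid search for h minimising PEN1(h) = sum_j (r_j - f_h(x_j))^2.

    Simplification: the second penalty term for spurious modes is omitted.
    """
    grid = grid or [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda h: sum(
        (p - kernel_pdf(s, scores, probs, h)) ** 2
        for s, p in zip(scores, probs)))


def equate(x, sx, rx, hx, sy, ry, hy, tol=1e-10):
    """e_Y(x) = G_hY^{-1}(F_hX(x)); G is inverted by bisection."""
    u = kernel_cdf(x, sx, rx, hx)
    lo, hi = min(sy) - 10 * hy - 10, max(sy) + 10 * hy + 10
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sy, ry, hy) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with `r = [0.1, 0.2, 0.4, 0.2, 0.1]` on scores 0 to 4, `h = pen1_h(range(5), r)` picks a bandwidth, and equating to the same distribution shifted up by one point returns x + 1. A grid search is used only for transparency; a real implementation would likely use a numerical optimizer.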
Owing to its pre-smoothing and the continuization of score distributions, kernel equating has been shown to perform as well as or better than other equating methods, especially traditional ones, in terms of equating accuracy and stability. IRT observed score kernel equating combines kernel equating with IRT observed score equating, yet few studies have evaluated its performance systematically. This study therefore investigated how bandwidth selection method, sample size, test length, equating design, and data simulation method influence its performance. To ensure ecological validity, data from a large-scale assessment served as the sampling pool. Both an IRT data simulation method and a pseudo-tests-and-pseudo-groups simulation method were used, to avoid favoring any single simulation method in the random Equivalent Groups design (EG) and the Non-Equivalent groups with Anchor Test design (NEAT). Specifically, the bandwidth selection methods were the Penalty method, Silverman's rule of thumb, and the Double smoothing method; sample sizes were 1000, 2000, and 5000; and tests of 30 and 45 items were considered. Finally, local and universal criteria were computed: the local criteria were the Percent Relative Error (PRE) and the Standard Error of Equating (SEE), and the universal criteria were the Averaged Percent Relative Error (APRE) and the Averaged Standard Error of Equating (ASEE).

In EG, for the local criteria, PRE increased with the order of the moment, meaning that the difference between the distributions before and after equating grew. Nonetheless, because PRE is the relative difference multiplied by 100, the bandwidth selection methods yielded similar results. PRE was, however, significantly reduced by increasing the sample size and lengthening the test, especially the latter. As with PRE, SEE did not differ across bandwidth selection methods. A larger sample size yielded less random error, whereas test length had the opposite effect. Furthermore, the SEE curves were "high on the left, low on the right" for the pseudo-tests-and-pseudo-groups method and "low on the left, high on the right" for the IRT simulation method. For the universal criteria, APRE was small for all bandwidth selection methods, and the effects of sample size and test length matched those observed for the local criteria. ASEE did not differ significantly between the two data simulation methods.

In NEAT, for the local criteria, PRE again increased with the order of the moment. The Penalty method and Silverman's rule of thumb gave nearly identical results and outperformed the Double smoothing method, especially for the shorter test. As in EG, lengthening the test significantly reduced PRE, but increasing the sample size did not. Notably, the PRE of the Double smoothing method was most affected by sample size when the test had 30 items and the IRT simulation method was used, indicating interactions among these factors. For SEE, the bandwidth selection methods yielded similar results, differing only at extreme scores. Increasing the sample size and lengthening the test both reduced random error, and the SEE distribution was more stable for the pseudo-tests-and-pseudo-groups method than for the IRT method. For the universal criteria, APRE and ASEE followed the same trends as the local criteria.

In summary, the bandwidth selection methods performed similarly in EG, whereas the Penalty method and Silverman's rule of thumb prevailed in NEAT. Bandwidth selection, sample size, and test length jointly affected IRT observed score kernel equating. The results also depended on the data simulation method, which suggests that studies comparing equating methods should use multiple simulation methods and designs before drawing final conclusions. Further research should focus on the systematic evaluation of equating.
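The abstract describes PRE as a relative difference multiplied by 100. The following is a minimal sketch of the four criteria, assuming the usual kernel-equating definition of PRE for moment order p (equated scores of form X versus scores of form Y). The averaging rules for APRE and ASEE, and the empirical (across-replication) SEE, are assumptions, not the paper's stated formulas; the study may have used analytic standard errors instead. All function names are hypothetical.

```python
import statistics


def pre(p, eq_x, rx, sy, sy_probs):
    """PRE(p) = 100 * (mu_p(e_Y(X)) - mu_p(Y)) / mu_p(Y).

    eq_x: equated score for each X score point; rx: X score probabilities.
    sy, sy_probs: Y score points and their probabilities.
    """
    mu_eq = sum(e ** p * r for e, r in zip(eq_x, rx))
    mu_y = sum(y ** p * s for y, s in zip(sy, sy_probs))
    return 100.0 * (mu_eq - mu_y) / mu_y


def apre(eq_x, rx, sy, sy_probs, max_p=10):
    """Assumed definition: mean |PRE(p)| over moment orders 1..max_p."""
    return statistics.fmean(
        abs(pre(p, eq_x, rx, sy, sy_probs)) for p in range(1, max_p + 1))


def empirical_see(replications):
    """SD across replications of the equated score at each score point.

    replications: one equated-score vector per simulated replication.
    """
    return [statistics.stdev(col) for col in zip(*replications)]


def asee(see, weights=None):
    """Assumed definition: (optionally weighted) mean SEE over score points."""
    if weights is None:
        return statistics.fmean(see)
    return sum(s * w for s, w in zip(see, weights)) / sum(weights)
```

For instance, if every equated score is 0.3 points too high on a Y form with mean 3, `pre(1, ...)` returns 10.0, and the error grows for higher moment orders, which mirrors the pattern the abstract reports.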
Authors
王少杰
张敏强
黄菲菲
黄丽芳
袁琪婷
Wang Shaojie; Zhang Minqiang; Huang Feifei; Huang Lifang; Yuan Qiting (School of Psychology, South China Normal University, Guangzhou, 510631)
Source
《心理科学》
CSSCI
CSCD
Peking University Core Journal
2022, No. 4, pp. 988-997 (10 pages)
Journal of Psychological Science
Keywords
IRT observed score kernel equating
bandwidth selection methods
equating design
data simulation methods