Abstract
This study examines intermediate scoring results produced by scoring on a local network. The threshold value (the largest score difference allowed between the first and second rater) affects how often the two ratings differ statistically: in general, the smaller the threshold, the more often the first and second ratings show no statistically significant difference. The threshold is not, however, the most important factor in scoring consistency. The decisive factor is the distribution of the differences between the first and second ratings. With a high threshold, the two ratings may still show no significant difference; with a low threshold, they may still differ significantly. In high-stakes examinations, where every point counts, the threshold should be set at 1 point. Within the range the threshold allows, a non-significant paired-samples t-test does not necessarily mean that scoring consistency is good, and a significant result does not necessarily mean that consistency is poor. The paired-samples t-test is therefore not a valid or reliable method for evaluating scoring consistency, and other methods are needed.
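The abstract's central claim, that a paired-samples t-test can mislead in both directions, can be illustrated with a small constructed example. The sketch below is an illustration under stated assumptions and is not the author's analysis. The scores are invented (hypothetical) data for 10 scripts. The helper names `paired_t` and `within_threshold_rate` are hypothetical, and the "share of scripts within a 1-point threshold" is a simple agreement index chosen here only for contrast. The critical value 2.262 is the two-sided 5% value of Student's t for 9 degrees of freedom, hardcoded so the sketch needs only the Python standard library.

```python
import math
import statistics

# Two-sided critical value of Student's t at alpha = 0.05 with df = 9
# (n = 10 pairs); hardcoded so the sketch needs only the stdlib.
T_CRIT_DF9 = 2.262


def paired_t(first, second):
    """Paired-samples t statistic for two raters scoring the same scripts."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        # All differences identical: t is 0/0 or infinite; handle explicitly.
        return 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / (sd_d / math.sqrt(n))


def within_threshold_rate(first, second, threshold=1):
    """Share of scripts whose two scores differ by at most `threshold` points."""
    hits = sum(abs(a - b) <= threshold for a, b in zip(first, second))
    return hits / len(first)


def report(label, first, second):
    t = paired_t(first, second)
    verdict = "significant" if abs(t) > T_CRIT_DF9 else "not significant"
    rate = within_threshold_rate(first, second)
    print(f"{label}: t = {t:.2f} ({verdict}), within 1 point: {rate:.0%}")
    return t, rate


# Hypothetical first-rater scores for 10 scripts.
first_rater = [10, 12, 9, 14, 11, 13, 8, 15, 10, 12]

# Case A: gaps of -3 and +3 alternate and cancel, so the mean difference is 0.
second_a = [s + (3 if i % 2 == 0 else -3) for i, s in enumerate(first_rater)]

# Case B: the second rater is exactly 1 point lower on 9 of the 10 scripts.
second_b = [s - 1 for s in first_rater[:-1]] + [first_rater[-1]]

t_a, rate_a = report("Case A (offsetting gaps)", first_rater, second_a)
t_b, rate_b = report("Case B (small consistent bias)", first_rater, second_b)
```

Running it prints:

```
Case A (offsetting gaps): t = 0.00 (not significant), within 1 point: 0%
Case B (small consistent bias): t = 9.00 (significant), within 1 point: 100%
```

In Case A, no pair of scores is within 1 point of each other, yet the t-test finds no difference because the positive and negative gaps cancel. In Case B, every pair is within 1 point, yet the t-test is highly significant because the gaps are small but almost always in the same direction. The t statistic reflects only the mean and the spread of the differences. It does not reflect how far apart the two ratings actually are on each script, which is why the distribution of the differences matters more than the test result.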
Source
Examinations Research (《考试研究》)
2006, No. 4, pp. 64-75 (12 pages)
Keywords
Scoring on local network
Threshold value
Scoring consistency
Paired-samples t-test