Thresholds Setting and Consistency between Raters for Scoring on Local Network  Cited by: 2
Abstract  This paper examines intermediate results from scoring on a local network. The threshold value affects how often the first and second ratings differ statistically: in general, the smaller the threshold, the more often the two ratings show no statistically significant difference. The threshold is not, however, the most important determinant of scoring consistency; what matters is the distribution of the score differences between the first and second raters. With a high threshold, the two ratings may still show no significant difference; with a low threshold, they may still differ significantly. Where every single point on an examination counts, the threshold should be set at 1 point. Within the range allowed by the threshold, a paired-sample t-test that finds no significant difference does not imply that scoring consistency is good, and a test that finds a significant difference does not imply that consistency is poor. The paired-sample t-test is therefore not a valid or reliable method for evaluating scoring consistency, and other methods are needed.
Authors  雷新勇 (Lei Xinyong), 周群 (Zhou Qun)
Source  Examinations Research (《考试研究》), 2006, No. 4, pp. 64-75 (12 pages)
Keywords  scoring on local network; threshold value; scoring consistency; paired-sample t-test
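The abstract's central claim, that a paired-sample t-test can disagree with actual rater agreement, can be illustrated with a minimal sketch. The scores below are invented for illustration and are not data from the paper; the helper names `paired_t` and `share_within`, the sample size of 40 papers, and the approximate two-sided 5% critical value for 39 degrees of freedom are all assumptions of this sketch. The 1-point threshold follows the abstract's recommendation.

```python
import math
import statistics

# Approximate two-sided 5% critical value of Student's t for 39 degrees
# of freedom (40 paired scores). This is an assumption of the sketch.
T_CRIT = 2.02


def paired_t(first, second):
    """Return the paired-sample t statistic for two equal-length score lists."""
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def share_within(first, second, threshold):
    """Return the fraction of papers whose two scores differ by at most `threshold`."""
    gaps = [abs(a - b) for a, b in zip(first, second)]
    return sum(gap <= threshold for gap in gaps) / len(gaps)


# Hypothetical first-rater scores for 40 papers.
first = [10 + i % 5 for i in range(40)]

# Case A: the second rater is alternately 3 points higher and 3 points lower.
# The differences cancel on average, yet no paper is within the threshold.
second_a = [s + (3 if i % 2 == 0 else -3) for i, s in enumerate(first)]

# Case B: the second rater is 1 point higher on three papers out of four.
# The bias is systematic, yet every paper is within the threshold.
second_b = [s + (0 if i % 4 == 0 else 1) for i, s in enumerate(first)]

for label, second in (("A", second_a), ("B", second_b)):
    t_value = paired_t(first, second)
    verdict = "significant" if abs(t_value) > T_CRIT else "not significant"
    within = share_within(first, second, threshold=1)
    print(f"Case {label}: t = {t_value:.2f} ({verdict}), "
          f"share within 1 point = {within:.0%}")
```

In Case A the t-test reports no significant difference although the raters never agree to within 1 point, and in Case B it reports a significant difference although they always do. The t-test responds to the mean of the differences, whereas agreement depends on their whole distribution, which is the distinction the abstract draws.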