Thresholds Setting and Consistency between Raters for Scoring on Local Network (基于局域网评分中阈值设置和评分一致性研究)

Cited by: 2


Abstract: This paper examines intermediate results from scoring over a local network. The threshold affects how large the statistical differences between first-rater and second-rater scores are: in general, the smaller the threshold, the more rater pairs show no statistical difference. The threshold, however, is not the most important determinant of scoring consistency; what matters is the distribution of the differences between the two raters' scores. With a high threshold the two raters' results may still show no statistical difference, and with a low threshold they may still differ significantly. Where every point of an examination score counts, the threshold should be set at 1 point. Within the range the threshold allows, a paired-samples t-test that shows no significant difference does not mean that scoring consistency is good, and one that shows a significant difference does not mean that consistency is poor. The paired-samples t-test is therefore not a valid or reliable way to evaluate scoring consistency, and other evaluation methods are needed.
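The abstract's central claim, that a paired-samples t-test can disagree with actual rater agreement in both directions, can be illustrated with a small sketch. The scores below are invented for illustration and are not data from the study; the helper names `paired_t` and `share_within` are hypothetical, and the critical value is the standard two-sided 5% value of Student's t for 9 degrees of freedom.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (n = 10 pairs).
T_CRIT_DF9 = 2.262


def paired_t(first, second):
    """Paired-samples t statistic for first-rater vs second-rater scores."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_d / (sd_d / math.sqrt(len(diffs)))


def share_within(first, second, threshold):
    """Fraction of papers whose two scores differ by at most `threshold` points."""
    diffs = [abs(a - b) for a, b in zip(first, second)]
    return sum(d <= threshold for d in diffs) / len(diffs)


# Invented first-rater scores for ten papers (illustrative only).
first = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]

# Case A: the second rater is 5 points off on every paper, alternating in sign.
second_a = [a + d for a, d in zip(first, [5, -5] * 5)]

# Case B: the second rater is at most 1 point off, but almost always lower.
second_b = [a - d for a, d in zip(first, [1, 1, 1, 0, 1, 1, 1, 1, 0, 1])]

for label, second in (("A", second_a), ("B", second_b)):
    t = paired_t(first, second)
    significant = abs(t) > T_CRIT_DF9
    within = share_within(first, second, threshold=1)
    print(f"case {label}: t = {t:.2f}, significant = {significant}, "
          f"share within 1 point = {within:.0%}")
```

In case A the differences cancel out, so the test reports no significant difference even though no pair of scores falls within the 1-point threshold. In case B every pair is within the threshold, yet the small but systematic offset makes the test significant. The t-test responds to the mean of the differences, not to their spread around zero, which is why the distribution of differences is the more informative quantity.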

Authors: Lei Xinyong (雷新勇), Zhou Qun (周群)

Affiliation: Test Development Office, Shanghai Municipal Educational Examinations Authority (上海市教育考试院命题办公室)

Source: Examinations Research (《考试研究》), 2006, No. 4, pp. 64-75 (12 pages)

Keywords: scoring on local network; threshold value; scoring consistency; paired t-test

Classification: TP393.1 [Automation and Computer Technology: Computer Application Technology]
