Abstract
The present study uses FACETS, a many-facet Rasch model measurement program, to examine the rating behavior of computer automatic scoring and 15 expert raters who marked 215 examinees' speaking recordings from a mock administration of the Computer-based English Listening-Speaking Test (Guangdong), part of the province's college entrance examination. It finds that although the severity of computer automatic scoring differs significantly from that of the expert raters, the difference does not exert a decisive influence on the distribution of examinees' scores. The lower incidence of rating bias in computer automatic scoring indicates that it achieves higher internal consistency than human rating.
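For reference, a minimal sketch of the many-facet Rasch model on which FACETS is based, written in its common rating-scale form; the three facets shown (examinee ability, item difficulty, rater severity) are an assumption about how the study specified its facets, not a statement of the authors' exact model:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

where \(P_{nijk}\) is the probability that examinee \(n\) receives category \(k\) from rater \(j\) on item \(i\), \(B_n\) is the examinee's ability, \(D_i\) the item's difficulty, \(C_j\) the rater's severity, and \(F_k\) the step difficulty of category \(k\) relative to category \(k-1\). Rater severity and bias estimates reported by FACETS derive from this decomposition.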
Source
《外语测试与教学》 (Foreign Language Testing and Teaching)
2016, No. 1, pp. 22-31 (10 pages)
Funding
Supported by the Guangdong Provincial Education Science Research Project (TJW2013001)
Keywords
many-facet Rasch model
Computer-based English Listening-Speaking Test
computer automatic scoring
marking validity