Abstract
In large-scale language performance assessment, the quality of raters' scoring bears directly on test takers' outcomes and on the validity, reliability, and fairness of test scores, so minimizing rating error through effective rater training is an important issue in ensuring the validity of score use. This study draws on the rich statistical information provided by the Many-Facet Rasch Model (MFRM) to conduct a diagnostic analysis of raters' scoring data. The results show that the statistics in the MFRM output can diagnose the problems in raters' scoring fairly comprehensively, provide more targeted feedback for rater training, serve as a basis for distinguishing raters of higher and lower accuracy, and offer an effective measurement tool for further research on rater error.
Rating accuracy in large-scale performance assessment is crucial for ensuring test validity and fairness, which makes effective rater training a key issue in minimizing rating error and justifying the use of test scores as a decision-making tool. This study explores how the Many-Facet Rasch Model (MFRM) can be used to diagnose rating accuracy on the basis of its informative output, including both facet-level and element-specific statistics, to assess the overall quality of ratings, the adequacy of rating-scale use, rater severity, self-consistency, and the number of biased cases. Results demonstrate that the MFRM output indices can generate a comprehensive diagnostic report on the rating patterns of each individual rater and thus help provide more specific feedback for further rater training. Furthermore, the diagnostic report may serve as a basis upon which raters with higher levels of rating accuracy can be differentiated from those with lower levels, and therefore could be used as a useful instrument in studies of rater variability.
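For reference, a minimal sketch of the standard three-facet MFRM formulation (examinees, criteria, raters) in the Linacre-style rating-scale parameterization is given below; the exact specification used in the study (e.g., additional facets or a partial-credit parameterization) may differ.

\[
\log \frac{P_{nijk}}{P_{nij(k-1)}} \;=\; \theta_n - \delta_i - \alpha_j - \tau_k
\]

Here \(P_{nijk}\) is the probability of examinee \(n\) receiving category \(k\) rather than \(k-1\) from rater \(j\) on criterion \(i\); \(\theta_n\) is examinee ability, \(\delta_i\) criterion difficulty, \(\alpha_j\) rater severity, and \(\tau_k\) the step difficulty of category \(k\). Rater fit (infit/outfit) statistics and bias/interaction terms are derived from the residuals between observed ratings and the ratings expected under these parameter estimates, which is what yields the facet-level and element-specific diagnostics referred to above.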
Source
《外语测试与教学》
2016, No. 2, pp. 47-54 (8 pages)
Foreign Language Testing and Teaching
Funding
Supported by the seventh batch of the China Foreign Language Education Fund, National Research Centre for Foreign Language Education, Beijing Foreign Studies University