
Diagnosing Raters' Rating Quality with the Many-Facet Rasch Model (Cited by: 4)

Diagnosing rating accuracy using Many-Facet Rasch Model
Abstract: In large-scale performance assessment, the quality of raters' scoring bears directly on examinees' futures and on the reliability, validity, and fairness of test scores. Minimizing rating error through effective rater training is therefore a key issue in justifying the use of test scores as a decision-making tool. This study explores how the Many-Facet Rasch Model (MFRM) can be used to diagnose rating accuracy on the basis of its informative output, which includes both facet-level and element-specific statistics. These statistics are used to assess the general quality of the ratings, the adequacy of rating scale use, rater severity, rater self-consistency, and the number of biased cases. The results show that the MFRM output indices can generate a comprehensive diagnostic report on the rating patterns of each individual rater and can thus provide more specific feedback for further rater training. The diagnostic report can also serve as a basis for differentiating raters with higher rating accuracy from those with lower rating accuracy, which makes it a useful instrument in studies of rater variability.
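For readers unfamiliar with the model named in the abstract, a commonly cited three-facet form of the Many-Facet Rasch Model is sketched below. The abstract does not state which facets or which scale structure the study specified, so the choice of facets (examinee, task, rater) and the shared rating-scale term are assumptions made purely for illustration.

```latex
% Illustrative three-facet rating-scale formulation (assumed, not taken from the paper)
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
% P_{nijk}     : probability that examinee n receives category k from rater j on task i
% P_{nij(k-1)} : probability that the same examinee receives category k-1
% B_n          : ability of examinee n
% D_i          : difficulty of task i
% C_j          : severity of rater j
% F_k          : difficulty of category k relative to category k-1
```

Under this formulation, rater severity corresponds to the estimate of C_j, while self-consistency and biased cases are typically judged from fit statistics and interaction (bias) analyses computed on the residuals.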
Author: Zhang Jie (张洁)
Source: Foreign Language Testing and Teaching (《外语测试与教学》), 2016, No. 2, pp. 47-54 (8 pages)
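Funding: Supported by the seventh round of the China Foreign Language Education Fund, China Foreign Language Education Research Center, Beijing Foreign Studies University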
Keywords: rating accuracy; Many-Facet Rasch Model; diagnosis


