Many-facet Rasch Model analysis of computer automatic scoring in a computer-based English listening-speaking test (cited by: 7)
Abstract This study uses FACETS, a many-facet Rasch model measurement program, to analyze the rating behavior of a computer automatic scoring system and 15 expert raters who scored the speaking recordings of 215 examinees in a mock administration of the Computer-based English Listening-Speaking Test of the Guangdong college entrance examination. It finds that, although rater severity differs significantly between computer automatic scoring and the expert raters, the difference does not exert a decisive influence on the distribution of examinee ability. The lower rate of rating bias in computer automatic scoring indicates that it has higher internal consistency than expert human rating.
Authors Zhou Yan (周燕), Zeng Yongqiang (曾用强)
Source Foreign Language Testing and Teaching (《外语测试与教学》), 2016, No. 1, pp. 22-31 (10 pages)
Funding Supported by the Guangdong Provincial Education Science Research Project (TJW2013001)
Keywords many-facet Rasch model; Computer-based English Listening-Speaking Test; computer automatic scoring; marking validity
