Abstract
The present study uses FACETS, a many-facet Rasch model measurement program, to examine the rating behavior of computer automatic scoring and 15 expert raters who marked 215 examinees' speaking recordings from a mock administration of the Computer-based English Listening-Speaking Test (Guangdong), part of the province's college entrance examination. It finds that although the severity of computer automatic scoring differs significantly from that of the expert raters, the difference does not exert a decisive influence on the distribution of examinees' scores. The lower incidence of rating bias in computer automatic scoring indicates that it achieves higher internal consistency than human rating.
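For reference, a minimal sketch of the many-facet Rasch model on which FACETS is based, written in its common rating-scale form; the three facets shown (examinee ability, item difficulty, rater severity) are an assumption about how the study specified its facets, not a statement of the authors' exact model:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

where \(P_{nijk}\) is the probability that examinee \(n\) receives category \(k\) from rater \(j\) on item \(i\), \(B_n\) is the examinee's ability, \(D_i\) the item's difficulty, \(C_j\) the rater's severity, and \(F_k\) the step difficulty of category \(k\) relative to category \(k-1\). Rater severity and bias estimates reported by FACETS derive from this decomposition.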
Source
《外语测试与教学》 (Foreign Language Testing and Teaching)
2016, No. 1, pp. 22-31 (10 pages)
Funding
Supported by the Guangdong Provincial Education Science Research Project (TJW2013001)
Keywords
many-facet Rasch model
Computer-based English Listening-Speaking Test
computer automatic scoring
marking validity