Abstract
This study investigates the effectiveness of individualized feedback to raters from the perspective of changes in their decision-making. Three accredited CET4 essay raters with different levels of experience and backgrounds were first invited to rate 30 mock CET4 essays, writing and rank-ordering three reasons for each rating; they were then trained in thinking aloud and asked to rate 10 additional essays while thinking aloud. One week later, each rater received an individualized feedback report containing the results of a Many-Facet Rasch Model (MFRM) analysis (severity, internal consistency, and bias) together with a coding analysis of the reasons for their ratings. After reading the report, the raters rated a new set of 30 mock CET4 essays and then another set of 10 essays while thinking aloud, 5 of which were identical to those in the pretest. A detailed comparative analysis of the raters' think-aloud protocols on the 5 common essays before and after feedback indicates that the feedback helped raters attach greater weight to rubric-related features and adjust their decision-making behavior.
Source
Foreign Language Testing and Teaching (《外语测试与教学》)
2015, No. 1, pp. 1-11 (11 pages)
Funding
Guangdong Provincial Education Science "Twelfth Five-Year" Planning Project "A Diagnostic Study of Raters' Decision-Making Styles in Large-Scale Examinations" (Grant No. 2013JK013)
Guangzhou Social Science Planning Youth Project "An Empirical Study of the Effectiveness of Rater Feedback in Large-Scale Language Testing: The Case of CET4" (Grant No. 14Q11)
Guangdong Higher Education Teaching Quality and Teaching Reform Project "Construction and Practice of a Multi-Dimensional Assessment System for College English Courses" (Grant No. X2WY/N913078a). This paper is an interim outcome of these projects.