Journal Articles
2 articles found
1. DECISION MAKING WHILE SCORING EFL TAPE-MEDIATED SPEAKING TEST PERFORMANCE
Author: 王海贞 | Chinese Journal of Applied Linguistics, 2008, No. 4, pp. 17-28, 16+128 (14 pages)
This study investigates how raters make their scoring decisions when assessing tape-mediated speaking test performance. Twenty-four Chinese EFL teachers were trained before analytically scoring five sample tapes selected from TEM4-Oral, a national EFL speaking test designed for college English-major sophomores in China. The raters' verbal reports on what they were thinking while making their scoring decisions were audio-recorded during and immediately after each assessment, and post-scoring interviews supplemented this probe into the scoring process. A qualitative analysis of the data showed that the raters tended to give weight to content, to penalize both grammar and pronunciation errors, and to reward the use of impressive and uncommon words. Moreover, the whole decision-making process proved to be cyclic in nature. A flow chart describing this cyclic process of hypothesis forming and testing is then proposed and discussed.
Keywords: decision-making process; raters; tape-mediated speaking test; TEM4
2. STUDY OF SOURCES OF SCORE VARIABILITY IN PERFORMANCE ASSESSMENT USING MFRM: A CASE OF SPEAKING TEST IN PETS BAND 3 (cited 4 times)
Authors: 张洁, 何莲珍 | Chinese Journal of Applied Linguistics, 2008, No. 4, pp. 40-49, 128 (11 pages)
As a direct measure of learners' communicative language ability, performance assessment (typically writing and speaking assessment) claims construct validity and strong predictive utility of test scores. However, it is also a common concern that the subjectivity of the rating process, and the potential unfairness to test takers who encounter different writing prompts and speaking tasks, threaten the reliability and validity of test scores, especially in large-scale, high-stakes tests. Appropriate means of quality control for subjective scoring are therefore essential in test administration and validation. Based on raw scores from one administration of the speaking test in PETS Band 3 held in Hangzhou, the present study investigates and models possible sources of score variability within the framework of the Many-Facet Rasch Model (MFRM). MFRM conceptualizes the probability of an examinee being awarded a certain score as a function of several facets (examinee ability, rater severity, domain difficulty, and the step difficulty between adjacent score categories) and provides estimates of the extent to which an examinee's test score is influenced by those facets. Model construction and data analysis were carried out in FACETS Version 3.58, a computer program for conducting MFRM analysis. The results demonstrate statistically significant differences within each facet. Despite generally acceptable rater consistency across examinees and rating domains, fit statistics indicate unexpected rating patterns in certain raters, such as inconsistency and central tendency, to be addressed through future rater training.
Fair scores for each examinee are also provided, minimizing the variability due to facets other than examinee ability. MFRM proves effective in detecting whether each test-method facet functions as intended in performance assessment and in providing useful feedback for the quality control of subjective scoring.
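The facet structure described in this abstract can be sketched in the standard log-odds form of the Many-Facet Rasch Model (the exact parameterization used in the study is not given in the abstract, so this is the common Linacre-style formulation, with symbols chosen to match the four facets named above):

```latex
% Log-odds of examinee n being awarded category k rather than k-1
% from rater j on rating domain i:
\log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - C_j - D_i - F_k
% where
%   B_n : ability of examinee n
%   C_j : severity of rater j
%   D_i : difficulty of domain i
%   F_k : step difficulty of category k relative to category k-1
```

Under this model, the "fair scores" mentioned above are obtained by adjusting each examinee's observed score for the estimated rater severity C_j and domain difficulty D_i, so that the reported score reflects B_n alone.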
Keywords: PETS; speaking test; quality control; many-facet Rasch model (MFRM)