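Abstract
Using empirical data from a high school English reading test, this study compared the two-parameter logistic item response model (2PL-IRT) with two-parameter logistic testlet response models (2PL-TRT) that include different numbers of testlets, in order to examine how the number of testlets modeled affects parameter estimation and model fit. The results show that: (1) the 2PL-IRT model produces large bias in ability estimates for examinees whose abilities lie between -1.50 and 0.50; (2) treating testlets whose testlet effect exceeds 0.50 as locally independent items leads to underestimation of the discrimination parameters of some items and overestimation of the difficulty parameters of most items; (3) the larger the testlet effect, the greater the bias in item parameter estimates when the testlet is treated as a set of locally independent items.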
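For reference, the two model families compared here can be written as follows. This assumes the Bradlow-Wainer-Wang parameterization on which SCORIGHT is built; the paper's own notation may differ. Here d(j) denotes the testlet containing item j, γ is a person-by-testlet random effect (set to 0 for items outside any testlet), and the "testlet effect" reported in such studies is usually its variance σ²_γ, which is presumably the quantity compared with the 0.50 threshold.

```latex
\begin{aligned}
\text{2PL-IRT:}\quad & P(y_{ij}=1 \mid \theta_i) = \frac{1}{1+\exp\left[-a_j(\theta_i-b_j)\right]} \\
\text{2PL-TRT:}\quad & P(y_{ij}=1 \mid \theta_i,\gamma_{id(j)}) = \frac{1}{1+\exp\left[-a_j(\theta_i-b_j-\gamma_{id(j)})\right]},
\qquad \gamma_{id(j)} \sim N\!\left(0,\sigma^2_{\gamma_{d(j)}}\right)
\end{aligned}
```

Setting σ²_γ = 0 for a testlet makes every γ for that testlet zero, so its items fall back to the 2PL-IRT form; this is what dropping a testlet from a 2PL-TRT model amounts to.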
Testlets are common in reading comprehension tests. Compared with traditional tests composed of independent single items, tests built from testlets not only reduce testing time and cost but also allow tasks that more closely resemble real-world situations, which can improve test validity. However, if the reading material in a testlet affects examinees with different knowledge backgrounds differently, a testlet effect occurs. Previous studies have shown that when a testlet effect exists, applying a traditional IRT model yields biased item parameter estimates. To address this problem, researchers developed Testlet Response Theory (TRT), which adds a testlet parameter to standard IRT models. This article reviews models for handling testlet effects and then analyzes data from a high school English reading comprehension test consisting of one cloze passage and five reading comprehension passages. All items were multiple-choice: either four-option items or items in which five answers are chosen from seven options. The sample comprised 934 examinees. Two kinds of measurement models were compared: the two-parameter logistic item response (2PL-IRT) model and five two-parameter logistic testlet response (2PL-TRT) models, which differed in the number of testlets they contained. The most complex model (5T_TRT) contained all five reading comprehension testlets. The number of testlets was then reduced according to the magnitude of their testlet effects, down to the simplest 2PL-TRT model (1T_TRT), which contained only the testlet with the largest testlet effect. First, to confirm that every reading comprehension passage violated the local independence assumption (LID), Yen's Q3 values for each testlet were calculated in R. As expected, the absolute Q3 values for all five passages exceeded 0.20, indicating that all five violated the local independence assumption.
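The Q3 screening step can be sketched in Python. This is a minimal, stdlib-only sketch, not the authors' R code: it assumes the usual definition of Yen's Q3 (the Pearson correlation, across examinees, between two items' residuals u_ij - P_j(θ_i) under a fitted 2PL model) and treats the 0.20 cutoff from the text as a threshold on |Q3| for item pairs inside the same testlet. The function names `q3_matrix` and `flag_pairs` are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def q3_matrix(responses, thetas, a, b):
    """Yen's Q3 for all item pairs (hypothetical helper).

    responses: one row of 0/1 scores per examinee
    thetas:    ability estimate per examinee
    a, b:      2PL discrimination and difficulty per item
    """
    n_items = len(a)
    # Residual column for each item: observed score minus 2PL-expected score.
    resid = [
        [row[j] - p_2pl(t, a[j], b[j]) for row, t in zip(responses, thetas)]
        for j in range(n_items)
    ]
    return [[pearson(resid[j], resid[k]) for k in range(n_items)]
            for j in range(n_items)]


def flag_pairs(q3, items, cutoff=0.20):
    """Item pairs within one testlet whose |Q3| exceeds the cutoff."""
    return [(j, k) for idx, j in enumerate(items) for k in items[idx + 1:]
            if abs(q3[j][k]) > cutoff]


# Toy check: items 0 and 1 have identical response patterns, item 2 does not.
responses = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
q3 = q3_matrix(responses, thetas=[0.0] * 4, a=[1.0] * 3, b=[0.0] * 3)
print(flag_pairs(q3, items=[0, 1, 2]))  # [(0, 1)]
```

In real use, `thetas`, `a`, and `b` would come from a 2PL calibration of the whole test, and a passage would be flagged when its within-passage pairs show |Q3| above the cutoff.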
Next, the ability parameter of every examinee and the discrimination (a) and difficulty (b) parameters of every item were estimated with SCORIGHT 3.0. For ability estimates, there was no obvious difference among the five 2PL-TRT models, but for examinees with abilities between -1.50 and 0.50, the IRT model produced biased ability estimates. For item parameters, if the testlet effect of a reading comprehension passage reached 0.50, its items should not be treated as locally independent; instead, they should be analyzed as a testlet with a TRT model. If such items are mistakenly treated as locally independent, the item parameter estimates are seriously biased, and the bias grows as the testlet effect becomes larger. These results suggest that, in practice, balancing the accuracy of parameter estimates against model parsimony requires considering two things: the type of parameter of interest and the magnitude of the testlet effect. In addition, the authors emphasize the importance of choosing appropriate reading materials: to avoid testlet effects, passage topics and item types should be considered before the test is constructed.
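Why a larger testlet effect makes the locally-independent treatment worse can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the study's design: it generates responses to one hypothetical five-item testlet under the 2PL testlet model with testlet variance σ²_γ, then computes residuals from a plain 2PL that ignores γ (using the true θ, a, and b rather than estimates, for simplicity). As σ²_γ grows, the mean within-testlet Q3 should rise, meaning the dependence left unexplained by the 2PL gets stronger. The item parameters, sample size, seed, and the name `mean_within_testlet_q3` are all hypothetical.

```python
import math
import random


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def mean_within_testlet_q3(sigma2_gamma, n_examinees=2000, seed=1):
    """Mean Q3 over item pairs of one simulated testlet (hypothetical setup).

    Responses follow logistic(a_j * (theta_i - b_j - gamma_i)) with
    gamma_i ~ N(0, sigma2_gamma); residuals use the 2PL without gamma,
    i.e. the model that treats the testlet's items as locally independent.
    """
    rng = random.Random(seed)
    a = [1.2] * 5                       # hypothetical discriminations
    b = [-1.0, -0.5, 0.0, 0.5, 1.0]     # hypothetical difficulties
    resid = [[] for _ in a]
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        gamma = rng.gauss(0.0, math.sqrt(sigma2_gamma))  # person-by-testlet effect
        for j in range(len(a)):
            # theta - gamma inside p_2pl gives a * (theta - b - gamma).
            u = 1 if rng.random() < p_2pl(theta - gamma, a[j], b[j]) else 0
            resid[j].append(u - p_2pl(theta, a[j], b[j]))
    pairs = [(j, k) for j in range(len(a)) for k in range(j + 1, len(a))]
    return sum(pearson(resid[j], resid[k]) for j, k in pairs) / len(pairs)


for s2 in (0.0, 0.5, 1.0, 2.0):
    print(s2, round(mean_within_testlet_q3(s2), 3))
```

With σ²_γ = 0 the mean Q3 should sit near zero; with a large σ²_γ it should be clearly positive. In real data θ would be estimated from the 2PL fit, which absorbs part of γ, so the numbers here only show the direction of the effect.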
Authors
Ma Jie (马洁)
Liu Hongyun (刘红云)
Ma Jie; Liu Hongyun (Faculty of Psychology, Beijing Normal University, Beijing 100875; Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing 100875)
Source
Journal of Psychological Science (《心理科学》)
CSSCI
CSCD
Peking University Core Journal (北大核心)
2018, No. 6, pp. 1374-1381 (8 pages)
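Funding
National Natural Science Foundation of China (31571152)
Beijing Municipal Co-construction Project with Central Universities in Beijing (019-105812)
National Education Examinations Research Program, 2017 Project (GJK2017015)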
Keywords
item response theory
testlet response theory
testlet response model
model selection