Abstract
Test equating matters for examination fairness, item bank construction, and computerized adaptive testing. Using data from the high school entrance ("升中") examination in Foshan, Guangdong Province, this paper compares five IRT equating methods (MM, MS, RMS, SL, and HA) under a mixed model that combines the three-parameter logistic model (3PLM) and the graded response model (GRM). The equating results of the Tucker linear method serve as the criterion, and the mean error difference and the standardized weighted mean square difference serve as the two evaluation indices. The results show that: (1) among the five IRT equating methods, the Haebara characteristic curve method (HA) performs best for a test with mixed item types when the standardized weighted mean square difference is the index; (2) the main effect of item type is significant: equating is most accurate for objective items, equating error is largest for subjective items, and the error for mixed tests containing both objective and subjective items falls between the two; (3) equating method interacts with item type: differences among the methods are not significant for objective items, but they are significant for subjective or mixed items.
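As a rough illustration of what the simplest of these methods compute, the sketch below derives the linear linking constants A and B from anchor-item parameter estimates. It assumes that MM and MS denote the standard mean/mean and mean/sigma moment methods; the abstract does not spell this out. The function names and the anchor-item values are hypothetical and are not taken from the paper, and the characteristic curve methods (SL and HA) are not shown.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Mean/sigma linking constants from anchor-item difficulties.

    b_from: difficulties estimated on the scale to be transformed.
    b_to:   difficulties of the same items on the target scale.
    """
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean linking constants from anchor-item parameters."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def rescale(a, b, A, B):
    """Place item parameters on the target scale: a* = a / A, b* = A*b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


# Hypothetical anchor items (illustrative values only, not from the paper).
a_from = [1.0, 2.0, 1.5]
b_from = [-1.0, 0.0, 1.0]
a_to = [0.5, 1.0, 0.75]
b_to = [-1.5, 0.5, 2.5]

A_ms, B_ms = mean_sigma(b_from, b_to)
A_mm, B_mm = mean_mean(a_from, a_to, b_from, b_to)
a_star, b_star = rescale(a_from, b_from, A_ms, B_ms)
```

Because the illustrative values were built from an exact linear relation, both methods recover the same constants here. With real estimates they generally differ, which is one reason such methods are compared against a criterion.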
Author
LI Guangming (黎光明) (School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631)
Source
Psychological Research (《心理研究》), 2018, No. 4, pp. 351-357 (7 pages)
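Funding
National Natural Science Foundation of China, General Program (31470050)
Guangdong Province Philosophy and Social Sciences "13th Five-Year Plan" General Project (GD17CXL01)
Guangdong Province 2015 Higher Education Teaching Reform Project (粤教高函[2015]173号)
Guangzhou Philosophy and Social Sciences "13th Five-Year Plan" General Project (2017GZYB111)
South China Normal University 2014 University-Level Higher Education Teaching Research and Reform Project (教学[2014]52号)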