Abstract
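This study used multivariate generalizability theory to analyze the 2007-2010 Psychology Comprehensive Foundation examinations. The results show: 1. In terms of the subject content tested, psychological statistics and measurement and general psychology were measured with relatively high precision, whereas developmental and educational psychology and experimental psychology were measured with lower precision; 2. In terms of item type, multiple-answer multiple-choice questions were measured with lower precision, while the other item types were measured with higher precision; reducing the number of single-answer items and increasing the number of multiple-answer items would substantially improve the measurement precision of the multiple-answer items while maintaining the precision of the full test; 3. The measurement precision of the full test was very high, and the tests from different years can be regarded as "parallel" forms in terms of subject content and item-type structure.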
Since 2007, applicants to graduate programs in psychology in China have been required to take the National Entrance Examination. The comprehensive entrance test, developed by NEEA, has 83 items. It comprises four sub-tests (general psychology, developmental and educational psychology, experimental psychology, and psychological statistics and measurement) and four item types (multiple-choice questions with a single correct answer, multiple-choice questions with multiple correct answers, short-answer questions, and comprehensive essay questions). The main purpose of this study was to examine the psychology entrance tests from a multivariate generalizability theory perspective through a series of multivariate generalizability (G) studies and decision (D) studies. Specifically, a multivariate generalizability analysis was conducted on each of the stratified samples collected from the four administrations (2007 to 2010), and the results were then "averaged" over years. The results show the following. 1. In terms of the content tested, the average generalizability coefficients for developmental and educational psychology and for experimental psychology were lower (below .6), indicating that these two sub-tests are less reliable than the others. 2. In terms of item type, multiple-choice questions with multiple correct answers were less reliable (between .46 and .65) than the other types. When item types are combined within an assessment, it is important to consider both the reliability of scores for each item type and the reliability of composite scores. Changing the number of items within sections can increase composite score reliability in some cases.
In the present study, the D studies demonstrate that the reliability of multiple-choice questions with multiple correct answers improves as the number of such items increases and the number of single-correct-answer multiple-choice items decreases, while the effect on the reliability of composite scores is negligible. 3. For the composite scores, the generalizability coefficients across the four sub-tests and across the four item types are similar and relatively high (between .88 and .94), which means that the reliability of the total psychology entrance test is very good. 4. The G study results indicate that the variance and covariance component estimates for the four administrations are similar and relatively stable. That is, the four forms constructed from the table of specifications are quite "parallel" to one another. Generalizability theory can be used to estimate the impact of multiple sources of error on composite score reliability. The research questions considered in this study show the potential usefulness of generalizability theory for studying test structure issues commonly encountered by measurement specialists. The multivariate generalizability analysis of the four administrations of the psychology entrance test provides a valuable reference for future revision of the "syllabus" and contributes to improving test development.
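The dependence of reliability on the number of items, which the D studies above exploit, can be illustrated with a minimal sketch. It assumes a simplified univariate persons-by-items (p x i) design, for which the generalizability coefficient is the person variance divided by the sum of the person variance and the relative error variance (residual variance over the number of items). This is not the multivariate analysis carried out in the study; the function names and the variance components are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable


def g_coefficient(var_person: float, var_residual: float, n_items: int) -> float:
    """Generalizability coefficient for a one-facet p x i design.

    var_person   -- universe-score (person) variance component
    var_residual -- person-by-item interaction/residual variance component
    n_items      -- number of items assumed in the D study
    """
    if n_items <= 0:
        raise ValueError("n_items must be a positive integer")
    # Relative error variance shrinks as more items are averaged.
    relative_error = var_residual / n_items
    return var_person / (var_person + relative_error)


def d_study(var_person: float, var_residual: float,
            item_counts: Iterable[int]) -> Dict[int, float]:
    """Evaluate the coefficient for several candidate test lengths."""
    return {n: g_coefficient(var_person, var_residual, n) for n in item_counts}


if __name__ == "__main__":
    # Hypothetical variance components, NOT estimates from the study.
    VAR_PERSON = 0.02
    VAR_RESIDUAL = 0.20
    for n, coef in d_study(VAR_PERSON, VAR_RESIDUAL, [10, 15, 20, 30]).items():
        print(f"n_items={n:2d}  coefficient={coef:.3f}")
```

Under these assumed components, lengthening a section raises its coefficient, which is the mechanism behind the finding that adding multi-answer items improves the reliability of that item type.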
Source
《心理科学》
CSSCI
CSCD
Peking University Core Journals
2011, Issue 4, pp. 950-956 (7 pages)
Journal of Psychological Science
Funding
Supported by the project "Research and Application of the National Item Bank for Educational Examinations" (GFA097013)