1. Bay, L., Chen, L., Hanson, B. A., Happel, J., Kolen, M. J., Miller, T., et al. (1997). ACT's NAEP redesign project: assessment design is the key to useful and stable assessment results. National Center for Education Statistics (ED), Washington, DC.
2. Berk, R. A. (1980). A consumers' guide to criterion-referenced test reliability. Journal of Educational Measurement, 17(4), 323-349.
3. Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34(3), 197-211.
4. Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: theory of generalizability for scores and profiles. New York: Wiley.
5. Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. (1978). Criterion-referenced testing and measurement: a review of technical issues and developments. Review of Educational Research, 48, 1-47.
6. Hambleton, R. K. (1983). Application of item response models to criterion-referenced assessment. Applied Psychological Measurement, 7(1), 33-44.
7. Kaiser, H. F., & Michael, W. B. (1975). Domain validity and generalizability. Educational and Psychological Measurement, 35, 31-35.
8. Kane, M., & Wilson, J. (1984). Errors of measurement and standard setting in mastery testing. Applied Psychological Measurement, 8(1), 107-115.
9. Lin, M. H., & Hsiung, C. A. (1994). Empirical Bayes estimates of domain scores under binomial and hypergeometric distributions for test scores. Psychometrika, 59(3), 331-359.
10. Mazzeo, J., Kulick, E., Tay-Lim, B., & Perie, M. (2006). Technical report for the 2000 market-basket study in mathematics. ETS NAEP Technical and Research Report Series.