Abstract

It has been shown that fundamental assumptions underlying conventional one-factor measurement models are frequently violated when scores from a test composed of testlets are analyzed. Eight measurement models were specified for this situation, and the goodness of fit of each was examined. When testlets are involved, the conventional essentially tau-equivalent and congeneric models fit the data worse and overestimate reliability. The one-factor congeneric model with correlated errors appears to be the best measurement model for a test composed of testlets when dichotomously scored items are the unit of analysis. For estimating score reliability, however, the one-factor essentially tau-equivalent model with correlated errors also provides good estimates. When passage (testlet) scores are the unit of analysis, measurement models based on those scores offer an alternative way to analyze scores from tests composed of testlets.

The Relative Appropriateness of Eight Measurement Models for Analyzing Scores from Tests Composed of Testlets
