Abstract
When tests are composed of testlets, standard item response theory (IRT) models are often inappropriate because of the local dependence among items within a common testlet. A testlet-based IRT model has recently been developed to model examinees' responses under such conditions (Bradlow, Wainer, & Wang, 1999). The Bradlow, Wainer, and Wang model introduces separate testlet factors to account for this dependence and applies a common item discrimination parameter to both the general ability and testlet factors. This study investigates several alternative ways of accounting for local dependence that make different assumptions about the influence of testlet factors on item performance. The authors apply several Bayesian model selection criteria to compare the models on real test data sets with a testlet structure. Results suggest that an alternative model in which separate discrimination parameters are applied to the general ability and testlet factors provides a better fit to these data despite its greater complexity. Index terms: item response theory, Bayesian model comparison, Markov chain Monte Carlo, testlets
