Abstract
Lack of sufficient reliability is the primary impediment to generating and reporting subtest scores. Several current methods of subscore estimation address this either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducts a systematic comparison of four subscoring methods—multidimensional scoring (MS), augmented scoring (AS), higher order item response model scoring (HO), and objective performance index scoring (OPI)—by examining how test length, number of subtests or domains, and correlation between the abilities affect subtest ability estimation. The correlation-based methods (i.e., MS, AS, and HO) provided largely similar results and performed best under conditions involving multiple short subtests and highly correlated abilities. In most of the conditions considered, the OPI method performed worse than the other methods on both ability estimates and proportion-correct scores. A real data analysis further underscores the similarities and differences among the four subscoring methods.
