Sage Journals

Abstract

The purpose of this study was to examine the performance of three information-based fit indices, Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the sample-size-adjusted BIC (SABIC), in selecting among the log-linear cognitive diagnosis model and a set of well-known item response theory (IRT) models. Two simulation studies examined the extent to which these relative fit indices identify the generating model under a variety of data conditions and model misspecifications. In general, the indices performed better when item quality was higher. When the generating model was an IRT model, all three indices correctly selected the IRT model in every replication. When the generating model was a diagnostic classification model, however, all three indices incorrectly selected the multidimensional IRT model in up to 70% of the replications. These results identify situations in which commonly used, and typically well-performing, fit indices may not be appropriate for comparing and selecting models.
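To make the three indices concrete: each adds a complexity penalty to -2 times the maximized log-likelihood (ln L), and the candidate with the smallest value is preferred. The sketch below uses the standard formulas AIC = -2 ln L + 2p, BIC = -2 ln L + p ln N, and SABIC = -2 ln L + p ln((N + 2)/24), where p is the number of estimated parameters and N the sample size. The `FittedModel` container, the `select` helper, and the model names, log-likelihoods, and parameter counts in the usage block are hypothetical and invented for illustration; they are not results from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Hypothetical summary of one fitted candidate model."""
    name: str
    log_likelihood: float  # maximized log-likelihood, ln L
    n_params: int          # number of freely estimated parameters, p


def aic(m: FittedModel) -> float:
    # AIC = -2 ln L + 2p: a constant penalty of 2 per parameter.
    return -2.0 * m.log_likelihood + 2.0 * m.n_params


def bic(m: FittedModel, n: int) -> float:
    # BIC = -2 ln L + p ln(N): the per-parameter penalty grows with sample size N.
    return -2.0 * m.log_likelihood + m.n_params * math.log(n)


def sabic(m: FittedModel, n: int) -> float:
    # SABIC = -2 ln L + p ln((N + 2) / 24): BIC with an adjusted sample size.
    return -2.0 * m.log_likelihood + m.n_params * math.log((n + 2) / 24)


def select(models: list[FittedModel], n: int) -> dict[str, str]:
    """For each index, return the name of the candidate with the smallest value."""
    criteria = {
        "AIC": aic,
        "BIC": lambda m: bic(m, n),
        "SABIC": lambda m: sabic(m, n),
    }
    return {label: min(models, key=f).name for label, f in criteria.items()}


if __name__ == "__main__":
    # Invented numbers for illustration only; not results from the study.
    candidates = [
        FittedModel("LCDM", log_likelihood=-10250.0, n_params=40),
        FittedModel("MIRT", log_likelihood=-10220.0, n_params=60),
    ]
    print(select(candidates, n=1000))
    # → {'AIC': 'MIRT', 'BIC': 'LCDM', 'SABIC': 'LCDM'}
```

With N = 1000, the per-parameter penalties are 2 for AIC, about 3.73 for SABIC, and about 6.91 for BIC, so the three indices can disagree about whether the extra parameters of the more complex candidate are worth their improvement in fit.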

Keywords

diagnostic classification models; log-linear cognitive diagnosis model; model selection; relative fit

Comparison of Relative Fit Indices for Diagnostic Model Selection

Abstract

Keywords

Get full access to this article

References

Supplementary Material