Abstract
New goodness-of-fit indices are introduced for dichotomous item response theory (IRT) models. These indices are based on the likelihoods of number-correct scores derived from the IRT model, and they provide a direct comparison of the modeled and observed frequencies for correct and incorrect responses for each number-correct score. The behavior of Pearson’s X² (S-X²) and the likelihood ratio G² (S-G²) was assessed in a simulation study and compared with two fit indices similar to those currently in use (Q₁-X² and Q₁-G²). The simulations included three conditions in which the simulating and fitting models were identical and three conditions involving model misspecification. S-X² performed well, with Type I error rates close to the expected .05 and .01 levels. Performance of this index improved with increased test length. S-G² tended to reject the null hypothesis too often, as did Q₁-X² and Q₁-G². The power of S-X² appeared to be similar for all test lengths, but varied depending on the type of model misspecification.
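To make the idea of number-correct score likelihoods concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a two-parameter logistic (2PL) model, a standard normal ability distribution approximated on an equally spaced grid, and the well-known Lord-Wingersky recursion for score likelihoods. The function names (`p_correct`, `score_likelihoods`, `expected_proportions`, `s_x2`) and the item parameters are hypothetical. The abstract does not give the exact form of the statistic, any rules for collapsing sparse cells, or its degrees of freedom, so the sum below is only one Pearson-type comparison of correct and incorrect frequencies per score group.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (model assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_likelihoods(probs):
    """Lord-Wingersky recursion: likelihood of each number-correct score
    0..len(probs), given per-item probabilities of a correct response."""
    like = [1.0]
    for p in probs:
        nxt = [0.0] * (len(like) + 1)
        for score, value in enumerate(like):
            nxt[score] += value * (1.0 - p)   # this item answered incorrectly
            nxt[score + 1] += value * p       # this item answered correctly
        like = nxt
    return like

def expected_proportions(items, item_index, n_points=41, lo=-4.0, hi=4.0):
    """Model-implied proportion correct on one item for each number-correct
    score, integrating over an (assumed) standard normal ability density."""
    n = len(items)
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + step * q for q in range(n_points)]
    # Unnormalized normal weights; the constant cancels in the ratio below.
    weights = [math.exp(-0.5 * t * t) for t in thetas]
    numer = [0.0] * (n + 1)
    denom = [0.0] * (n + 1)
    for t, w in zip(thetas, weights):
        probs = [p_correct(t, a, b) for a, b in items]
        all_scores = score_likelihoods(probs)
        # Score likelihoods for the test with the studied item left out.
        rest = score_likelihoods(probs[:item_index] + probs[item_index + 1:])
        for k in range(n + 1):
            denom[k] += w * all_scores[k]
            if k >= 1:
                numer[k] += w * probs[item_index] * rest[k - 1]
    return [numer[k] / denom[k] for k in range(n + 1)]

def s_x2(observed, expected, counts):
    """Pearson-type sum over scores 1..n-1, combining the correct and
    incorrect cells of each score group into a single term."""
    total = 0.0
    for k in range(1, len(expected) - 1):
        e = expected[k]
        total += counts[k] * (observed[k] - e) ** 2 / (e * (1.0 - e))
    return total

# Hypothetical (a, b) parameters for a three-item test.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
expected = expected_proportions(items, item_index=0)
```

Here `observed[k]` would be the observed proportion of examinees with number-correct score `k` who answered the item correctly, and `counts[k]` the number of examinees with that score. Scores 0 and n are excluded because the expected proportion there is exactly 0 or 1, which leaves nothing to compare.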
