Abstract
Residual-based fit statistics, which compare observed item statistics (e.g., proportions) with model-implied probabilities, are widely used to evaluate model fit, item fit, and local dependence in item response theory (IRT) models. Despite the prevalence of item non-response in empirical studies, its impact on these statistics has not been systematically examined. Existing software packages often apply heuristic treatments (e.g., listwise or pairwise deletion), which can distort fit statistics because missing data inflate the discrepancies between observed and expected proportions. This study evaluates the appropriateness of such treatments through extensive simulation. Results show that deletion methods degrade the accuracy of fit testing: fit indices are inflated under both null and power conditions, and the bias worsens as missingness increases. Moreover, the impact of missing data can exceed that of model misspecification. Practical recommendations and alternative methods are discussed to guide applied researchers.
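The mechanism described above can be illustrated with a minimal sketch (not the paper's simulation design): responses are generated under a 2PL model, non-response is made ability-dependent (a hypothetical non-ignorable mechanism), and an item-level residual is computed as the available-case observed proportion minus the model-implied proportion. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 10

# --- Simulate 2PL item responses (illustrative parameters, not from the paper) ---
theta = rng.normal(size=n_persons)                     # abilities
a = rng.uniform(0.8, 2.0, size=n_items)                # discriminations
b = rng.normal(size=n_items)                           # difficulties
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))    # model-implied P(X = 1)
resp = (rng.random((n_persons, n_items)) < p).astype(float)

# --- Hypothetical ability-dependent non-response: low-theta examinees are
# --- more likely to omit an item, so available-case proportions are biased.
p_miss = 0.6 / (1.0 + np.exp(theta))                   # higher for low theta
resp_miss = resp.copy()
resp_miss[rng.random(resp.shape) < p_miss[:, None]] = np.nan

def item_residuals(data, p_model):
    """Item-level residual: available-case observed proportion correct
    (deletion-style handling of missing entries) minus the model-implied
    expected proportion over the full sample."""
    obs = np.nanmean(data, axis=0)
    exp = p_model.mean(axis=0)
    return obs - exp

res_full = item_residuals(resp, p)        # complete data: residuals near zero
res_miss = item_residuals(resp_miss, p)   # after deletion: residuals inflated
print(np.abs(res_full).mean(), np.abs(res_miss).mean())
```

Because the retained respondents are systematically more able than the full sample, the available-case proportions overshoot the model-implied ones, so the residuals grow with the missingness rate, mirroring the inflation of fit indices reported in the abstract.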
