Multiple Imputation of Missing Data for Multilevel Models

Abstract

Multiple imputation (MI) is one of the principled methods for dealing with missing data. In addition, multilevel models have become a standard tool for analyzing the nested data structures that result when lower level units (e.g., employees) are nested within higher level collectives (e.g., work groups). When applying MI to multilevel data, it is important that the imputation model takes the multilevel structure into account. In the present paper, based on theoretical arguments and computer simulations, we provide guidance on using MI in the context of several classes of multilevel models, including models with random intercepts, random slopes, and cross-level interactions (CLIs), as well as models with missing data in categorical and group-level variables. Our findings suggest that several approaches to MI often provide an effective treatment of missing data in multilevel research. Yet we also note that the current implementations of MI still have room for improvement when handling missing data in explanatory variables in models with random slopes and CLIs. We identify areas for future research and provide recommendations for research practice along with a number of step-by-step examples for the statistical software R.

Keywords

multilevel, missing data, multiple imputation, random intercept model, random coefficients model, random slopes, cross-level interactions
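
The abstract's point that the imputation model should take the multilevel structure into account can be illustrated with a small simulation. The following is a minimal Python sketch, not one of the paper's R examples: all function names, the sample sizes (200 groups of 10), the variance components (intraclass correlation of 0.3), and the 40% missing-completely-at-random rate are assumptions chosen for illustration. It also uses a single stochastic imputation with a simple group-mean model rather than proper MI, so it only shows the direction of the problem.

```python
import random
import statistics


def simulate(groups=200, size=10, tau2=0.3, sigma2=0.7, seed=1):
    """Random-intercept data: y_ij = u_j + e_ij (all values are assumptions)."""
    rng = random.Random(seed)
    data = []
    for _ in range(groups):
        u = rng.gauss(0, tau2 ** 0.5)  # group-level effect
        data.append([u + rng.gauss(0, sigma2 ** 0.5) for _ in range(size)])
    return data


def icc(data):
    """One-way ANOVA estimator of the intraclass correlation (balanced groups)."""
    n = len(data[0])
    grand = statistics.fmean([v for g in data for v in g])
    means = [statistics.fmean(g) for g in data]
    msb = n * sum((m - grand) ** 2 for m in means) / (len(data) - 1)
    msw = sum((v - m) ** 2 for g, m in zip(data, means) for v in g)
    msw /= len(data) * (n - 1)
    return (msb - msw) / (msb + (n - 1) * msw)


def delete_mcar(data, rate, rng):
    """Set each value to None with probability `rate`."""
    return [[None if rng.random() < rate else v for v in g] for g in data]


def impute_flat(data, rng):
    """Single-level imputation: ignores group membership entirely."""
    obs = [v for g in data for v in g if v is not None]
    mu, sd = statistics.fmean(obs), statistics.stdev(obs)
    return [[rng.gauss(mu, sd) if v is None else v for v in g] for g in data]


def impute_grouped(data, rng):
    """Group-aware imputation: observed group mean plus within-group noise."""
    obs = [v for g in data for v in g if v is not None]
    grand = statistics.fmean(obs)
    means, ss, df = [], 0.0, 0
    for g in data:
        seen = [v for v in g if v is not None]
        m = statistics.fmean(seen) if seen else grand  # fallback for empty groups
        means.append(m)
        ss += sum((v - m) ** 2 for v in seen)
        df += max(len(seen) - 1, 0)
    sd = (ss / df) ** 0.5  # pooled within-group standard deviation
    return [[rng.gauss(m, sd) if v is None else v for v in g]
            for g, m in zip(data, means)]


rng = random.Random(2)
complete = simulate()
incomplete = delete_mcar(complete, 0.4, rng)

icc_complete = icc(complete)
icc_flat = icc(impute_flat(incomplete, rng))
icc_grouped = icc(impute_grouped(incomplete, rng))

print(f"ICC, complete data:           {icc_complete:.3f}")
print(f"ICC, single-level imputation: {icc_flat:.3f}")
print(f"ICC, group-aware imputation:  {icc_grouped:.3f}")
```

In this sketch the single-level imputation pulls the intraclass correlation toward zero, because the imputed values carry no group information. The group-mean approach preserves the clustering but can overstate it somewhat, since observed group means are noisy; imputation models with random effects are designed to avoid both problems.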


