Handling Missing Data in Cross-Classified Multilevel Analyses: An Evaluation of Different Multiple Imputation Approaches

Abstract

Multiple imputation (MI) is a popular method for handling missing data. In education research, it can be challenging to use MI because the data often have a clustered structure that needs to be accommodated during MI. Although much research has considered applications of MI in hierarchical data, little is known about its use in cross-classified data, in which observations are clustered in multiple higher-level units simultaneously (e.g., schools and neighborhoods, transitions from primary to secondary schools). In this article, we consider several approaches to MI for cross-classified data (CC-MI), including a novel fully conditional specification approach, a joint modeling approach, and other approaches that are based on single- and two-level MI. In this context, we clarify the conditions that CC-MI methods need to fulfill to provide a suitable treatment of missing data, and we compare the approaches both from a theoretical perspective and in a simulation study. Finally, we illustrate the use of CC-MI in real data and discuss the implications of our findings for research practice.

Keywords

missing data, multiple imputation, cross-classification, clustered data, random-effects models, multilevel analysis

Missing data have received considerable attention in the psychological and educational literature in recent years (e.g., Enders, 2010; Little & Rubin, 2002), and multiple imputation (MI) has become one of the most commonly recommended methods for handling missing data in many areas of research (e.g., Schafer & Graham, 2002). The general idea underlying MI is that missing data are replaced with multiple plausible values that are generated on the basis of the observed data and an imputation model (Rubin, 1987). A key requirement of MI is that the imputation model correctly takes into account the data structure and the relationships between the observed variables. This can be particularly challenging when the data have a clustered structure, in which observations are organized within higher-level units (e.g., students in schools, employees in teams, repeated measures in individuals). Although a number of studies have considered applications of MI in clustered data, much of this research has been concerned with hierarchical data, such as two- and three-level data (Enders et al., 2016; Goldstein et al., 2009; Grund et al., 2018b; Lüdtke et al., 2017; Schafer & Yucel, 2002; Wijesuriya et al., 2020). By contrast, the use of MI applications with nonhierarchical data structures, such as cross-classified or multiple-membership structures, is still poorly understood (for an overview, see Rasbash & Browne, 2008).

The purpose of this article is to investigate the effectiveness of MI for the treatment of missing values in cross-classified data, wherein observations can belong to multiple clusters that do not form a clear hierarchy (Goldstein, 2011; Raudenbush & Bryk, 2002). Cross-classified data are common in many areas of research, for example, in cross-sectional data when individuals are organized in multiple higher-level units (e.g., students in schools and neighborhoods; employees in teams and fields of expertise; see also Claus et al., 2020; Fielding & Goldstein, 2006) or in longitudinal data when cluster membership changes over time (e.g., students in primary and secondary schools; see also Cafri et al., 2015). The treatment of missing data in cross-classified data can be extremely challenging because the variables can be observed at different levels, and the cross-classified structure implies a complex pattern of dependency between the observations that can no longer be captured by hierarchical models.

In writing this article, we have three major goals. First, we aim to clarify the requirements that imputation approaches need to fulfill in order to provide a suitable treatment of missing values in the analysis of cross-classified data. In this context, we focus on analyses with random intercepts and linear effects, and we discuss how the imputation approaches can be extended to address additional types of analyses. Second, we aim to compare the statistical properties of different MI approaches for cross-classified data (CC-MI) from both theoretical and practical perspectives and by using the results of a simulation study. To this end, we (a) introduce a novel approach to CC-MI that is based on the fully conditional specification (FCS) approach to MI and (b) outline a Bayesian joint modeling (JM) approach for the treatment of incomplete cross-classified data. Third, we illustrate the application of CC-MI in a worked example with real data from education research, provide recommendations, and outline limitations and extensions of the approaches that we considered.

This article is organized as follows. In the first section, we provide a brief introduction to the structure and analysis of cross-classified data. In doing so, we outline the most important structural features of cross-classified data and explain how they can be analyzed with cross-classified random-effects models (CCRMs). Next, we present the JM and FCS approaches to CC-MI and explain how these methods accommodate the structural properties of cross-classified data. In this context, we also consider ad hoc approaches that extend conventional methods for single- and two-level MI to better accommodate cross-classified data and that can be implemented in a wide variety of statistical software. Then, we present the results of a simulation study, in which we evaluated the statistical properties of these methods. Finally, we demonstrate the application of the methods in an example with computer code and real data, and we discuss the implications of our findings for the treatment of incomplete cross-classified data.

Cross-Classified Data

Cross-classified data are characterized by a clustered structure, in which observations belong to multiple clusters simultaneously. For example, consider the hypothetical scenario in Figure 1, where students are clustered within the units of two random factors: schools (A) and neighborhoods (B). In such a case, students who attend the same school or live in the same neighborhood often tend to be more similar to each other than to students in different schools or neighborhoods because they are exposed to similar contextual influences. Both factors represent higher-level units, and we refer to these levels as Levels A and B. However, in contrast to the clustering that occurs in hierarchical data, the two factors are crossed and not nested within one another. In other words, the students are cross-classified by neighborhoods and schools. This is reflected by the fact that the students at any particular school sometimes live in different neighborhoods, and the students in any particular neighborhood sometimes attend different schools. The combined membership of each student in one school and one neighborhood forms a number of school-neighborhood pairs, in which a certain number of students are nested. We refer to this intermediate level as Level AB. Conceptually, the school-neighborhood pairs may correspond to more tightly knit communities or peer groups that share additional contextual influences that are not shared by other students who attend the same school (but live in different neighborhoods) or live in the same neighborhood (but attend different schools).

Figure 1.

Example of cross-classified data with students (Level 1) clustered in schools (Level A) and neighborhoods (Level B). The two factors (A and B) are crossed, resulting in a number of school-neighborhood pairs (AB), in which the students are nested.

The school-neighborhood assignment shown in Figure 1 is an example of partial cross-classification, in which every school is crossed with only a subset of the neighborhoods and vice versa. The partially cross-classified structure is reflected by the fact that each neighborhood sends students to only some of the schools, and each school recruits students from only some of the neighborhoods. For the example above, the assignment of schools and neighborhoods into school-neighborhood pairs is illustrated in more detail in Figure 2. By contrast, a full cross-classification would occur if every school received students from all the neighborhoods or, equivalently, if every neighborhood sent students to all the schools.
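The distinction between partial and full cross-classification can be checked directly from the membership data. The following Python sketch is only an illustration: the function name `cross_classification_summary` and the toy memberships are hypothetical and are not taken from the data in Figures 1 and 2.

```python
from collections import Counter

def cross_classification_summary(memberships):
    """Summarize how two crossed factors are combined.

    memberships: iterable of (school_id, neighborhood_id), one per student.
    """
    pair_counts = Counter(memberships)            # n_jk for each observed pair
    schools = {j for j, _ in pair_counts}
    neighborhoods = {k for _, k in pair_counts}
    n_possible = len(schools) * len(neighborhoods)
    return {
        "pair_counts": dict(pair_counts),
        "n_observed_pairs": len(pair_counts),
        "n_possible_pairs": n_possible,
        # Full cross-classification: every school is paired with every neighborhood.
        "full": len(pair_counts) == n_possible,
    }

# Hypothetical toy data: 3 schools, 3 neighborhoods, 7 students.
students = [("s1", "n1"), ("s1", "n1"), ("s1", "n2"),
            ("s2", "n2"), ("s2", "n3"), ("s3", "n3"), ("s3", "n3")]
summary = cross_classification_summary(students)
```

In this toy example, only five of the nine possible school-neighborhood pairs are observed, so the structure is partially cross-classified.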

Figure 2.

Example (continued) for cross-classified data with students (Level 1) clustered in schools (Level A) and neighborhoods (Level B). The two panels represent the ties between schools and neighborhoods (a) and the number of students for each school-neighborhood pair (b).

Examples of cross-classified data can be found in many areas of research. For example, in education research, cross-classification can occur when schools are crossed with other organizational units, such as neighborhoods or families (e.g., Dundas et al., 2014; Dunn et al., 2015; Garner & Raudenbush, 1991). In longitudinal data, cross-classification occurs when students transition from one type of school to another (e.g., primary and secondary school; Goldstein & Sammons, 1997; Paterson, 1991) or switch classes or teachers over time (e.g., Gregory & Huang, 2013; Heck, 2009; Kyriakides & Creemers, 2008). Finally, cross-classified data can also occur in other areas of research, such as in clinical and organizational research (Barker et al., 2020; Claus et al., 2020), in experimental research (Baayen et al., 2008), or in the context of generalizability theory (Cronbach et al., 1972; Shavelson & Webb, 2000).

Cross-Classified Random-Effects Models

One of the most popular methods for analyzing cross-classified data is the CCRM. Suppose that a researcher is interested in the relationship between an explanatory variable x and an outcome variable y in a sample of students (Level 1) who are clustered within schools (Level A) and neighborhoods (Level B). For example, y may represent students’ academic achievement, whereas x may represent their socioeconomic status (SES).

Intercept-only model

A common first step in the analysis of cross-classified data is to estimate the components of variance in y that can be attributed to differences between schools, neighborhoods, and school-neighborhood pairs, for example, by using the following intercept-only model (Raudenbush & Bryk, 2002). For student i ( $i = 1, \dots, n_{j k}$ ) in school j ( $j = 1, \dots, J$ ) and neighborhood k ( $k = 1, \dots, K$ ),

y_{i j k} = β_{0} + u_{A, j} + u_{B, k} + u_{A B, j k} + e_{i j k},

where $β_{0}$ is the overall intercept, $u_{A, j}$ and $u_{B, k}$ are the random intercepts of the schools and neighborhoods, respectively, $u_{A B, j k}$ are the random intercepts of the school-neighborhood pairs, and $e_{i j k}$ are the student-specific residuals. The random effects $u_{A, j}$ , $u_{B, k}$ , and $u_{A B, j k}$ and the residuals $e_{i j k}$ are assumed to follow independent normal distributions with means of zero and variances denoted by $τ_{A}^{2}$ , $τ_{B}^{2}$ , $τ_{A B}^{2}$ , and $σ^{2}$ , respectively.

The main purpose of this model is to distinguish the components of variance in y that pertain to differences between schools (A), neighborhoods (B), and school-neighborhood pairs ( $A B$ ). These can subsequently be used to compute the intraclass correlation (ICC) at different levels (see Raudenbush & Bryk, 2002). Conceptually, the random effects associated with A and B ( $u_{A, j}$ and $u_{B, k}$ ) represent the mean differences that exist between the members of different schools or neighborhoods and that are shared among the students who attend the same school (j) or live in the same neighborhood (k). The random effect associated with the school-neighborhood pair ( $u_{A B, j k}$ ) represents the differences between the mean values of students who belong to a particular combination of school and neighborhood ( $j k$ ) above and beyond the differences that they share with other students who attend the same school (but live in different neighborhoods) or live in the same neighborhood (but attend different schools). The random effects of the cluster membership are sometimes also referred to as “main” ( $u_{A, j}$ and $u_{B, k}$ ) and “interaction” ( $u_{A B, j k}$ ) effects (e.g., Beretvas, 2011) because they reflect mean differences between groups of students similar to the main and interaction effects in a two-way between-subjects analysis of variance (see also Maxwell et al., 2018; Raudenbush & Bryk, 2002).
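As a rough illustration of how the variance components translate into ICCs, the following Python sketch computes the share of the total variance at each level under the intercept-only model. It uses one common definition (each component divided by the sum of all four), and the function name and numeric values are hypothetical.

```python
def ccrm_variance_shares(tau2_a, tau2_b, tau2_ab, sigma2):
    """Variance shares implied by the intercept-only CCRM.

    Assumes independent random effects, so that the total variance of y is
    tau2_a + tau2_b + tau2_ab + sigma2.
    """
    total = tau2_a + tau2_b + tau2_ab + sigma2
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "A": tau2_a / total,        # also: correlation, same school, different neighborhoods
        "B": tau2_b / total,        # also: correlation, same neighborhood, different schools
        "AB": tau2_ab / total,
        "residual": sigma2 / total,
        # Two students in the same school AND neighborhood share all three effects.
        "same_pair": (tau2_a + tau2_b + tau2_ab) / total,
    }

# Hypothetical variance components, chosen only for illustration.
icc = ccrm_variance_shares(tau2_a=0.10, tau2_b=0.05, tau2_ab=0.05, sigma2=0.80)
```

With these illustrative values, 10% of the variance lies between schools, and two students who share both a school and a neighborhood are expected to correlate at about .20.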

Random-intercept model with explanatory variables

The model above can be extended to address more interesting research questions by including explanatory variables that can be measured at any level of the sample (Raudenbush & Bryk, 2002). In addition, the model can include the cluster means of explanatory variables at Level 1, which allows the effects of this variable to take on different values at different levels. For example, with three explanatory variables x at Level 1, z at Level A, and w at Level B, the model can be extended as follows:

y_{i j k} = β_{0} + β_{1} x_{i j k} + β_{2} {\bar{x}}_{• j •} + β_{3} {\bar{x}}_{• • k} + β_{4} {\bar{x}}_{• j k} + β_{5} z_{j} + β_{6} w_{k} + u_{A, j} + u_{B, k} + u_{A B, j k} + e_{i j k} .

In this model, $β_{1}$ is the effect of x at Level 1, $β_{2}$ is the effect of x at Level A, $β_{3}$ is the effect of x at Level B, and $β_{4}$ is the effect of x at Level AB (i.e., for the school-neighborhood pair). In addition, $β_{5}$ and $β_{6}$ are the effects of z and w at Levels A and B, respectively.

Notice that the extended model now partitions both y and x into within- and between-cluster components, although it does so in different ways. Specifically, the components in y are represented by random effects and residuals, whereas the components in x are represented by the values of x at Level 1 ( $x_{i j k}$ ) and the cluster means of x at Levels A ( ${\bar{x}}_{• j •}$ ), B ( ${\bar{x}}_{• • k}$ ), and AB ( ${\bar{x}}_{• j k}$ ). The effects of the cluster means ( ${\bar{x}}_{• j •}$ , ${\bar{x}}_{• • k}$ , and ${\bar{x}}_{• j k}$ ) of x in Equation 2 represent contextual effects, that is, the extent to which cluster-level differences in x are associated with cluster-level differences in y above and beyond the effect of x at Level 1 (Raudenbush & Bryk, 2002). In sum, this model partitions the lower-level variables into within- and between-cluster components and allows these components to be associated with one another at different levels.
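The three sets of cluster means in Equation 2 can be computed by averaging x within schools, within neighborhoods, and within school-neighborhood pairs. The Python sketch below assumes complete data in x and uses hypothetical toy values; the helper name `cluster_means` is introduced only for this illustration.

```python
from collections import defaultdict

def cluster_means(values, groups):
    """Return, for every case, the mean of `values` within that case's group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]

# Hypothetical toy data: five students in two schools and two neighborhoods.
school = ["s1", "s1", "s1", "s2", "s2"]
neigh = ["n1", "n1", "n2", "n2", "n2"]
x = [1.0, 3.0, 5.0, 7.0, 9.0]

xbar_a = cluster_means(x, school)                     # school means (Level A)
xbar_b = cluster_means(x, neigh)                      # neighborhood means (Level B)
xbar_ab = cluster_means(x, list(zip(school, neigh)))  # pair means (Level AB)
```

Each resulting list has one entry per student, so the cluster means can be entered as additional columns next to x when the model is fit.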

The models above incorporate the nonhierarchical structure of the data in multiple ways. First, the models include separate random effects and variance components for the two crossed factors A and B. By contrast, if this structure were simplified or ignored, for example, by treating the factors as hierarchical, then the estimated parameters and standard errors could be biased (Lai, 2019; Luo & Kwok, 2009; Meyers & Beretvas, 2006). Second, the models include a random effect of the “interaction” of the two factors, that is, for the combined membership of individuals in a pair of units in A and B. Omitting this component can sometimes simplify the specification of the model but can also induce bias (Shi et al., 2010). As a general rule, including the random effect of the “interaction” requires that there are multiple observations ( $n_{j k} > 1$ ) for at least some of the pairs of units in A and B; otherwise (if all $n_{j k} = 1$ ), it cannot be distinguished from the residual at Level 1 and must be dropped from the analysis (see also Beretvas, 2011). Third, the models can include effects of explanatory variables at each level as well as effects of the cluster means of explanatory variables at Level 1, which allows the cluster-level effects to differ from the effects at Level 1 (Raudenbush & Bryk, 2002; see also Kreft et al., 1995).

Further extensions

The models can also be extended further to address additional research questions. For example, random slopes can be included to allow the effects of the explanatory variables to vary across the units of A or B, and explanatory variables at Levels A or B can be used to explain some of the variance in the slope coefficients (e.g., Raudenbush & Bryk, 2002). In such a model, the effects of lower-level explanatory variables vary both at random and due to the moderating influence of higher-level variables (cross-level interactions [CLIs]). In the following sections, we focus on CCRMs that include only random intercepts and linear effects. We do not consider applications with random slopes, CLIs, or other nonlinear effects in detail, but we return to these extensions later.

MI of Cross-Classified Data

In the following section, we outline two of the main strategies—JM and FCS—that are typically used to conduct MI, and we explain how these strategies can be extended for CC-MI. In addition, we consider a number of ad hoc approaches that are based on imputation approaches for single-level and two-level (hierarchical) data. For simplicity, we focus on applications with continuous data; however, either approach can also be used with categorical data, and we return to this topic in the Discussion section.

Joint Modeling

The general idea underlying the JM approach is that a single (joint) imputation model is specified for all variables with missing data, thus generating imputations for all variables simultaneously. The JM approach was developed primarily in the context of single- and two-level data (Schafer & Olsen, 1998; Schafer & Yucel, 2002; for an overview, see Carpenter & Kenward, 2013), but it has also been applied to cross-classified and multiple-membership data (Yucel et al., 2008). To conduct CC-MI with the JM approach, a multivariate CCRM that includes all variables both with and without missing data must be specified.

Suppose that the data comprise observations clustered in two factors A and B as before and with variables measured at different levels. Let further $y^{(1)}$ denote the variables at Level 1, $y^{(A)}$ the variables at Level A, $y^{(B)}$ the variables at Level B, and $y^{(A B)}$ the variables at Level AB. For example, if the variables of interest were those in Equation 2, then $y^{(1)}$ would include y and x, $y^{(A)}$ would include z, $y^{(B)}$ would include w, and $y^{(A B)}$ would be empty. Then, for observation i in unit j of factor A and unit k in factor B, the joint model can be written as

\begin{array}{ll} y_{i j k}^{(1)} = μ^{(1)} + u_{A, j}^{(1)} + u_{B, k}^{(1)} + u_{A B, j k}^{(1)} + e_{i j k} & \text{(Level 1)} \\ y_{j k}^{(A B)} = μ^{(A B)} + u_{A, j}^{(A B)} + u_{B, k}^{(A B)} + u_{A B, j k}^{(A B)} & \text{(Level AB)} \\ y_{j}^{(A)} = μ^{(A)} + u_{A, j}^{(A)} & \text{(Level A)} \\ y_{k}^{(B)} = μ^{(B)} + u_{B, k}^{(B)} & \text{(Level B)} \end{array},

where $u_{A, j} = (u_{A, j}^{(1)}, u_{A, j}^{(A)}, u_{A, j}^{(A B)})$ are the random effects and residuals at Level A, $u_{B, k} = (u_{B, k}^{(1)}, u_{B, k}^{(B)}, u_{B, k}^{(A B)})$ are the random effects and residuals at Level B, $u_{A B, j k} = (u_{A B, j k}^{(1)}, u_{A B, j k}^{(A B)})$ are the random effects and residuals at Level AB, and $e_{i j k}$ are residuals at Level 1. The random effects and residuals at each level are assumed to follow independent multivariate normal distributions with mean vectors of zero and covariance matrices $T_{A}$ , $T_{B}$ , $T_{A B}$ , and $Σ$ , respectively.

Similar to the univariate intercept-only model in Equation 1, the joint model partitions the within- and between-cluster components in the variables at each level. In addition, the model incorporates the associations that can exist between the components at each level by allowing the random effects and residuals to be correlated (i.e., through $T_{A}$ , $T_{B}$ , $T_{A B}$ , and $Σ$ ). Consequently, the JM approach to CC-MI incorporates information from variables at different levels in the imputation of missing data. For example, when imputing missing data at Level 1, the JM approach incorporates information from variables at Levels A, B, and AB ( $T_{A}$ , $T_{B}$ , and $T_{A B}$ ) as well as other variables at Level 1 ( $Σ$ ).
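To make the role of the covariance matrices more concrete, consider the variables in Equation 2 (y and x at Level 1, z at Level A, w at Level B, and no variables at Level AB). Under the joint model, the covariance matrices would then have the following shapes. The element labels (e.g., $τ_{A, y x}$ ) are notation introduced here purely for illustration.

```latex
T_{A} =
\begin{pmatrix}
\tau_{A, yy} & \tau_{A, yx} & \tau_{A, yz} \\
\tau_{A, yx} & \tau_{A, xx} & \tau_{A, xz} \\
\tau_{A, yz} & \tau_{A, xz} & \tau_{A, zz}
\end{pmatrix},
\quad
T_{B} =
\begin{pmatrix}
\tau_{B, yy} & \tau_{B, yx} & \tau_{B, yw} \\
\tau_{B, yx} & \tau_{B, xx} & \tau_{B, xw} \\
\tau_{B, yw} & \tau_{B, xw} & \tau_{B, ww}
\end{pmatrix},
\quad
T_{AB} =
\begin{pmatrix}
\tau_{AB, yy} & \tau_{AB, yx} \\
\tau_{AB, yx} & \tau_{AB, xx}
\end{pmatrix},
\quad
\Sigma =
\begin{pmatrix}
\sigma_{yy} & \sigma_{yx} \\
\sigma_{yx} & \sigma_{xx}
\end{pmatrix}
```

In this illustration, an off-diagonal element such as $τ_{A, x z}$ would carry the association between the school-level component of x and the school-level variable z, which is the kind of information that the JM approach uses when imputing either variable.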

Markov Chain Monte Carlo (MCMC) algorithm

The JM approach to CC-MI can be implemented with MCMC techniques (Browne, 2009; Browne et al., 2001). In the following, we outline the main steps of a generic MCMC algorithm for the estimation of the model parameters and the imputation of missing data at each level (for additional details, see Rasbash & Browne, 2008). For convenience, we write the data of all units and variables as $y = (y^{(1)}, y^{(A)}, y^{(B)}, y^{(A B)})$ and the random effects as $u = (u_{A}, u_{B}, u_{A B})$ . At iteration t of the MCMC algorithm,

1. Draw $μ^{(t)} \sim P (μ | y^{(t - 1)}, u^{(t - 1)}, T_{A}^{(t - 1)}, T_{B}^{(t - 1)}, T_{A B}^{(t - 1)}, Σ^{(t - 1)})$ given sensible priors.

2. Draw $T_{A}^{(t)} \sim P (T_{A} | y^{(t - 1)}, u^{(t - 1)}, μ^{(t)}, T_{B}^{(t - 1)}, T_{A B}^{(t - 1)}, Σ^{(t - 1)})$ given sensible priors.

3. Draw $T_{B}^{(t)} \sim P (T_{B} | y^{(t - 1)}, u^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{A B}^{(t - 1)}, Σ^{(t - 1)})$ given sensible priors.

4. Draw $T_{A B}^{(t)} \sim P (T_{A B} | y^{(t - 1)}, u^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, Σ^{(t - 1)})$ given sensible priors.

5. Draw $Σ^{(t)} \sim P (Σ | y^{(t - 1)}, u^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, T_{A B}^{(t)})$ given sensible priors.

6. Draw $u^{(t)}$ as follows:

   (a) Compute the observed-data residuals $u_{A, j}^{obs, (t)}$ at Level A. Draw the random effects and missing-data residuals from $u_{A, j}^{mis, (t)} \sim P (u_{A, j}^{mis} | u_{A, j}^{obs, (t)}, u_{B}^{(t - 1)}, u_{A B}^{(t - 1)}, y^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, T_{A B}^{(t)}, Σ^{(t)})$ .

   (b) Compute the observed-data residuals $u_{B, k}^{obs, (t)}$ at Level B. Draw the random effects and missing-data residuals from $u_{B, k}^{mis, (t)} \sim P (u_{B, k}^{mis} | u_{B, k}^{obs, (t)}, u_{A}^{(t)}, u_{A B}^{(t - 1)}, y^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, T_{A B}^{(t)}, Σ^{(t)})$ .

   (c) Compute the observed-data residuals $u_{A B, j k}^{obs, (t)}$ at Level AB. Draw the random effects and missing-data residuals from $u_{A B, j k}^{mis, (t)} \sim P (u_{A B, j k}^{mis} | u_{A B, j k}^{obs, (t)}, u_{A}^{(t)}, u_{B}^{(t)}, y^{(t - 1)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, T_{A B}^{(t)}, Σ^{(t)})$ .

7. Compute the observed-data residuals $e_{i j k}^{obs, (t)}$ at Level 1. Draw the missing-data residuals from $e_{i j k}^{mis, (t)} \sim P (e_{i j k}^{mis} | e_{i j k}^{obs, (t)}, y^{(t - 1)}, u^{(t)}, μ^{(t)}, T_{A}^{(t)}, T_{B}^{(t)}, T_{A B}^{(t)}, Σ^{(t)})$ .

8. Update $y^{(t)}$ by imputing the missing data at Levels 1, A, B, and AB, using the current values of $μ^{(t)}$ , $u^{mis, (t)}$ , and $e^{mis, (t)}$ in accordance with Equation 3.

Notice that the sampling steps for the random effects (Step 6) and the residuals (Step 7) are performed by conditioning on the random effects and the observed values of other variables at the same level. In doing so, the JM approach incorporates information from other variables into the imputation of missing data, while accounting for the relationships that exist between variables at different levels.

To our knowledge, there is currently no software that implements the JM approach to CC-MI (but see Yucel et al., 2008). However, the required sampling steps can be carried out in general-purpose software for Bayesian data analysis, such as WinBUGS/OpenBUGS (Lunn et al., 2000), JAGS (Plummer, 2017), or Stan (Stan Development Team, 2021). In addition, a simplified version of this model with no random effects or variables at Level AB can be fit with the Bayesian estimation procedure in Mplus (Muthén & Muthén, 2017).

Fully Conditional Specification

As an alternative to the JM approach, the joint distribution of the variables with missing data can be approximated by imputing one variable at a time while iterating over a sequence of univariate imputation models, one for each variable with missing data. This strategy is known as the FCS approach to MI (Raghunathan et al., 2001; van Buuren et al., 2006). In the context of CC-MI, each imputation model is set up in such a way that it (a) partitions the within- and between-cluster components of the respective target variable and (b) includes other variables and their cluster means as predictors to represent the relationships between the variables at each level.

Because the FCS approach iterates along a sequence of imputation models, one for each target variable with missing data, different types of models are required to address variables at different levels. Suppose that $y^{(p)}$ denotes the pth target variable with missing data ( $p = 1, \dots, P$ ). If $y^{(p)}$ is measured at Level 1, then the imputation model takes the form of a CCRM as follows:

\begin{matrix} y_{i j k}^{(p)} & = & x_{i j k}^{(p)} β^{(p)} + u_{A, j}^{(p)} + u_{B, k}^{(p)} + u_{A B, j k}^{(p)} + e_{i j k}^{(p)}, & (Level 1) \end{matrix}

where $x_{i j k}^{(p)}$ denotes the values of the predictor variables and their cluster means at Levels A, B, and AB, $β^{(p)}$ denotes the fixed effects, and $u_{A, j}^{(p)}$ , $u_{B, k}^{(p)}$ , $u_{A B, j k}^{(p)}$ , and $e_{i j k}^{(p)}$ denote the random effects and residuals with means of zero and variances denoted by $τ_{A}^{2 (p)}$ , $τ_{B}^{2 (p)}$ , $τ_{A B}^{2 (p)}$ , and $σ^{2 (p)}$ . The predictors in $x^{(p)}$ typically include all variables other than $y^{(p)}$ . In addition, in CC-MI, $x^{(p)}$ includes the cluster means of the predictors at Levels A, B, and AB. In doing so, the imputation model takes into account the within- and between-cluster components both in $y^{(p)}$ (through random effects) and in $x^{(p)}$ (through cluster means) as well as the relations that can exist between the two (through $β$ ). For example, if the first target variable $y^{(1)}$ was the outcome variable y from Equation 2, then the predictors would be x, z, w, and the cluster means of x at Levels A, B, and AB ( ${\bar{x}}_{• j •}$ , ${\bar{x}}_{• • k}$ , and ${\bar{x}}_{• j k}$ ).

For target variables at Levels A, B, and AB, the imputation models take on simpler forms but follow the same strategy by conditioning on the other variables at the same levels as well as the cluster means of lower-level variables. Specifically, for variables at Levels A, B, and AB, the models become:

\begin{array}{ll} y_{j k}^{(p)} = x_{j k}^{(p)} β^{(p)} + u_{A, j}^{(p)} + u_{B, k}^{(p)} + u_{A B, j k}^{(p)} & \text{(Level AB)} \\ y_{j}^{(p)} = x_{j}^{(p)} β^{(p)} + u_{A, j}^{(p)} & \text{(Level A)} \\ y_{k}^{(p)} = x_{k}^{(p)} β^{(p)} + u_{B, k}^{(p)} & \text{(Level B)} \end{array},

where $x^{(p)}$ denotes the predictors, $β^{(p)}$ denotes the fixed effects, and $u_{A, j}^{(p)}$ , $u_{B, k}^{(p)}$ , $u_{A B, j k}^{(p)}$ denote the random effects and residuals with variance components as before. The predictor variables in $x^{(p)}$ can include all variables other than $y^{(p)}$ that are measured at the same level as well as the cluster means of predictors that were measured at lower levels (e.g., at Levels 1 or AB for a target variable at Level A). In addition, in unbalanced and partially cross-classified data, $x^{(p)}$ can include cluster means of higher-level variables that were measured at other levels (e.g., at Level B for a target variable at Level A). For example, if the second target variable $y^{(2)}$ was the explanatory variable z from Equation 2, which is measured at Level A, then the predictors would be the cluster means of x and y at Level A ( ${\bar{x}}_{• j •}$ and ${\bar{y}}_{• j •}$ ) and potentially those of w (in unbalanced and partially cross-classified data).
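The rules for assembling the predictor sets can be summarized in a short sketch. The Python function below is a hypothetical helper written only for illustration: it returns labels such as `mean_A(x)` rather than computed values, it covers target variables at Levels 1, A, and B only, and the argument `partially_crossed` is an assumed switch for the optional cluster means of variables at the other higher level.

```python
def fcs_predictors(target, levels, partially_crossed=False):
    """List predictor terms for one target variable.

    levels maps each variable name to its level: "1", "A", or "B".
    Labels like "mean_A(x)" stand for cluster means, not computed values.
    """
    lvl = levels[target]
    others = [v for v in levels if v != target]
    if lvl == "1":
        # All other variables plus cluster means of the Level 1 predictors.
        preds = list(others)
        for v in others:
            if levels[v] == "1":
                preds += [f"mean_{c}({v})" for c in ("A", "B", "AB")]
        return preds
    if lvl in ("A", "B"):
        other_factor = "B" if lvl == "A" else "A"
        # Same-level variables plus cluster means of lower-level variables.
        preds = [v for v in others if levels[v] == lvl]
        preds += [f"mean_{lvl}({v})" for v in others if levels[v] == "1"]
        if partially_crossed:
            # Optional: cluster means of variables at the other higher level.
            preds += [f"mean_{lvl}({v})" for v in others
                      if levels[v] == other_factor]
        return preds
    raise ValueError("Level AB targets are not covered by this sketch")

# Variables from Equation 2: y and x at Level 1, z at Level A, w at Level B.
levels = {"y": "1", "x": "1", "z": "A", "w": "B"}
preds_y = fcs_predictors("y", levels)
preds_z = fcs_predictors("z", levels, partially_crossed=True)
```

For the variables in Equation 2, the sketch reproduces the two examples given in the text: y is predicted from x, z, w, and the three cluster means of x, whereas z is predicted from the school means of y and x and, optionally, of w.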

Similar to the JM approach, the FCS approach aims to accommodate the cross-classified structure of the data in each imputation model by (a) partitioning the variables into within- and between-cluster components and (b) allowing for associations among the components at different levels. However, in the FCS approach, only the components in the target variables are represented by random effects, whereas those in the predictors are represented by cluster means (see also Enders et al., 2016; Grund et al., 2018b; Lüdtke et al., 2017; Mistler & Enders, 2017). If the cluster means themselves are based on variables with missing data, then they are updated in each step of the procedure, so that they reflect the most recent imputations of the underlying variables (see also Royston, 2005; van Buuren & Groothuis-Oudshoorn, 2011). To our knowledge, the FCS approach to CC-MI is currently supported only by the packages mice (van Buuren & Groothuis-Oudshoorn, 2011) and miceadds (Robitzsch et al., 2021) in the statistical software R (R Core Team, 2021).

FCS algorithm

The FCS approach to CC-MI that is implemented in miceadds uses an (approximate) Gibbs sampling algorithm to generate imputations by iterating along the following steps. For simplicity, we present these steps only for a single variable at Level 1, dropping the p superscript, and we write the random effects and their variances as $u = (u_{A, j}, u_{B, k}, u_{A B, j k})$ and $τ^{2} = (τ_{A}^{2}, τ_{B}^{2}, τ_{A B}^{2})$ . At iteration t,

1. Update $x_{i j k}^{(t)}$ , so that the cluster means reflect the most recent imputations for the underlying variables.

2. Fit the model in Equation 4 using the cases with observed data in y to obtain estimates ${\hat{β}}^{(t)}$ , ${\hat{τ}}^{2 (t)}$ , and ${\hat{σ}}^{2 (t)}$ .

3. Draw $β^{(t)} \sim P (β | {\hat{β}}^{(t)}, {\hat{V}}_{β}^{(t)})$ , where ${\hat{V}}_{β}^{(t)}$ is the estimated variance-covariance matrix of ${\hat{β}}^{(t)}$ .

4. Draw $u^{(t)}$ by iterating s times over the following steps:

   (a) Draw $u_{A, j}^{(s)} \sim P (u_{A, j} | u_{B, k}^{(s - 1)}, u_{A B, j k}^{(s - 1)}, y_{i j k}^{obs}, x_{i j k}^{(t)}, {\hat{β}}^{(t)}, {\hat{τ}}^{2 (t)}, {\hat{σ}}^{2 (t)})$ .

   (b) Draw $u_{B, k}^{(s)} \sim P (u_{B, k} | u_{A, j}^{(s)}, u_{A B, j k}^{(s - 1)}, y_{i j k}^{obs}, x_{i j k}^{(t)}, {\hat{β}}^{(t)}, {\hat{τ}}^{2 (t)}, {\hat{σ}}^{2 (t)})$ .

   (c) Draw $u_{A B, j k}^{(s)} \sim P (u_{A B, j k} | u_{A, j}^{(s)}, u_{B, k}^{(s)}, y_{i j k}^{obs}, x_{i j k}^{(t)}, {\hat{β}}^{(t)}, {\hat{τ}}^{2 (t)}, {\hat{σ}}^{2 (t)})$ .

5. Draw $y_{i j k}^{mis, (t)} \sim P (y_{i j k}^{mis} | x_{i j k}^{(t)}, u^{(t)}, β^{(t)}, {\hat{τ}}^{2 (t)}, {\hat{σ}}^{2 (t)})$ to impute the cases with missing data in y in accordance with Equation 4.

Notice that the sampling steps for the random effects (Step 4) and the imputations at Level 1 (Step 5) are implemented by conditioning on the predictor variables and their cluster means (in $x_{i j k}^{(p)}$ ). In doing so, the FCS approach accommodates the relationships that can exist between the variables at different levels, similar to the JM approach. In addition, the random effects are sampled conditionally on one another and in an iterative manner. This is required in (partially) cross-classified data because the observed data provide information about multiple random effects. The sampling steps for the regression coefficients are standard Gibbs steps with an implicit uniform prior for $β$ . However, the algorithm is not a true Gibbs sampler because it omits the sampling of the variance components, relying on estimated values instead.
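The conditional sampling of the random effects (Step 4) can be sketched as follows. This Python sketch is not the miceadds implementation. It assumes normal random effects and residuals with variance components held fixed at their estimates, in which case each random effect has a normal full conditional with variance $1 / (n_{g} / σ^{2} + 1 / τ^{2})$ and mean equal to that variance times the sum of the partial residuals in group g divided by $σ^{2}$ . All function and variable names are hypothetical.

```python
import random
from collections import defaultdict

def draw_effects(partial_resid, labels, all_labels, tau2, sigma2, rng):
    """Draw one random effect per group from its normal full conditional.

    partial_resid: residuals of the observed cases after removing the fixed
                   part and the other random effects.
    all_labels:    every group, including groups without observed cases
                   (these are effectively drawn from N(0, tau2)).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(partial_resid, labels):
        sums[g] += r
        counts[g] += 1
    draws = {}
    for g in all_labels:
        post_var = 1.0 / (counts[g] / sigma2 + 1.0 / tau2)
        post_mean = post_var * sums[g] / sigma2
        draws[g] = rng.gauss(post_mean, post_var ** 0.5)
    return draws

def sweep_random_effects(resid, school, neigh, u_a, u_b, u_ab, tau2, sigma2, rng):
    """One pass over the three conditional draws for the observed cases.

    resid: y minus the fixed part (x times beta) for each observed case.
    tau2:  dict with keys "A", "B", and "AB".
    """
    pair = list(zip(school, neigh))
    r_a = [e - u_b[k] - u_ab[p] for e, k, p in zip(resid, neigh, pair)]
    u_a = draw_effects(r_a, school, u_a.keys(), tau2["A"], sigma2, rng)
    r_b = [e - u_a[j] - u_ab[p] for e, j, p in zip(resid, school, pair)]
    u_b = draw_effects(r_b, neigh, u_b.keys(), tau2["B"], sigma2, rng)
    r_ab = [e - u_a[j] - u_b[k] for e, j, k in zip(resid, school, neigh)]
    u_ab = draw_effects(r_ab, pair, u_ab.keys(), tau2["AB"], sigma2, rng)
    return u_a, u_b, u_ab

# Hypothetical toy data: four observed cases in a 2 x 2 cross-classification.
rng = random.Random(2024)
school = ["s1", "s1", "s2", "s2"]
neigh = ["n1", "n2", "n1", "n2"]
resid = [0.4, -0.1, 0.3, -0.6]
u_a = {"s1": 0.0, "s2": 0.0}
u_b = {"n1": 0.0, "n2": 0.0}
u_ab = {p: 0.0 for p in zip(school, neigh)}
tau2 = {"A": 0.2, "B": 0.1, "AB": 0.05}
for _ in range(10):  # iterate s = 10 times, each draw conditioning on the latest values
    u_a, u_b, u_ab = sweep_random_effects(resid, school, neigh,
                                          u_a, u_b, u_ab, tau2, 1.0, rng)
```

Because each school's residuals also contain neighborhood and pair effects (and vice versa), the three draws are repeated several times, so that each set of random effects is conditioned on recent values of the others.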

Single- and two-level FCS with cluster means

As an alternative to CC-MI, cross-classified data can also be accommodated with simpler imputation models (e.g., for single- or two-level data) by including the effects of cluster membership through fixed effects or additional cluster means (Andridge, 2011; Drechsler, 2015; Lüdtke et al., 2017; Wijesuriya et al., 2020). These ad hoc approaches naturally do not offer a full replacement of CC-MI, but they can be useful if software for CC-MI is not available. Here, we consider one such approach that relies on additional cluster means and can be based on single- or two-level FCS.

The FCS approach to CC-MI (Equation 4) accommodates the cross-classified structure of the data by including random effects for the target variable and cluster means for the predictor variables in each imputation model. Cluster means can be included as predictors even in simpler models, so the main difference is how these approaches address the random effects for the target. For example, when using single-level FCS, the random effects for a target variable at Level 1 can be approximated as follows:

y_{i j k}^{(p)} = x_{i j k}^{(p)} β^{(p)} + γ_{A}^{(p)} {\bar{y}}_{• j • (- i)}^{(p)} + γ_{B}^{(p)} {\bar{y}}_{• • k (- i)}^{(p)} + γ_{A B}^{(p)} {\bar{y}}_{• j k (- i)}^{(p)} + e_{i j k}^{(p)},

where $x^{(p)}$ contains the predictors and their cluster means as before, and ${\bar{y}}_{• j • (- i)}^{(p)}$ , ${\bar{y}}_{• • k (- i)}^{(p)}$ , and ${\bar{y}}_{• j k (- i)}^{(p)}$ are the cluster means at Levels A, B, and AB for the target variable, which are computed from the imputed values from the previous iteration of the imputation procedure. However, in order to avoid a direct dependency between the target variable and its own imputed values (van Buuren, 2018, Ch. 6), these cluster means are computed individually for every case i, such that the case i is excluded from the computation, that is:

\begin{array}{l} {\bar{y}}_{• j • (- i)}^{(p)} = \frac{1}{(\sum_{k} n_{j k}) - 1} \sum_{k} \sum_{i^{'} \neq i} y_{i^{'} j k}, \\ {\bar{y}}_{• • k (- i)}^{(p)} = \frac{1}{(\sum_{j} n_{j k}) - 1} \sum_{j} \sum_{i^{'} \neq i} y_{i^{'} j k}, \\ {\bar{y}}_{• j k (- i)}^{(p)} = \frac{1}{n_{j k} - 1} \sum_{i^{'} \neq i} y_{i^{'} j k} . \end{array}

Going forward, we will refer to these cluster means as adjusted cluster means. Conceptually, the adjusted cluster means can be regarded as a proxy for the information about case i that the other cases within the same cluster provide (i.e., the ICC), thus mimicking the contribution of the random effects in CC-MI. This method requires the adjusted cluster means to be updated at each iteration of the imputation procedure with “passive” imputation steps, an option that is provided by many statistical software packages (Raghunathan et al., 2018; Royston, 2005; van Buuren & Groothuis-Oudshoorn, 2011).
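
The computation of the adjusted cluster means can be illustrated with a minimal sketch in Python. It is not the implementation used in mice or miceadds; the function name `adjusted_cluster_means` and the choice to return `None` for a case that is the only member of its cluster are assumptions made for illustration.

```python
from collections import defaultdict

def adjusted_cluster_means(y, a, b):
    """Leave-one-out means of y at Levels A, B, and AB (hypothetical helper).

    y: current (completed) values of the target variable
    a, b: identifiers of the units in A and B for each case
    Returns one (mean_A, mean_B, mean_AB) tuple per case.
    """
    levels = ("A", "B", "AB")
    totals = {lev: defaultdict(float) for lev in levels}
    counts = {lev: defaultdict(int) for lev in levels}
    for yi, ai, bi in zip(y, a, b):
        for lev, key in (("A", ai), ("B", bi), ("AB", (ai, bi))):
            totals[lev][key] += yi
            counts[lev][key] += 1

    def leave_one_out(lev, key, yi):
        n = counts[lev][key]
        # Assumption: the mean is undefined (None) if case i is alone in the cluster.
        return (totals[lev][key] - yi) / (n - 1) if n > 1 else None

    return [
        (leave_one_out("A", ai, yi),
         leave_one_out("B", bi, yi),
         leave_one_out("AB", (ai, bi), yi))
        for yi, ai, bi in zip(y, a, b)
    ]

# Toy data: four cases, two units in A, two units in B.
means = adjusted_cluster_means([1.0, 2.0, 3.0, 4.0],
                               ["a1", "a1", "a1", "a2"],
                               ["b1", "b1", "b2", "b2"])
```

In an FCS run, a function of this kind would be called again after every update of the target variable, which corresponds to the passive imputation step described above.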

Differences Between JM and FCS

The main difference between the JM and FCS approaches to CC-MI is how they represent the between-cluster components in the variables included in the imputation model. In the JM approach, the imputation model is based on a multivariate CCRM and represents the between-cluster components with random effects, which correspond to the latent within- and between-group components in the variables at each level (Asparouhov & Muthén, 2006; Lüdtke et al., 2008). By contrast, the FCS approach is based on a sequence of univariate CCRMs, in which the between-cluster components of the target variable are also represented by random effects, whereas those of the predictor variables are represented by manifest cluster means (e.g., Raudenbush & Bryk, 2002).

In the context of hierarchical data, it has been shown that the JM and FCS approaches to MI are equivalent in cases with balanced data, that is, when all clusters have the same size (Carpenter & Kenward, 2013; see also Enders et al., 2016; Lüdtke et al., 2017; Resche-Rigon & White, 2018). Specifically, for balanced data, it can be shown that the two approaches represent the conditional distribution of the missing data in different but equivalent ways, provided that the cluster means are included in the FCS approach. Grund et al. (2018a) further showed that the two approaches provide nearly identical results even in unbalanced data, where the exact equivalence between them no longer holds (see also Resche-Rigon & White, 2018).

In the Appendix, and in more detail in Supplement A in the Online Supplemental Materials, we extend these considerations to cross-classified data and find that the same result holds but under stronger conditions. Specifically, we find that the FCS and JM approaches to CC-MI are asymptotically equivalent in balanced fully cross-classified data, that is, when the number of units in A and B and the cluster sizes are constant and the sample is sufficiently large. The equivalence holds only asymptotically, because the FCS approach induces a slight dependency between marginally independent observations whose strength diminishes as the numbers of units in A and B become large. For this reason, the FCS and JM approaches are not formally equivalent in cross-classified data. Nonetheless, given that the discrepancy between the FCS and JM approaches appears to be relatively minor, we would still expect their performances to be similar in practice (see also Grund et al., 2018a). In addition, from a practical perspective, the FCS approach often has advantages over the JM approach because it allows for a more flexible specification of the imputation models, with finer control over what type of imputation model is used for each variable and which predictor variables are included in them (see also van Buuren et al., 2006).

Previous research on CC-MI has focused primarily on the FCS approach (Wijesuriya, 2021; however, see Yucel et al., 2008) or specific applications, such as missing item responses in educational assessments (Kadengye et al., 2014) or missing data in social network analysis (Jorgensen et al., 2018). In addition, Hill and Goldstein (1998) considered the special case of missing unit identifiers in longitudinally cross-classified data. Overall, these studies suggest that methods for CC-MI can provide an effective treatment of missing values in cross-classified data. However, to our knowledge, no study has systematically compared the JM and FCS approaches to CC-MI in more general settings.

Simulation

In the following, we present the results of a simulation study in which we evaluated the performance of different MI approaches for cross-classified data. This included the JM and FCS approaches to CC-MI as well as ad hoc approaches that were based on single- and two-level MI. The computer code needed to run the simulation study is provided in the OSF repository (https://osf.io/5em2d).

Data Generation

In the simulation study, we generated data for two standardized variables x and y from a multivariate CCRM (see Equation 3) with observations clustered within two crossed factors A (e.g., schools) and B (e.g., neighborhoods). Specifically, for an observation i ( $i = 1, \dots, N$ ) in unit j of factor A ( $j = 1, \dots, J$ ) and unit k of factor B ( $k = 1, \dots, K$ ), the data were generated with the following model:

(x_{i j k}, y_{i j k}) = u_{A, j} + u_{B, k} + u_{A B, j k} + e_{i j k},

where the random effects ( $u_{A, j}$ , $u_{B, k}$ , and $u_{A B, j k}$ ) and the residuals ( $e_{i j k}$ ) followed independent bivariate normal distributions with mean vectors of zero and covariance matrices $T_{A}$ , $T_{B}$ , $T_{A B}$ , and $Σ$ , respectively. The covariance matrices were chosen in such a way that (a) the two variables x and y would have certain proportions of variance at each level ( $τ_{A}^{2}$ , $τ_{B}^{2}$ , $τ_{A B}^{2}$ , and $σ^{2}$ ), and (b) the random effects and residuals of x and y would be correlated to a given extent.

Partial cross-classification

The model in Equation 8 covers both fully and partially cross-classified data. In this study, we focused on partially cross-classified data and used the following procedure to assign a subset of the units in B to a subset of the units in A (see Figure 2). The procedure consisted of two steps. First, we defined the number of units in B that needed to be assigned to each unit in A ( $n_{B / A}$ ) and initially assigned these units in a block-like pattern, where the first $n_{B / A}$ units in B would be assigned to the first $n_{A / B} = n_{B / A} \cdot \frac{J}{K}$ units in A, and so on. Second, we introduced a certain number of random permutations into these initial assignments. This resulted in partially cross-classified data with a total number of $n_{B / A} \cdot J$ pairs between A and B and with an assignment pattern that was in part deterministic and in part random. For each pair of A and B, we then generated either a balanced or unbalanced number of observations at Level 1. In the balanced case, we generated clusters of constant size (all $n_{j k} = n$ ); in the unbalanced case, we chose one unit in B per unit in A and increased the cluster size for this pair while decreasing the cluster size for all other units in B that were assigned to the same unit in A. We did this in such a way that the average cluster size would remain unchanged in the unbalanced data ( $\frac{1}{n_{B / A} \cdot J} \sum_{j} \sum_{k} n_{j k} = n$ , with the sum taken over the assigned pairs). Table 1 shows an example with $J = 16$ units in A and $K = 32$ units in B, where we assigned $n_{B / A} = 8$ units in B to each unit in A with 20% random assignments and an unbalanced number of observations with an average cluster size of $n = 5$ .
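
The assignment procedure can be sketched as follows. This is only an illustration under stated assumptions and not the code from the OSF repository: the function names, the rule for the random assignments (each assigned unit in B is replaced, with a given probability, by a unit in B that is not yet assigned to the same unit in A), and the `shift` parameter for the unbalanced cluster sizes are hypothetical. The default `shift=1` reproduces the pattern of one cluster of size 12 and seven clusters of size 4 seen in Table 1.

```python
import random

def assign_pairs(J, K, n_b_per_a, prop_random=0.0, seed=1):
    """Assign n_b_per_a units in B to each unit in A (hypothetical helper)."""
    rng = random.Random(seed)
    n_a_per_b = n_b_per_a * J // K  # units in A that share one block of B
    pairs = {}
    for j in range(J):
        start = (j // n_a_per_b) * n_b_per_a  # first unit in B of this block
        assigned = list(range(start, start + n_b_per_a))
        for pos in range(n_b_per_a):
            if rng.random() < prop_random:
                # Assumption: swap in a unit in B not yet assigned to unit j.
                free = [k for k in range(K) if k not in assigned]
                if free:
                    assigned[pos] = rng.choice(free)
        pairs[j] = sorted(assigned)
    return pairs

def cluster_sizes(pairs, n, balanced=True, shift=1, seed=1):
    """Cluster size for each (j, k) pair; the average size stays at n."""
    rng = random.Random(seed)
    sizes = {}
    for j, units in pairs.items():
        large = None if balanced else rng.choice(units)
        for k in units:
            if balanced:
                sizes[(j, k)] = n
            elif k == large:
                sizes[(j, k)] = n + shift * (len(units) - 1)
            else:
                sizes[(j, k)] = n - shift
    return sizes

# Setup of the example in Table 1: J = 16, K = 32, eight units in B per unit in A.
pairs = assign_pairs(J=16, K=32, n_b_per_a=8, prop_random=0.2)
sizes = cluster_sizes(pairs, n=5, balanced=False)
average_size = sum(sizes.values()) / len(sizes)
```

Because the enlarged cluster gains exactly what the other clusters in the same unit in A lose, the total sample size and the average cluster size are the same in the balanced and unbalanced cases.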

Table 1.

Example of Partially Cross-Classified Data in the Simulation Study With J = 16 Units in A, K = 32 units in B, Eight Units in B Assigned to Each Unit in A, and an Average Cluster Size of n = 5 (Unbalanced, 20% Random Assignments)

Neighborhood (B)
School (A)		1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32
	1	4		4	4	4	4	12	4																							4
	2	4	4	4	12	4	4	4	4
	3	4			4	4	4	4	4								12								4
	4	4	4	12	4		4	4	4				4
	5									4	4	4	12	4	4	4	4
	6					4						4		4	12	4	4							4					4
	7									4		4	4	4	4	12							4							4
	8		4							4	4	4	4	12	4	4
	9										4							4	4	4		12		4	4								4
	10			4														4	4	4	12	4	4								4
	11									12								4	4	4	4	4	4		4
	12																		4	4	4		4	12	4	4						4
	13		12								4							4						4			4	4		4			4
	14																4				4	4				4	4	12	4		4
	15																									4	4	4	4	4	4	4	12
	16																									4	4	4	4	4	4	12	4

Missing data

Once the data were generated, we induced missing values in x on the basis of the values in y in accordance with a missing completely at random (MCAR) or missing at random (MAR) mechanism that we simulated with the following linear model:

r_{i j k} = α + λ y_{i j k} + v_{i j k}, v_{i j k} \sim N (0, 1 - λ^{2}) .

In this model, $α$ is a quantile of the standard normal distribution that determines the probability of missing data (e.g., $α = - 0.674$ for 25% missing data), and $λ$ determines the missing data mechanism (i.e., MCAR if $λ = 0$ , MAR otherwise). A value $x_{i j k}$ was set to missing when the corresponding $r_{i j k} > 0$ .
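
As a minimal sketch of this mechanism (not the authors' simulation code; the function name `missing_indicator` and the seeds are hypothetical), the latent variable r can be simulated directly. Because y is standardized and v has variance $1 - λ^{2}$, r is normal with mean $α$ and unit variance, so the proportion of cases with $r > 0$ equals the chosen probability of missing data.

```python
import random
from statistics import NormalDist

def missing_indicator(y, prob=0.25, lam=0.7, seed=1):
    """Return True for cases whose x value is set to missing (hypothetical helper).

    y: standardized values of y
    prob: target probability of missing data
    lam: 0 gives MCAR; any other value makes missingness depend on y (MAR)
    """
    rng = random.Random(seed)
    alpha = NormalDist().inv_cdf(prob)   # about -0.674 for prob = 0.25
    sd_v = (1 - lam ** 2) ** 0.5         # keeps the variance of r at 1
    return [alpha + lam * yi + rng.gauss(0, sd_v) > 0 for yi in y]

# Toy check with standard normal y values.
rng = random.Random(42)
y = [rng.gauss(0, 1) for _ in range(20000)]
miss = missing_indicator(y, prob=0.25, lam=0.7, seed=7)
rate = sum(miss) / len(miss)
```

With a positive $λ$, cases with larger values of y are more likely to have missing values in x, which is what makes the mechanism MAR rather than MCAR.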

Simulated Conditions

Using this data generating procedure, we varied the sample sizes at Levels A and B, where we fixed the number of units in A to $J = 128$ and set the number of units in B to $K = 64$ , 128, or 256. For the cross-classification, we set the number of units in B per unit in A to $n_{B / A} = 8$ , which resulted in a constant number of 1,024 pairs between A and B in all conditions. In addition, we simulated both balanced and unbalanced samples with an average cluster size of $n = 5$ . This resulted in cross-classified data with a total sample size of $N = 5{,}120$ and moderate to strong degrees of cross-classification as indicated by Cramér’s V (i.e., in comparison with a hierarchical data structure; see Lai, 2019).¹ For the variance components, we fixed the residual variances at Level 1 ( $σ^{2}$ ) to .40 and set the variances of the random effects at Levels A, B, and AB ( $τ_{A}^{2}$ , $τ_{B}^{2}$ , and $τ_{A B}^{2}$ ) to values between .10 and .30, so that the variance components of each standardized variable summed to 1. Specifically, we considered three configurations: one equal-variance condition, where all variances were set to .20; and two unequal-variance conditions, where the variances at Levels A, B, and AB were set to .30, .20, and .10, or .20, .30, and .10, respectively. The correlations between the random effects at Levels A, B, and AB were fixed to .50, and the correlations between the residuals at Level 1 were fixed to .20. Finally, we simulated both MCAR ( $λ = 0$ ) and MAR data ( $λ = .70$ ) and fixed the probability of missing data to 25%. This resulted in 36 simulated conditions, each of which was replicated 2,000 times.²
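
The number of conditions and the sample sizes follow directly from the design factors. The short sketch below only reproduces this arithmetic; the variable names are hypothetical and it is not part of the simulation code.

```python
from itertools import product

J = 128            # units in A (fixed)
n_b_per_a = 8      # units in B assigned to each unit in A
n = 5              # average cluster size

K_values = (64, 128, 256)
balance = ("balanced", "unbalanced")
variance_configs = ((.20, .20, .20), (.30, .20, .10), (.20, .30, .10))  # A, B, AB
mechanisms = {"MCAR": 0.0, "MAR": 0.7}  # value of lambda

# Fully crossed design: 3 x 2 x 3 x 2 conditions.
conditions = list(product(K_values, balance, variance_configs, mechanisms))
n_pairs = J * n_b_per_a   # pairs between A and B
N = n_pairs * n           # total sample size
```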

Imputation and Analysis

To handle the missing data, we considered 10 different imputation approaches. This included the JM and FCS approaches to CC-MI as well as eight ad hoc approaches based on single- and two-level FCS. The ad hoc approaches included both “naive” specifications of these methods, which reflect common recommendations for applications in single- and two-level data, as well as the extended specifications that aim to accommodate the cross-classified data structure by including adjusted cluster means. Specifically, the methods were as follows:

FCS-1L: single-level FCS.

FCS-1L-M: single-level FCS, extended to include cluster means for the predictor and adjusted cluster means for the target variable (at Levels A, B, and AB).

FCS-2L-A: two-level FCS with random effects of A and cluster means for the predictor.

FCS-2L-A-M: two-level FCS with random effects of A and cluster means for the predictor, extended to include adjusted cluster means for the target variable (at Levels B and AB).

FCS-2L-B: two-level FCS with random effects of B and cluster means for the predictor.

FCS-2L-B-M: two-level FCS with random effects of B and cluster means for the predictor, extended to include adjusted cluster means for the target variable (at Levels A and AB).

FCS-2L-AB: two-level FCS with random effects of $A B$ and cluster means for the predictor.

FCS-2L-AB-M: two-level FCS with random effects of $A B$ and cluster means for the predictor, extended to include adjusted cluster means for the target variable (at Levels A and B).

FCS-CC: FCS approach to CC-MI.

JM-CC: JM approach to CC-MI.

To implement JM-CC, we used OpenBUGS (Lunn et al., 2000); and to implement FCS-CC and the methods based on single- and two-level FCS, we used the R packages mice and miceadds (Robitzsch et al., 2021; van Buuren & Groothuis-Oudshoorn, 2011). Finally, we also conducted the analyses with the complete data (CD) and after listwise deletion (LD) to provide a means of comparison.

In the analysis of the imputed data, we were interested in two different models. The first model was an intercept-only CCRM for x (see Equation 1). The second model was a random-intercept CCRM with y as the outcome variable and x as the explanatory variable (see Equation 2). The parameters of interest were the estimated variance components in x at each level ( $τ_{A}^{2 (x)}$ , $τ_{B}^{2 (x)}$ , $τ_{A B}^{2 (x)}$ , and $σ^{2 (x)}$ ) as well as the estimated regression coefficients for the effects of x on y at each level ( $β_{1}^{(y \sim x)}$ , $β_{A}^{(y \sim x)}$ , $β_{B}^{(y \sim x)}$ , and $β_{A B}^{(y \sim x)}$ ). For each parameter, we computed the relative bias and the coverage rates of the 95% confidence interval (CI) to evaluate the accuracy of the parameter estimates and the estimated standard errors. Due to the different representation of the between-cluster components in the data-generating model and the analysis, the true values of the regression coefficients cannot generally be expressed in closed form. For this reason, we used the average estimates in the CD as reference values in the computation of the bias and coverage.
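
The two evaluation criteria can be sketched for a single parameter as follows. This is an illustration rather than the code used in the study: the function name is hypothetical, and the interval uses a normal quantile for simplicity, whereas intervals based on MI would typically rely on a t reference distribution.

```python
from statistics import NormalDist

def bias_and_coverage(estimates, std_errors, reference, level=0.95):
    """Bias and CI coverage across replications (hypothetical helper).

    estimates, std_errors: one pooled estimate and standard error per replication
    reference: true value or, as for the regression coefficients here,
               the average estimate obtained with the complete data
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    bias = sum(estimates) / len(estimates) - reference
    covered = [abs(est - reference) <= z * se
               for est, se in zip(estimates, std_errors)]
    return {
        "bias": bias,
        "relative_bias": bias / reference,
        "coverage": 100 * sum(covered) / len(covered),
    }

# Toy input: four replications with a reference value of .20.
result = bias_and_coverage([0.18, 0.22, 0.20, 0.30], [0.02] * 4, reference=0.20)
```

In the toy input, three of the four intervals contain the reference value, so the coverage rate is 75%, and the average estimate exceeds the reference value by .025.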

Results

The main results are summarized in Tables 2 through 4. For simplicity, we present the detailed results only for selected conditions with $J = K = 128$ units in A and B, unbalanced clusters, and MAR data. The results for the other conditions were often similar, so we discuss them only when needed and provide them in full in Supplement C in the Online Supplemental Materials and the OSF repository (https://osf.io/5em2d).

Table 2.

Bias in the Estimated Variance Components for x in Conditions With J = 128 Units in A, K = 128 Units in B, Unbalanced Cluster Sizes, and MAR Data

			Single-Level FCS		Two-Level FCS
Par.	True	LD	1L	1L-M	2L-A	2L-A-M	2L-B	2L-B-M	2L-AB	2L-AB-M	FCS-CC	JM-CC
	$τ_{A}^{2}$ = .30, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .10
$τ_{A}^{2 (x)}$	.300	−.010	−.103	−.011	.007	−.014	−.096	−.004	−.035	−.011	.000	.000
$τ_{B}^{2 (x)}$	.200	−.008	−.070	−.013	−.066	−.007	.006	−.012	−.025	−.014	−.001	−.001
$τ_{A B}^{2 (x)}$	.100	−.002	−.026	.008	−.028	.010	−.027	.006	.060	.005	.001	.002
$σ^{2 (x)}$	.400	−.002	.198	.007	.081	.006	.108	.006	.000	.009	.000	−.001
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .30, and $τ_{A B}^{2}$ = .10
$τ_{A}^{2 (x)}$	.201	−.006	−.070	−.011	.008	−.010	−.065	−.005	−.024	−.013	.001	.001
$τ_{B}^{2 (x)}$	.301	−.011	−.104	−.012	−.097	−.005	.006	−.015	−.036	−.012	.000	.000
$τ_{A B}^{2 (x)}$	.100	−.003	−.026	.007	−.028	.005	−.028	.009	.059	.005	.000	.002
$σ^{2 (x)}$	.400	−.002	.199	.007	.108	.007	.082	.007	.001	.010	.000	−.001
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .20
$τ_{A}^{2 (x)}$	.200	−.008	−.069	−.007	.011	−.008	−.064	−.003	−.024	−.007	.000	.000
$τ_{B}^{2 (x)}$	.200	−.009	−.070	−.007	−.065	−.003	.011	−.009	−.024	−.008	.000	.000
$τ_{A B}^{2 (x)}$	.200	−.007	−.060	.000	−.061	.002	−.060	.002	.048	−.003	.000	.000
$σ^{2 (x)}$	.400	−.001	.198	.008	.108	.009	.108	.009	.000	.009	.000	.000

Note. Bias values larger than $\pm$ 10% of the true values are printed in bold. $σ^{2 (x)}$ , $τ_{A}^{2 (x)}$ , $τ_{B}^{2 (x)}$ , and $τ_{A B}^{2 (x)}$ = variance components in x at Levels 1, A, B, and AB; CD = complete data; LD = listwise deletion; FCS-1L = single-level FCS with or without (adjusted) cluster means; FCS-2L = two-level FCS with random effects for A, B, or $A B$ and with or without (adjusted) cluster means; FCS-CC = cross-classified FCS; JM-CC = cross-classified JM.

The results for the estimated variance components in x are presented in Table 2. The FCS and JM approaches to CC-MI (FCS-CC, JM-CC) provided approximately unbiased parameter estimates in all simulated conditions. By contrast, the single- and two-level FCS approaches (FCS-1L, FCS-2L) led to bias unless their specification was extended to include the adjusted cluster means of the incomplete variable x. The direction and size of the bias depended on the variance configuration and how the random effects at each level were accommodated by these procedures. Specifically, single-level FCS (FCS-1L) underestimated the variances of the random effects and overestimated the residual variance at Level 1. Two-level FCS (FCS-2L-A, FCS-2L-B, and FCS-2L-AB) slightly overestimated the variance of the random effect that was included in the imputation model (e.g., $τ_{A}^{2 (x)}$ for FCS-2L-A), where the bias was largest for FCS-2L-AB. In addition, these methods underestimated the variances of the other random effects and overestimated the residual variance at Level 1, except for FCS-2L-AB, which estimated the residual variance with approximately no bias. The bias tended to be largest when the omitted variance components were large. However, when we extended the single- and two-level FCS approaches to include the adjusted cluster means, there was little to no bias in the estimated variance components. Finally, LD led to essentially unbiased estimates of the variance components. These results were fairly consistent across the simulated conditions.

The bias in the estimated regression coefficients in the CCRM of y regressed on x is shown in Table 3. Similar to before, FCS-CC and JM-CC provided estimates of the regression coefficients with little to no bias, whereas the estimates provided by the single- and two-level FCS approaches (FCS-1L and FCS-2L) were strongly biased unless their specifications also included the adjusted cluster means of the incomplete variable. When we extended single- and two-level FCS in this manner, the bias in the regression coefficients became smaller for two-level FCS with random effects of A or B (FCS-2L-A-M and FCS-2L-B-M) and even more so for single-level FCS (FCS-1L-M) and two-level FCS with random effects of $A B$ (FCS-2L-AB-M). The size of the remaining bias depended on the variance configuration, such that the bias was largest when the corresponding variance component was large, especially for FCS-2L-A-M and FCS-2L-B-M. LD led to a consistent bias in all regression coefficients. Similar to above, the results were fairly consistent across conditions, except for LD, which provided unbiased results under MCAR.

Table 3.

Bias in the Estimated Regression Coefficients in the CCRM for y Regressed on x in Conditions With J = 128 Units in A, K = 128 Units in B, Unbalanced Cluster Sizes, and MAR Data

			Single-Level FCS		Two-Level FCS
Par.	True	LD	1L	1L-M	2L-A	2L-A-M	2L-B	2L-B-M	2L-AB	2L-AB-M	FCS-CC	JM-CC
	$τ_{A}^{2}$ = .30, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .10
$β_{1}^{(y \sim x)}$	.200	−.020	−.036	−.003	−.034	−.003	−.043	−.003	.000	−.005	.000	.000
$β_{A}^{(y \sim x)}$	.113	−.035	.184	.009	−.045	.023	.145	.005	.108	.003	−.004	.002
$β_{B}^{(y \sim x)}$	.101	−.032	.169	.016	.117	.014	−.031	.021	.099	.010	.001	.006
$β_{A B}^{(y \sim x)}$	.165	−.043	.014	−.005	.067	−.011	.059	−.005	−.081	.003	.002	−.006
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .30, and $τ_{A B}^{2}$ = .10
$β_{1}^{(y \sim x)}$	.200	−.020	−.036	−.003	−.042	−.003	−.034	−.003	.000	−.005	.000	.000
$β_{A}^{(y \sim x)}$	.103	−.033	.168	.015	−.031	.020	.116	.014	.098	.009	.000	.006
$β_{B}^{(y \sim x)}$	.115	−.033	.186	.011	.148	.007	−.043	.026	.110	.005	−.003	.004
$β_{A B}^{(y \sim x)}$	.164	−.045	.012	−.007	.057	−.007	.065	−.013	−.082	.001	.000	−.008
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .20
$β_{1}^{(y \sim x)}$	.200	−.020	−.036	−.004	−.043	−.004	−.043	−.004	.000	−.005	.000	.000
$β_{A}^{(y \sim x)}$	.069	−.027	.138	.006	−.076	.016	.102	.005	.079	.000	.000	.001
$β_{B}^{(y \sim x)}$	.067	−.027	.139	.006	.104	.005	−.075	.016	.080	.000	.000	.001
$β_{A B}^{(y \sim x)}$	.210	−.056	.055	.003	.106	−.001	.106	.000	−.054	.010	.001	−.002

Note. Bias values larger than $\pm$ 10% of the true values are printed in bold. $β_{1}^{(y \sim x)}$ , $β_{A}^{(y \sim x)}$ , $β_{B}^{(y \sim x)}$ , and $β_{A B}^{(y \sim x)}$ = regression coefficients for the effects of x at Levels 1, A, B, and AB; CD = complete data; LD = listwise deletion; FCS-1L = single-level FCS with or without (adjusted) cluster means; FCS-2L = two-level FCS with random effects for A, B, or $A B$ and with or without (adjusted) cluster means; FCS-CC = cross-classified FCS; JM-CC = cross-classified JM.

Finally, the coverage rates for the 95% CIs of the estimated regression coefficients (Table 4) followed the same pattern as the bias. Specifically, for FCS-CC and JM-CC, the coverage rates were close to the nominal value of 95%. For the single- and two-level FCS approaches without adjusted cluster means (FCS-1L, FCS-2L-A, FCS-2L-B, and FCS-2L-AB), the coverage rates were well below the nominal value for most coefficients. By contrast, when we included the adjusted cluster means, we found coverage rates close to the nominal value for both single- and two-level FCS (FCS-1L-M, FCS-2L-A-M, FCS-2L-B-M, and FCS-2L-AB-M). For LD, we found coverage rates below the nominal value of 95%, most clearly for the coefficients at Levels 1 and AB. These results were again fairly consistent across conditions, except for LD, which showed nominal coverage under MCAR.

Table 4.

Coverage Rates (%) of the 95% Confidence Intervals for the Regression Coefficients in the CCRM for y Regressed on x in Conditions with J = 128 Units in A, K = 128 Units in B, Unbalanced Cluster Sizes, and MAR Data

		Single-Level FCS		Two-Level FCS
Par.	LD	1L	1L-M	2L-A	2L-A-M	2L-B	2L-B-M	2L-AB	2L-AB-M	FCS-CC	JM-CC
	$τ_{A}^{2}$ = .30, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .10
$β_{1}^{(y \sim x)}$	79.0	35.1	94.6	51.9	94.3	30.2	94.4	94.9	94.3	95.7	95.7
$β_{A}^{(y \sim x)}$	92.0	50.5	94.8	91.1	93.7	62.5	94.7	76.3	95.0	95.2	94.8
$β_{B}^{(y \sim x)}$	92.5	53.0	95.5	71.2	95.5	92.8	95.8	78.1	95.7	95.9	96.0
$β_{A B}^{(y \sim x)}$	77.8	97.3	95.7	63.3	94.8	70.2	95.2	42.1	96.0	95.3	96.2
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .30, and $τ_{A B}^{2}$ = .10
$β_{1}^{(y \sim x)}$	78.8	35.6	94.0	31.4	93.8	52.9	93.8	93.8	93.5	94.5	94.0
$β_{A}^{(y \sim x)}$	93.0	53.7	95.3	93.2	94.8	71.5	95.0	77.9	95.6	95.4	95.3
$β_{B}^{(y \sim x)}$	93.4	48.8	95.8	59.2	95.8	92.2	94.5	74.5	96.2	95.6	95.5
$β_{A B}^{(y \sim x)}$	75.3	97.3	94.9	71.7	94.8	63.9	93.6	40.5	95.0	94.2	95.0
	$τ_{A}^{2}$ = .20, $τ_{B}^{2}$ = .20, and $τ_{A B}^{2}$ = .20
$β_{1}^{(y \sim x)}$	80.3	36.1	94.9	29.8	94.8	29.8	94.5	95.3	95.0	95.7	95.0
$β_{A}^{(y \sim x)}$	92.9	68.7	93.8	84.7	94.7	77.9	94.2	84.8	94.2	94.2	94.5
$β_{B}^{(y \sim x)}$	92.2	68.3	94.3	76.4	94.3	84.0	94.5	83.3	94.4	94.4	94.2
$β_{A B}^{(y \sim x)}$	64.5	75.5	94.8	23.8	94.8	25.1	94.8	73.1	94.5	95.0	94.9

Note. Coverage rates smaller than 92.5% or larger than 97.5% are printed in bold. $β_{1}^{(y \sim x)}$ , $β_{A}^{(y \sim x)}$ , $β_{B}^{(y \sim x)}$ , and $β_{A B}^{(y \sim x)}$ = regression coefficients for the effects of x at Levels 1, A, B, and AB; CD = complete data; LD = listwise deletion; FCS-1L = single-level FCS with or without (adjusted) cluster means; FCS-2L = two-level FCS with random effects for A, B, or $A B$ and with or without (adjusted) cluster means; FCS-CC = cross-classified FCS; JM-CC = cross-classified JM.

To summarize, the results of the simulation study suggested three key findings. First, both the FCS and JM approaches to CC-MI provided accurate results in all simulated conditions. Second, the conventional single- or two-level FCS approaches performed poorly because they failed to accommodate the cross-classified data structure. Third, when the single- and two-level FCS approaches were extended to include the adjusted cluster means for the incomplete target variables, they provided much more accurate results that were very similar to CC-MI. These results are encouraging because they suggest that CC-MI as well as suitable extensions of single- and two-level MI can provide an effective treatment of missing values in cross-classified data.

Example Analysis

To illustrate the application of the different approaches to CC-MI, we use data from the Early Childhood Longitudinal Study (ECLS-K 1998). The ECLS-K is a longitudinal study that focuses on children’s early school experiences with multiple measurements beginning in Kindergarten (1998), through primary school (1999–2004), and up to secondary school (8th grade, 2007). An interesting feature of the ECLS-K data is that most children in the sample change schools when they transition from primary to secondary school, which means that the students are cross-classified by primary school (factor P) and secondary school (factor S).

In this example, we use a subset of the ECLS-K data with observations from Grades 5 and 8, comprising a sample of 9,067 students from 1,997 primary and 2,502 secondary schools who changed schools during that time and for whom school membership at both time points was known (Cramér’s V = .858). Specifically, we are interested in the relationship between reading achievement and the amount of time children spent doing homework after their transition to secondary school, controlling for differences between types of schools (private vs. public):

\begin{array}{rl} \mathit{READ}_{i j k} = & β_{0} + β_{1} \mathit{WORK}_{i j k} + β_{2} \overline{\mathit{WORK}}_{• j •} + β_{3} \overline{\mathit{WORK}}_{• • k} + β_{4} \overline{\mathit{WORK}}_{• j k} \\ & + β_{5} \mathit{SCHTYPE}_{k} + u_{P, j} + u_{S, k} + u_{P S, j k} + e_{i j k} . \end{array}

In addition, we fit an intercept-only model for the amount of time spent on homework to quantify the amount of variance between primary schools, secondary schools, and primary-secondary school pairs. In this sample, 553 (6.1%) of the cases had missing data on at least one of the three variables.

To handle the missing data, we used a subset of the methods presented above: LD, single-level FCS with (adjusted) cluster means (FCS-1L-M), and the FCS approach to CC-MI (FCS-CC). These methods were chosen either because they performed well in our simulation study (FCS-1L-M and FCS-CC) or because they serve as a means of comparison (LD). To implement the two MI approaches, we used the R packages mice and miceadds, and we generated 20 imputed data sets. We also used the packages EdSurvey (Bailey et al., 2021) to process the data, lme4 (Bates, 2010) to fit the analysis models, and mitml (Grund, Robitzsch, et al., 2021) to pool the results using Rubin’s (1987) rules. The computer code for this example is provided in Supplement B in the Online Supplemental Materials and the OSF repository (https://osf.io/5em2d).
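
For a single scalar parameter, the pooling step with Rubin's rules can be sketched as follows. This is a generic illustration of the rules and not the mitml implementation; the function name `pool_rubin` is hypothetical, and the degrees of freedom are the large-sample version without small-sample adjustments.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets (hypothetical helper).

    estimates, std_errors: one estimate and standard error per imputed data set
    Returns the pooled estimate, its standard error, and the degrees of freedom.
    """
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    w = mean([se ** 2 for se in std_errors])    # within-imputation variance
    b = variance(estimates)                     # between-imputation variance
    t = w + (1 + 1 / m) * b                     # total variance
    if b > 0:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    else:
        df = float("inf")                       # no between-imputation variability
    return q_bar, t ** 0.5, df

# Toy input with m = 3 imputed data sets.
estimate, se, df = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The pooled standard error combines the average sampling variance with the variability of the estimates across imputed data sets, which is why it is larger than the standard error from any single completed data set when the estimates differ.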

The results are presented in Table 5. Overall, the results showed that students who spent more time on homework had higher reading achievement (at Level 1). In addition, there was a positive contextual effect of time spent on homework at the primary school level (P), indicating that students who had attended primary schools that assigned more homework had higher reading achievement in secondary school. However, there were no contextual effects at the secondary school level (S) or at the level of the primary-secondary school interaction (P-S). There was also a positive effect of the type of school, indicating that students at private (vs. public) schools had higher reading achievement. Finally, the results for the variance components indicated substantial amounts of variance at each level (P, S, and P-S). Due to the relatively small percentage of missing data, the results were very consistent across the methods for handling missing data, and the estimated regression coefficients were usually within half a unit of a standard error from each other. The largest difference in the estimated coefficients was the effect of time spent on homework at the secondary school level, which was essentially zero for LD and slightly positive (but non-significant) for FCS-1L-M and FCS-CC.

Table 5.

Results From the Example Analysis (ECLS-K 1998)

	LD		FCS-1L-M		FCS-CC
	Est.	SE	Est.	SE	Est.	SE
Reading achievement
Fixed effects
Intercept	162.659	0.777	161.880	0.788	161.903	0.802
Homework	0.139	0.030	0.141	0.032	0.145	0.031
Homework (Primary)	0.372	0.088	0.350	0.089	0.355	0.090
Homework (Secondary)	−0.004	0.093	0.048	0.095	0.047	0.097
Homework (P-S)	0.093	0.089	0.077	0.092	0.072	0.092
School type	10.913	1.165	11.203	1.166	11.025	1.168
Variance components
School (primary)	85.294		86.878		93.002
School (secondary)	119.553		113.999		117.022
School (P-S)	20.244		25.607		24.372
Student	488.134		506.147		498.968
Homework
Fixed effects
Intercept	8.966	0.151	8.895	0.151	8.908	0.151
Variance components
School (primary)	2.690		2.889		2.831
School (secondary)	5.896		5.575		5.433
School (P-S)	5.803		5.392		5.833
Student	101.756		104.582		104.112

Note. LD = listwise deletion; FCS-1L-M = single-level FCS with (adjusted) cluster means; FCS-CC = cross-classified FCS.

Limitations and Extensions

An important requirement of MI is that the imputation procedure must accommodate the relevant features of the data and the intended analyses. In our study, we focused on applications in which the intended analyses were CCRMs with random intercepts and explanatory variables with linear effects. For these applications, we outlined how the JM and FCS approaches to MI can be implemented to accommodate cross-classified data, using methods that were either based directly on univariate and multivariate CCRMs (JM-CC and FCS-CC) or that emulated them in an ad hoc manner (e.g., FCS-1L-M). The main strength of these approaches is that they provide a fairly general treatment of missing data in cross-classified data that supports a broad range of CCRMs within these limits. This is particularly useful when the imputed data will be used by multiple analysts and in potentially many different analyses.

Naturally, CCRMs can also be extended by including random slopes or nonlinear effects (e.g., CLIs). These types of effects complicate the treatment of missing data, because they cause conventional MI approaches such as JM and FCS to become incompatible with the intended analysis (Du et al., 2022; see also Seaman et al., 2012). Recent research has shown that substantive-model-compatible (SMC) versions of these approaches can be used to ensure compatibility by including the substantive analysis model directly in the imputation procedure (Bartlett et al., 2015; Goldstein et al., 2014). Several studies have shown that SMC methods can be extremely effective at handling missing data in single- and multilevel analyses with nonlinear effects (Erler et al., 2017; Grund, Lüdtke, et al., 2021; Lüdtke et al., 2020). The main advantage of SMC methods is that they can be fine-tuned to accommodate the more complex features of a particular analysis at the cost of making the treatment of missing data more specific to this analysis.

In principle, SMC versions of the JM and FCS approaches to CC-MI could also be used in applications of CCRMs with random slopes and nonlinear effects (see also Goldstein et al., 2014). To our knowledge, however, there is currently no software that provides SMC versions of JM and FCS for cross-classified data. As an alternative, general-purpose software for Bayesian data analysis (e.g., WinBUGS/OpenBUGS, JAGS, or Stan) can be used to implement an SMC version of JM or the sequential modeling approach to MI (Ibrahim et al., 2002). The sequential modeling approach ensures compatibility by factorizing the joint distribution of the variables into a sequence of univariate conditional models, one of which corresponds to the intended analysis (Ibrahim et al., 2002; see also Lüdtke et al., 2020).
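As a rough illustration of this factorization, consider the simplest case with an outcome y, one incomplete covariate x, and one complete covariate z. The notation below is ours and is not taken from the article; it is meant only as a sketch of the general idea, not as the specification used in any particular software.

```latex
% Hypothetical notation (not from the article):
%   y = outcome, x = incomplete covariate, z = complete covariate,
%   \theta = parameters of the analysis model,
%   \phi   = parameters of the covariate model.
\begin{align}
  f(y, x \mid z) &= f(y \mid x, z; \theta)\, f(x \mid z; \phi), \\
  f(x \mid y, z) &\propto f(y \mid x, z; \theta)\, f(x \mid z; \phi).
\end{align}
```

The first factor plays the role of the intended analysis model, which in the present context would be a CCRM with random effects for the crossed factors. Because the imputations for x are drawn from the second expression, they remain consistent with the analysis model even when that model contains random slopes or nonlinear functions of x.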

In addition to the compatibility issues caused by nonlinear effects, it has been shown that the imputation models used in the FCS approach are sometimes incompatible with each other, even if the intended analysis includes only linear effects (Liu et al., 2014; Zhu & Raghunathan, 2015). This issue has also been raised in the context of multilevel analyses, where the conditional models employed by FCS sometimes do not correspond to a well-defined joint model (Resche-Rigon & White, 2018; see also Du et al., 2022). In the present study, we found that a similar problem applies to the FCS approach in cross-classified data (see the Appendix), although this did not have any noticeable impact on its performance (see also Grund et al., 2018a). Nonetheless, the (lack of) compatibility in the FCS approach remains an important issue, and researchers should be mindful of it when applying this method in practice (for a more detailed discussion, see Du et al., 2022).

Discussion

In the present article, we compared different approaches for the imputation of missing values in cross-classified data (CC-MI). To this end, we introduced an extension of the popular JM and FCS approaches to MI for incomplete cross-classified data. On the basis of theoretical considerations and the results of a simulation study, we found that—though not formally equivalent—both the JM and FCS approaches to CC-MI provided an effective treatment of incomplete cross-classified data. In addition, we found that simpler approaches based on single- or two-level FCS can accommodate cross-classified data by extending the imputation models to include (adjusted) cluster means. Finally, we illustrated the application of these methods with the mice and miceadds packages for the statistical software R in a worked example with real data from education research.

Our findings have multiple implications for practice. First, the JM and FCS approaches to CC-MI appear to be similarly suited for handling incomplete cross-classified data. The FCS approach can be particularly convenient because it can easily handle different types of variables (e.g., a mixture of continuous and categorical data) and provides finer control over the selection of predictor variables in each model. For the FCS approach to CC-MI to accommodate the cross-classified structure of the data, the imputation models should include (a) the random effects for each crossed factor and (if possible) their interaction and (b) cluster means of the predictor variables to accommodate the relationships between the variables at each level. As an alternative to CC-MI, cross-classified data can also be handled with simpler techniques that are based on single- or two-level FCS. This can be beneficial when methods for CC-MI are unavailable or suffer from convergence problems. In such cases, single- or two-level FCS approaches can be extended to include (adjusted) cluster means, which can reduce or avoid the computational burden of modeling multiple random effects in CCRMs while providing results that are often similar to CC-MI.
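The idea of extending a single-level imputation model with (adjusted) cluster means can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation in mice or miceadds: we assume that "adjusted" means a leave-one-out mean (the cluster mean computed without the focal observation), the function name `adjusted_cluster_means` is hypothetical, and the toy data are invented. In an actual FCS run, these means would be recomputed from the current imputations in each iteration rather than from the observed values only.

```python
from collections import defaultdict


def adjusted_cluster_means(values, cluster_ids):
    """Leave-one-out cluster means (hypothetical helper, see lead-in).

    values      : list of floats, with None marking a missing value
    cluster_ids : list of cluster labels, same length as values
    Returns one mean per observation; None if no other observed
    value exists in that observation's cluster.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, cluster_ids):
        if v is not None:
            sums[c] += v
            counts[c] += 1

    means = []
    for v, c in zip(values, cluster_ids):
        s, n = sums[c], counts[c]
        if v is not None:
            # Remove the focal observation from its own cluster mean.
            s, n = s - v, n - 1
        means.append(s / n if n > 0 else None)
    return means


# Toy cross-classified data: each student belongs to one school (factor A)
# and one neighborhood (factor B); x is a predictor with one missing value.
x = [1.0, 3.0, None, 4.0, 6.0, 2.0]
school = ["s1", "s1", "s1", "s2", "s2", "s2"]
neigh = ["n1", "n2", "n1", "n2", "n1", "n2"]

# One set of cluster means per crossed factor; both would enter the
# imputation model of another variable as additional predictors.
x_mean_school = adjusted_cluster_means(x, school)
x_mean_neigh = adjusted_cluster_means(x, neigh)
```

Computing one set of means per crossed factor lets a single-level model pick up between-cluster relationships at Levels A and B without estimating random effects for both factors, which is the trade-off described above.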

The present study has several limitations in addition to those listed above. First, there are several variants of CCRMs for nonhierarchical data that should be considered in future research. These include multiple-membership multiple-classification (MMMC) models, which can be used to analyze data in which observations are not only clustered in multiple crossed factors but can also belong to multiple units of the same factor (Browne et al., 2001; Grady & Beretvas, 2010; Park & Beretvas, 2020). Similarly, in longitudinal research, CCRMs can be extended to distinguish between acute and cumulative effects of cluster membership when this membership changes over time (Cafri et al., 2015). Although MMMC and similar models share many of the features of the CCRMs considered in this article, little is known about how multiple-membership structures can and should be accommodated in the treatment of missing data (however, see Yucel et al., 2008). Second, in our simulation study, we focused on partially cross-classified data with structural features that are typical in education research (e.g., Garner & Raudenbush, 1991; Goldstein & Sammons, 1997; Raudenbush, 1993). Future research should also evaluate CC-MI in settings with more challenging features, for example, with a small number of clusters at Levels A and B, or common features from other areas of research, for example, with fully cross-classified data or only a single observation at Level 1 (i.e., without random effects of the “interaction”). Third, we assumed that the identifiers that denote the cluster membership for each unit were fully observed. However, especially in longitudinal data, in which cluster membership can change over time, these identifiers can also be missing, and more research is needed to determine how to handle missing data in these cases (see also Hill & Goldstein, 1998; van Buuren, 2011). Fourth, in our simulation study, we did not consider higher-level variables with missing data (e.g., at Levels A, B, or AB). Future research should therefore evaluate the performance of the different approaches to CC-MI for the treatment of missing data in higher-level variables (see also Enders et al., 2018; Grilli et al., 2022; Grund et al., 2018b). Finally, we evaluated CC-MI only in conditions with MCAR or MAR data. By contrast, when data are missing not at random (MNAR), conducting MI typically requires strong assumptions about the missing data mechanism, which can be evaluated in sensitivity analyses (Carpenter & Kenward, 2013). Future research should therefore evaluate CC-MI in conditions with MNAR data.

To summarize, the present study compared and evaluated multiple approaches to CC-MI, which included both a novel extension of the popular JM and FCS approaches to MI and a number of alternative methods that extend existing methods for single- and two-level MI. We conclude that multiple approaches to CC-MI can provide an effective treatment of incomplete cross-classified data. We hope that the findings presented here motivate further research on statistical methods for handling missing values in cross-classified data and other types of nonhierarchical data.

Supplemental Material

Supplemental Material, sj-pdf-1-jeb-10.3102_10769986231151224 for Handling Missing Data in Cross-Classified Multilevel Analyses: An Evaluation of Different Multiple Imputation Approaches by Simon Grund, Oliver Lüdtke and Alexander Robitzsch in Journal of Educational and Behavioral Statistics

Footnotes

Appendix

Authors’ Note

All additional files concerning this article, including scripts, data, and the supplemental materials are available at .

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Simon Grund

Notes

References

Andridge, R. R. (2011). Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials. Biometrical Journal, 53, 57–74. https://doi.org/10.1002/bimj.201000140

Asparouhov, & Muthén, B. O. (2006). Constructing covariates in multilevel regression (Mplus Web Notes No. 11). http://www.statmodel.com/

Baayen, Davidson, & Bates (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412. https://doi.org/10.1016/j.jml.2007.12.005

Bailey, Emad, Huo, Lee, Liao, Lishinski, Nguyen, Xie, Zhang, Buehler, Lee, S.-J., Sikali, Bundsgaard, C’deBaca, & Christensen, A. A. (2021). EdSurvey: Analysis of NCES education survey and assessment data (Version 2.7.1). https://CRAN.R-project.org/package=EdSurvey

Barker, K. M., Dunn, E. C., Richmond, T. K., Ahmed, Hawrilenko, & Evans, C. R. (2020). Cross-classified multilevel models (CCMM) in health research: A systematic review of published empirical studies and recommendations for best practices. SSM—Population Health, 12(100661), 1–19. https://doi.org/10.1016/j.ssmph.2020.100661

Bartlett, J. W., Seaman, S. R., White, I. R., & Carpenter, J. R. (2015). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research, 24, 462–487. https://doi.org/10.1177/0962280214521348

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. Springer.

Beretvas, S. N. (2011). Cross-classified and multiple-membership models. In J. J. Hox & J. K. Roberts (Eds.), Handbook of advanced multilevel analysis (pp. 313–334). Routledge.

Browne, W. J. (2009). MCMC estimation in MLwiN. Technical Manual. Centre for Multilevel Modelling, University of Bristol. http://www.bristol.ac.uk/cmm/software/mlwin/

Browne, W. J., Goldstein, & Rasbash (2001). Multiple Membership Multiple Classification (MMMC) models. Statistical Modelling, 1, 103–124. https://doi.org/10.1177/1471082X0100100202

Cafri, Hedeker, & Aarons, G. A. (2015). An introduction and integration of cross-classified, multiple membership, and dynamic group random-effects models. Psychological Methods, 20, 407–421. https://doi.org/10.1037/met0000043

Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Wiley.

Claus, A. M., Arend, M. G., Burk, C. L., Kiefer, & Wiese, B. S. (2020). Cross-classified models in I/O psychology. Journal of Vocational Behavior, 120(103447), 1–16. https://doi.org/10.1016/j.jvb.2020.103447

Cronbach, L. J., Gleser, G. C., Nanda, & Rajaratnam (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. Wiley.

Drechsler (2015). Multiple imputation of multilevel missing data—Rigor versus simplicity. Journal of Educational and Behavioral Statistics, 40, 69–95. https://doi.org/10.3102/1076998614563393

Du, Alacam, Mena, & Keller, B. T. (2022). Compatibility in imputation specification. Behavior Research Methods, 54, 2962–2980. https://doi.org/10.3758/s13428-021-01749-5

Dundas, Leyland, A. H., & Macintyre (2014). Early-life school, neighborhood, and family influences on adult health: A multilevel cross-classified analysis of the Aberdeen children of the 1950s study. American Journal of Epidemiology, 180, 197–207. https://doi.org/10.1093/aje/kwu110

Dunn, E. C., Milliren, C. E., Evans, C. R., Subramanian, S. V., & Richmond, T. K. (2015). Disentangling the relative influence of schools and neighborhoods on adolescents’ risk for depressive symptoms. American Journal of Public Health, 105, 732–740. https://doi.org/10.2105/ajph.2014.302374

Enders, C. K. (2010). Applied missing data analysis. Guilford Press.

Enders, C. K., Keller, B. T., & Levy (2018). A fully conditional specification approach to multilevel imputation of categorical and continuous variables. Psychological Methods, 23(2), 298–317. https://doi.org/10.1037/met0000148

Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. Psychological Methods, 21, 222–240. https://doi.org/10.1037/met0000063

Erler, N. S., Rizopoulos, Jaddoe, V. W., Franco, O. H., & Lesaffre, E. M. (2017). Bayesian imputation of time-varying covariates in linear mixed models. Statistical Methods in Medical Research, 28, 555–568. https://doi.org/10.1177/0962280217730851

Fielding, & Goldstein (2006). Cross-classified and multiple membership structures in multilevel models: An introduction and review. http://dera.ioe.ac.uk/6469/1/RR791.pdf

Garner, C. L., & Raudenbush, S. W. (1991). Neighborhood effects on educational attainment: A multilevel analysis. Sociology of Education, 64, 251–262. https://doi.org/10.2307/2112706

Goldstein (2011). Multilevel statistical models (4th ed.). Wiley.

Goldstein, Carpenter, J. R., & Browne, W. J. (2014). Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms. Journal of the Royal Statistical Society: Series A (Statistics in Society), 177, 553–564. https://doi.org/10.1111/rssa.12022

Goldstein, Carpenter, J. R., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173–197. https://doi.org/10.1177/1471082X0800900301

Goldstein, & Sammons (1997). The influence of secondary and junior schools on sixteen year examination performance: A cross-classified multilevel analysis. School Effectiveness and School Improvement, 8, 219–230. https://doi.org/10.1080/0924345970080203

Grady, M. W., & Beretvas, S. N. (2010). Incorporating student mobility in achievement growth modeling: A cross-classified multiple membership growth curve model. Multivariate Behavioral Research, 45, 393–419. https://doi.org/10.1080/00273171.2010.483390

Gregory, & Huang (2013). It takes a village: The effects of 10th grade college-going expectations of students, parents, and teachers four years later. American Journal of Community Psychology, 52, 41–55. https://doi.org/10.1007/s10464-013-9575-5

Grilli, Francesca Marino, Paccagnella, & Rampichini (2022). Multiple imputation and selection of ordinal level 2 predictors in multilevel models: An analysis of the relationship between student ratings and teacher practices and attitudes. Statistical Modelling, 22(3), 221–238. https://doi.org/10.1177/1471082X20949710

Grund, Lüdtke, & Robitzsch (2018a). Multiple imputation of missing data at level 2: A comparison of fully conditional and joint modeling in multilevel designs. Journal of Educational and Behavioral Statistics, 43, 316–353. https://doi.org/10.3102/1076998617738087

Grund, Lüdtke, & Robitzsch (2018b). Multiple imputation of missing data for multilevel models: Simulations and recommendations. Organizational Research Methods, 21, 111–149. https://doi.org/10.1177/1094428117703686

Grund, Lüdtke, & Robitzsch (2021). Multiple imputation of missing data in multilevel models with the R package mdmb: A flexible sequential modeling approach. Behavior Research Methods, 53(6), 2631–2649. https://doi.org/10.3758/s13428-020-01530-0

Grund, Robitzsch, & Luedtke (2021). mitml: Tools for multiple imputation in multilevel modeling (Version 0.4-3). https://CRAN.R-project.org/package=mitml

Heck, R. H. (2009). Teacher effectiveness and student achievement: Investigating a multilevel cross-classified model. Journal of Educational Administration, 47, 227–249. https://doi.org/10.1108/09578230910941066

Hill, P. W., & Goldstein (1998). Multilevel modeling of educational data with cross-classification and missing identification for units. Journal of Educational and Behavioral Statistics, 23, 117–128. https://doi.org/10.3102/10769986023002117

Ibrahim, J. G., Chen, M.-H., & Lipsitz, S. R. (2002). Bayesian methods for generalized linear models with covariates missing at random. Canadian Journal of Statistics, 30, 55–78. https://doi.org/10.2307/3315865

Jorgensen, T. D., Forney, K. J., Hall, J. A., & Giles, S. M. (2018). Using modern methods for missing data analysis with the social relations model: A bridge to social network analysis. Social Networks, 54, 26–40. https://doi.org/10.1016/j.socnet.2017.11.002

Kadengye, D. T., Ceulemans, & Van den Noortgate (2014). Direct likelihood analysis and multiple imputation for missing item scores in multilevel cross-classification educational data. Applied Psychological Measurement, 38(1), 61–80. https://doi.org/10.1177/0146621613491138

Kreft, I. G. G., de Leeuw, & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1–21. https://doi.org/10.1207/s15327906mbr3001_1

Kyriakides, & Creemers, B. P. M. (2008). A longitudinal study on the stability over time of school and teacher effects on student outcomes. Oxford Review of Education, 34, 521–545. https://doi.org/10.1080/03054980701782064

Lai, M. H. C. (2019). Correcting fixed effect standard errors when a crossed random effect was ignored for balanced and unbalanced designs. Journal of Educational and Behavioral Statistics, 44, 448–472. https://doi.org/10.3102/1076998619843168

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.

Liu, Gelman, Hill, Su, Y.-S., & Kropko (2014). On the stationary distribution of iterative imputations. Biometrika, 101(1), 155–173. https://doi.org/10.1093/biomet/ast044

Lüdtke, Marsh, H. W., Robitzsch, Trautwein, Asparouhov, & Muthén, B. O. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203–229. https://doi.org/10.1037/a0012869

Lüdtke, Robitzsch, & Grund (2017). Multiple imputation of missing data in multilevel designs: A comparison of different strategies. Psychological Methods, 22, 141–165. https://doi.org/10.1037/met0000096

Lüdtke, Robitzsch, & West, S. G. (2020). Regression models involving nonlinear effects with missing data: A sequential modeling approach using Bayesian estimation. Psychological Methods, 25, 157–181. https://doi.org/10.1037/met0000233

Lunn, D. J., Thomas, Best, & Spiegelhalter (2000). WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337. https://doi.org/10.1023/A:1008929526011

Luo, & Kwok, O.-M. (2009). The impacts of ignoring a crossed factor in analyzing cross-classified data. Multivariate Behavioral Research, 44(2), 182–212. https://doi.org/10.1080/00273170902794214

Maxwell, S. E., Delaney, H. D., & Kelley (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed.). Erlbaum.

Meyers, J. L., & Beretvas, S. N. (2006). The impact of inappropriate modeling of cross-classified data structures. Multivariate Behavioral Research, 41, 473–497. https://doi.org/10.1207/s15327906mbr4104_3

Mistler, S. A., & Enders, C. K. (2017). A comparison of joint model and fully conditional specification imputation for multilevel missing data. Journal of Educational and Behavioral Statistics, 42, 432–466. https://doi.org/10.3102/1076998617690869

Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén. https://www.statmodel.com/language.html

Park, & Beretvas, S. N. (2020). The multivariate multiple-membership random-effect model: An introduction and evaluation. Behavior Research Methods, 52, 1254–1270. https://doi.org/10.3758/s13428-019-01315-0

Paterson (1991). Socio-economic status and educational attainment: A multi-dimensional and multi-level study. Evaluation & Research in Education, 5, 97–121. https://doi.org/10.1080/09500799109533303

Plummer (2017). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling (Version 4.3.0). http://sourceforge.net/projects/mcmc-jags/

R Core Team. (2021). R: A language and environment for statistical computing (Version 4.1). https://www.R-project.org/

Raghunathan, T. E., Berglund, Solenberger, P. W., & van Hoewyk (2018). IVEware: Imputation and variance estimation software (Version 0.3). https://www.src.isr.umich.edu/software/

Raghunathan, T. E., Lepkowski, J. M., van Hoewyk, & Solenberger (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27(1), 85–96. http://www.statcan.gc.ca/

Rasbash, & Browne, W. J. (2008). Non-hierarchical multilevel models. In de Leeuw & Meijer (Eds.), Handbook of multilevel analysis (pp. 301–334). Springer.

Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational and Behavioral Statistics, 18, 321–349. https://doi.org/10.3102/10769986018004321

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage.

Resche-Rigon, & White, I. R. (2018). Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Statistical Methods in Medical Research, 27, 1634–1649. https://doi.org/10.1177/0962280216666564

Robitzsch, Grund, & Henke (2021). miceadds: Some additional multiple imputation functions, especially for mice (Version 3.12-16). https://CRAN.R-project.org/package=miceadds

Royston (2005). Multiple imputation of missing values: Update. The Stata Journal, 5, 188–201. http://www.stata-journal.com

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177. https://doi.org/10.1037//1082-989X.7.2.147

Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545–571. https://doi.org/10.1207/s15327906mbr3304_5

Schafer, J. L., & Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437–457. https://doi.org/10.1198/106186002760180608

Seaman, S. R., Bartlett, J. W., & White, I. R. (2012). Multiple imputation of missing covariates with non-linear effects and interactions: An evaluation of statistical methods. BMC Medical Research Methodology, 12(46), 1–13. https://doi.org/10.1186/1471-2288-12-46

Shavelson, R. J., & Webb, N. M. (2000). Generalizability theory: A primer. SAGE.

Shi, Leite, & Algina (2010). The impact of omitting the interaction between crossed factors in cross-classified random effects modelling. British Journal of Mathematical and Statistical Psychology, 63, 1–15. https://doi.org/10.1348/000711008X398968

Stan Development Team. (2021). Stan modeling language user’s guide and reference manual. http://mc-stan.org.

van Buuren (2011). Multiple imputation of multilevel data. In J. J. Hox (Ed.), Handbook of advanced multilevel analysis (pp. 173–196). Routledge.

van Buuren (2018). Flexible imputation of missing data (2nd ed.). CRC Press. https://stefvanbuuren.name/fimd/

van Buuren, Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76, 1049–1064. https://doi.org/10.1080/10629360600810434

van Buuren, & Groothuis-Oudshoorn (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03

Wijesuriya (2021). Evaluation of multiple imputation approaches for handling incomplete three-level data. University of Melbourne. http://minerva-access.unimelb.edu.au/handle/11343/293059

Wijesuriya, Moreno-Betancur, Carlin, J. B., & Lee, K. J. (2020). Evaluation of approaches for multiple imputation of three-level data. BMC Medical Research Methodology, 20(207), 1–15. https://doi.org/10.1186/s12874-020-01079-8

Yucel, R. M., Ding, Uludag, A. K., & Tomaskovic-Devey (2008). Multiple imputation in multiple classification and multiple-membership structures. Proceedings of the Section on Bayesian Statistical Science of the American Statistical Association. http://www.asasrms.org/Proceedings/y2008f.html

Zhu, & Raghunathan, T. E. (2015). Convergence properties of a sequential regression multiple imputation algorithm. Journal of the American Statistical Association, 110, 1112–1124. https://doi.org/10.1080/01621459.2014.948117
