Sage Journals: Discover world-class research

Abstract

In this article, we extend Liao’s test for across-group comparisons of the fixed effects from the generalized linear model to the fixed and random effects of the generalized linear mixed model (GLMM). Using the Wald statistic as our basis, we developed an asymptotic test statistic for across-group comparisons of these effects. The test can be applied when the fixed and random effects are multivariate normally distributed, and it works well for any link function and conditional distribution of the dependent variable of the GLMM. We also derived the asymptotic properties of this test, and because no power information exists for either our new test statistic or Liao’s test, we conducted a power study to demonstrate the superiority of these tests over the F test proposed as an alternative. Using an example, we show the application of the test and then discuss its possible restrictions with respect to the distribution of the random effects.

Keywords

generalized linear mixed model, group comparisons, Wald statistic, general linear hypothesis

In 2011, the International Association for the Evaluation of Educational Achievement (IEA) conducted the Progress in International Reading Literacy Study (PIRLS) and the Trends in International Mathematics and Science Study (TIMSS) jointly for the first time. PIRLS has assessed the reading comprehension achievement of Grade 4 students every five years since 2001 (Mullis et al. 2012), while TIMSS has assessed the mathematics and science achievement of Grades 4 and 8 students every four years since 1995 (Martin et al. 2012). In 2011, 34 countries and three benchmark participants collected data on Grade 4 students’ educational achievement in three competence domains: reading comprehension, mathematics, and science (Martin and Mullis 2013).

Martin et al. (2013) performed a “school-effectiveness” analysis using the combined TIMSS/PIRLS 2011 data set. For their analysis, Martin et al. used five school-effectiveness variables and two student home-background variables as predictors in country-specific hierarchical linear models (HLMs), with students’ achievement scores (reading comprehension, mathematics achievement, and science achievement) as dependent variables. Because the goal of the study was to “present an analytic framework that could provide an overview of how these relationships vary across countries” (Martin et al. 2013:110), it seems reasonable to assume that the results from the hierarchical linear modeling were comparable across the participating countries. However, the results that Martin et al. (2013) presented do not make clear which procedure the authors applied in their cross-national comparisons of the fixed effects from the HLMs.

HLMs are a special case of the generalized linear mixed model (GLMM). The GLMM allows for nonnormally distributed response variables, and its linear predictor can contain random effects. The model can therefore be used to perform, for example, linear regression analyses with random effects, logistic regression analyses with random effects, or Poisson mixed analyses for overdispersed count data, with inference about the fixed and random effects of the GLMM usually based on estimable functions or predictable functions (Searle 1971). To date, researchers have applied these tests only when fitting the GLMM to a single sample from a single population (McCulloch, Searle, and Neuhaus 2008). Also, although researchers wanting to conduct across-group comparisons of fixed effects have recourse to many tests for this purpose, these tests refer to the fixed effects of the linear model (Brame et al. 1998; Cai and Xia 2014; DeShon and Alexander 1994, 1996; Gujarati 1970a, 1970b; Moreno, Torres, and Casella 2005; Radhakrishnan and Robinson 1996; Skvarcius and Cromer 1971; Stroud 1974; Weerahandi 1987; Werts et al. 1976), the general linear model (Lui, Cumberland, and Chang 2014), and nonparametric regression models (Maity 2012; Neumeyer and Sperlich 2006; Park, Hannig, and Kang 2014; Tonggumnead et al. 2010). The question we address in this article is whether these tests can be used to test for the equality of the effects from the GLMM.

We found, as we document in this article, that a direct application of the above tests to the more general case of the GLMM is not always possible. However, we conjectured that modifying a test proposed by Liao (2004) for across-group comparisons of the fixed effects from the generalized linear model (GLM) might address this concern. Unfortunately, no power information existed for this test until now, and the test had been applied only to the fixed effects of the GLM. We decided to extend this test to the more general case of the GLMM because it seemed to us that the extended test could be used for across-group comparisons of both fixed and random effects. As we also show in this article, the resulting test statistic is asymptotically $χ^{2}$-distributed and superior in power to the traditional testing procedures with which we compared it. To demonstrate the applicability of the test statistic within the context of IEA studies, we reanalyzed part of the combined TIMSS/PIRLS 2011 data set that Martin et al. (2013) used.

The GLMM

In the following, assume that we have $i = 1, \dots, n$ observations on the response variable $y_i$, with $y = (y_1, \dots, y_n)'$. The typical assumptions about $y$ associated with the GLMM are as follows:

$$y_i \mid θ \overset{\text{indep.}}{\sim} f(y_i \mid θ),$$

$$f(y_i \mid θ) = h(y_i) \exp\{[η_i T(y_i) - A(η_i)]/ϕ\},$$

$$E[y \mid θ] = g^{-1}(Xβ + Zθ), \text{ and}$$

$$θ \sim w(θ),$$

with $\partial A(η_i)/\partial η_i = μ_i = E[y_i \mid θ]$ and $[\partial^2 A(η_i)/\partial η_i^2]\,ϕ = σ_{y_i \mid θ}^2$ (Breslow and Clayton 1993; Karim and Zeger 1992; McCullagh and Nelder 1989; McCulloch et al. 2008; Pfeffermann et al. 1998; Pinheiro and Bates 1995; Rabe-Hesketh and Skrondal 2006; Tuerlinckx et al. 2006; Wolfinger and O’Connell 1993).¹ Thus, a typical assumption is that, conditional on the random vector $θ$, the elements $y_i$ are independent and each has a distribution $f(y_i \mid θ)$.² Another assumption is that there exists a differentiable monotonic link function $g(\cdot)$ (with inverse $g^{-1}(\cdot)$) such that $g(E[y \mid θ])$ is a linear function of the $n \times (p + 1)$ predictor matrix $X$, with corresponding $(p + 1) \times 1$ fixed-effect vector $β$, and of the $n \times tM$ block-predictor matrix $Z = (Z_1, \dots, Z_M)$, with corresponding $tM \times 1$ random-effect vector $θ$ (where $t$ is the number of assumed random-effect predictors and $M$ is the number of units across which the random effects vary; see below). A further assumption is that the random effects follow some distribution $w(θ)$, but not necessarily a normal one.
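To illustrate the roles of $A(η_i)$ and $ϕ$, the following minimal Python sketch checks numerically that, for a Bernoulli response in canonical form ($η_i$ the logit, $A(η_i) = \log(1 + e^{η_i})$, $ϕ = 1$), the first and second derivatives of $A$ reproduce the conditional mean $μ_i$ and the conditional variance $μ_i(1 - μ_i)$. The Bernoulli example, the finite-difference step sizes, and all function names are illustrative assumptions, not part of the original derivation.

```python
import math

def bernoulli_cumulant(eta: float) -> float:
    """Cumulant function A(eta) = log(1 + exp(eta)) of the Bernoulli family
    in canonical (logit) form; dispersion phi = 1."""
    return math.log1p(math.exp(eta))

def first_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def check_moments(eta: float, phi: float = 1.0):
    """Return (A'(eta), A''(eta) * phi, analytic mean, analytic variance)."""
    mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit link g^{-1}(eta)
    mean_from_a = first_derivative(bernoulli_cumulant, eta)
    var_from_a = second_derivative(bernoulli_cumulant, eta) * phi
    return mean_from_a, var_from_a, mu, mu * (1.0 - mu)

if __name__ == "__main__":
    for eta in (-2.0, 0.0, 1.5):
        m_a, v_a, m, v = check_moments(eta)
        print(f"eta={eta:+.1f}: A'={m_a:.6f} mu={m:.6f} A''={v_a:.6f} var={v:.6f}")
```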

The GLMM is therefore a very general statistical analysis model, and by properly specifying the conditional density $f(y_i \mid θ)$ and the link function, we can use it to accommodate different statistical analyses. For example, if we assume a normal distribution for the conditional density and use the identity link, the result is a linear mixed model (LMM)

$$E[y \mid θ] = Xβ + Zθ,$$

which can be used to perform not only linear regression analyses, analyses of variance, and analyses of covariance with or without (if $θ = 0$) random effects but also, more generally, hierarchical linear analyses (Hofmann 1997; McCulloch et al. 2008; Raudenbush and Bryk 2002; Woltman et al. 2012). As another example, if we assume a multinomial distribution for the conditional density, use the cumulative logit link function (which is useful for an ordinal dependent variable), and assume that $y_i$ can fall into one of $c = 1, \dots, C$ categories, then

$$\Pr(y_i \leq j \mid θ) = ν_{ij} = \frac{e^{δ_j - λ_i}}{1 + e^{δ_j - λ_i}}, \quad j = 1, \dots, C - 1.$$

Here, $ν_{ij}$ are the cumulative probabilities $ν_{ij} = p_{i1} + \dots + p_{ij}$ (with $p_{ij}$ the probability that observation $i$ falls into category $j$), $δ_j$ is the intercept (threshold) for category $j$, and $λ_i = x_i'β + z_i'θ$. We can also reduce the ordinal model to one with a dependent variable that has only two categories by, for example, assuming a Bernoulli distribution and using the logistic link function. These models can be used for, among other analyses, logistic or ordinal regression analyses with or without random effects (Ene et al. 2015). As Boeck et al. (2011), Powers and Xie (2008), and Skrondal and Rabe-Hesketh (2004) have shown, these models can also be used to estimate a variety of item response theory models within the GLMM framework.
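The cumulative logit model can be sketched in a few lines of Python. Assuming strictly increasing thresholds $δ_1 < \dots < δ_{C-1}$ and a given linear predictor $λ_i$, the sketch computes the cumulative probabilities $ν_{ij}$ and recovers the category probabilities as $p_{ij} = ν_{ij} - ν_{i,j-1}$, with $ν_{i0} = 0$ and $ν_{iC} = 1$. The function names and example values are hypothetical.

```python
import math
from typing import List, Sequence

def inv_logit(x: float) -> float:
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(deltas: Sequence[float], lam: float) -> List[float]:
    """Category probabilities p_i1, ..., p_iC under the cumulative logit model.

    deltas: thresholds delta_1 < ... < delta_{C-1}
    lam:    linear predictor lambda_i = x_i' beta + z_i' theta
    """
    if any(b <= a for a, b in zip(deltas, deltas[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # cumulative probabilities nu_ij = Pr(y_i <= j | theta), j = 1, ..., C-1
    nu = [inv_logit(d - lam) for d in deltas]
    # pad with nu_i0 = 0 and nu_iC = 1, then take successive differences
    cum = [0.0] + nu + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

if __name__ == "__main__":
    probs = category_probabilities([-1.0, 0.5, 2.0], lam=0.3)
    print([round(p, 4) for p in probs], sum(probs))
```

With $C = 2$ (a single threshold $δ_1$), the probability of the upper category is $e^{λ_i - δ_1}/(1 + e^{λ_i - δ_1})$, that is, a logistic regression with intercept $-δ_1$, which is the binary reduction described above.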

In general, the GLMM can accommodate multilevel-structured data sets through proper specification of $Z$ and $θ$. We demonstrate this with an example of a two-stage sampling design and a continuous, normally distributed response variable; extending the example to models with more than two stages or with response variables that are not normally distributed is straightforward. Assume a two-stage cluster sampling design in which schools were randomly sampled at Stage 1 and students were randomly sampled within schools at Stage 2. Let $j = 1, \dots, M$ and $i = 1, \dots, n_j$ index the units at Level 2 (Stage 1) and Level 1 (Stage 2), respectively. Assume further that mathematics achievement is the response variable $y_{ij}$ and that it is explained by the student’s socioeconomic status $x_{1ij}$ and the school region (urban or rural) $x_{2ij}$. If both the average achievement of students in a school and the relationship between students’ socioeconomic status and mathematics achievement vary randomly across schools, the resulting model is a random intercept, random slope model (Raudenbush and Bryk 2002).

In terms of the GLMM, this model can be expressed as

$$\begin{aligned} E[y \mid θ] &= Xβ + Zθ \\ &= \begin{pmatrix} 1 & x_{111} & x_{211} \\ \vdots & \vdots & \vdots \\ 1 & x_{1n_1 1} & x_{2n_1 1} \\ 1 & x_{112} & x_{212} \\ \vdots & \vdots & \vdots \\ 1 & x_{1n_M M} & x_{2n_M M} \end{pmatrix} \begin{pmatrix} β_0 \\ β_1 \\ β_2 \end{pmatrix} + \begin{pmatrix} 1 & x_{111} & & & \\ \vdots & \vdots & & & \\ 1 & x_{1n_1 1} & & & \\ & & \ddots & & \\ & & & 1 & x_{11M} \\ & & & \vdots & \vdots \\ & & & 1 & x_{1n_M M} \end{pmatrix} \begin{pmatrix} θ_{01} \\ θ_{11} \\ \vdots \\ θ_{0M} \\ θ_{1M} \end{pmatrix}, \end{aligned}$$

where $θ_{01}, \dots, θ_{0M}$ are the random effects due to the intercept and $θ_{11}, \dots, θ_{1M}$ are the random effects due to the slope for socioeconomic status (the empty spaces in $Z$ stand for zeros). Note that researchers are rarely interested in the predicted values of $θ_j = (θ_{0j}, \dots, θ_{tj})'$ themselves; rather, they are interested in the variances and covariances of these effects. Hence, in our example, the goal would be to estimate

$$Σ_{θ_j} = \begin{pmatrix} σ_{θ_0}^2 & σ_{θ_0, θ_1} \\ σ_{θ_1, θ_0} & σ_{θ_1}^2 \end{pmatrix},$$

rather than to predict $θ_j$ directly. Here and hereafter, we assume that $Σ_{θ_j} = Σ_θ$ holds for all $j$ (for a critical reflection on this assumption, see Daniels and Zhao 2003; Heagerty and Zeger 2000).
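For the random intercept, random slope example, the matrices $X$ and $Z$ and the implied marginal covariance can be assembled as in the following NumPy sketch. It assumes hypothetical student-level arrays (socioeconomic status, region, and school labels), a common $Σ_θ$ for all schools as stated above, and, for the LMM, independent residuals with an assumed variance $σ_e^2$, so that $\mathrm{Var}(y) = Z(I_M \otimes Σ_θ)Z' + σ_e^2 I$. Rows need not be sorted by school, although sorting makes $Z$ block diagonal as displayed above. All function names and numbers are illustrative.

```python
import numpy as np

def build_design(ses, region, school):
    """Return X (n x 3) and Z (n x 2M) for the random intercept, random slope model.

    ses, region: length-n arrays holding x_1ij and x_2ij
    school:      length-n integer labels 0, ..., M-1
    """
    ses = np.asarray(ses, dtype=float)
    region = np.asarray(region, dtype=float)
    school = np.asarray(school, dtype=int)
    n = ses.shape[0]
    M = int(school.max()) + 1
    X = np.column_stack([np.ones(n), ses, region])
    Z = np.zeros((n, 2 * M))
    rows = np.arange(n)
    Z[rows, 2 * school] = 1.0       # random intercept column of each row's school
    Z[rows, 2 * school + 1] = ses   # random SES-slope column of each row's school
    return X, Z

def marginal_cov(Z, sigma_theta, sigma2_e):
    """Var(y) = Z (I_M kron Sigma_theta) Z' + sigma_e^2 I for the LMM,
    assuming the same Sigma_theta in every school."""
    M = Z.shape[1] // 2
    G = np.kron(np.eye(M), np.asarray(sigma_theta, dtype=float))
    return Z @ G @ Z.T + sigma2_e * np.eye(Z.shape[0])

if __name__ == "__main__":
    ses = [0.2, -0.5, 1.0, 0.3, -1.2]
    region = [1, 1, 0, 0, 0]   # hypothetical coding: 1 = urban, 0 = rural
    school = [0, 0, 1, 1, 1]
    X, Z = build_design(ses, region, school)
    V = marginal_cov(Z, [[0.5, 0.1], [0.1, 0.2]], sigma2_e=1.0)
    print(X.shape, Z.shape, V.shape)
```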

Although a closed-form solution for estimating $β$ and predicting $θ$ (or estimating $Σ_θ$) exists for some GLMMs (e.g., when the dependent variable is normally distributed and the LMM is therefore used), no analytic solution exists for other models. Researchers have proposed many different methods for estimation and prediction in these cases (see, e.g., Booth and Hobert 1999; Breslow and Clayton 1993; Breslow and Lin 1995; Gamerman 1997; Lin and Breslow 1996; Natarajan and Kass 2000; Pinheiro and Bates 1995; Rabe-Hesketh and Skrondal 2006; Raudenbush, Yang, and Yosef 2000; Shun 1997; Shun and McCullagh 1995; Wolfinger and O’Connell 1993; Zeger and Karim 1991). A review of these methods can be found in Tuerlinckx et al. (2006). Here and in the following sections, however, we assume that a proper estimation method is used, in the sense that the assumption of (multivariate) normally distributed estimates $\hat{β} \sim N(β, Σ_β)$ (with $\lim_{n \to \infty} E(\hat{β}) = β$ and, if it exists, asymptotic covariance matrix $Σ_β$; McCulloch et al. 2008) is reasonable. In general, this assumption holds asymptotically when, for example, maximum likelihood estimation is used. We further assume that the random effects $θ$ are zero-mean normally distributed.

Inference About the Fixed and Random Effects of the GLMM

Because our focus is on group comparisons, we are mainly interested in whether these effects are equal across different populations. As mentioned above, the GLMM is a very general statistical analysis model that incorporates, for example, the linear regression model and the analysis of variance model with random effects. For these specific models, a broad range of hypothesis tests (for both the single population case and the multiple population case) is available (Brame et al. 1998; Cai and Xia 2014; DeShon and Alexander 1994, 1996; Gujarati 1970a, 1970b; Moreno et al. 2005; Radhakrishnan and Robinson 1996; Skvarcius and Cromer 1971; Stroud 1974; Weerahandi 1987; Werts et al. 1976). We recognize, however, that a straightforward application of these specific tests to the estimators of the GLMM is not always possible. For example, as Ai and Norton (2003) have shown, the dummy variable approach, which Gujarati (1970a, 1970b) and Skvarcius and Cromer (1971) introduced within the context of the linear model to test for significantly different slopes across groups, can produce biased inference when a probit or logit link is used. Consequently, instead of examining these specific tests, we discuss inference procedures directly relevant to the effects of the GLMM.

When discussing inference procedures for the fixed and random effects of the GLMM, we need to distinguish two settings. In the first setting, the single population case, the researcher has no interest in group differences (Stroup and Kachman 1994). In the second setting, the multiple populations case, the researcher is interested in group differences. As mentioned above, different inference procedures exist for this setting. One possible way to test for group differences is the dummy variable approach (Gujarati 1970a, 1970b; Skvarcius and Cromer 1971). This approach requires the researcher to combine the multiple data sets, use dummy variables to differentiate between the groups, and fit the GLMM to the combined data set to test the significance of the slope differences (interaction effects) across the groups (Rabe-Hesketh and Skrondal 2012). Another possibility is to fit the same GLMM independently to each group’s data set and then to compare the group-specific estimators (or predictors) across groups (Lazar and Zerbe 2011; Liao 2004). As we noted previously, the first procedure sometimes fails to provide valid results within the context of the GLMM, and for this reason, we do not consider it further. Hence, for group comparisons of the estimators (or predictors) of the GLMM, we consider the second procedure in more detail; that is, here and in the following, we assume, among other premises, that the same GLMM is fitted independently to each of the multiple data sets, as in the second procedure. However, because inference procedures developed for the single population case usually provide a foundation for the more general case of multiple populations, we briefly consider these tests as well.

Single Population

For the single population case, inference about the fixed and random effects of the GLMM is usually based on the likelihood ratio test (LRT; McCulloch et al. 2008), the score test (Commenges and Jacqmin-Gadda 1997; Commenges, Letenneur, et al. 1994; Commenges, Olson, and Wijsman 1994; Lin 1997), or the Wald statistic (Stroup and Kachman 1994). To introduce the LRT, let the vector $ξ = (β', θ')'$ be given, and assume that it can be partitioned (after reordering, if necessary) into two components $ξ^{*} = (ξ_1', ξ_2')'$, where $ξ_1$ contains all the effects of $ξ$ about which inference should be made and $ξ_2$ contains the remaining effects. We can then state the hypothesis as $H_0: ξ_1 = m$, where $m$ is a specified value for $ξ_1$. Let $\hat{ξ}_2$ be the estimate of $ξ_2$ under the restriction $ξ_1 = m$, and let $\hat{ξ}^{*}$ be the unrestricted estimate of $ξ^{*}$. The LRT is then

$$Λ = \frac{L(m, \hat{ξ}_2)}{L(\hat{ξ}^{*})},$$

with the likelihood given by

$$\begin{aligned} L(y, β, α) &= \prod_{i=1}^{n} l_i(y_i, β, α) \\ &= \int \prod_{i=1}^{n} h(y_i) \exp\{[η_i T(y_i) - A(η_i)]/ϕ\}\, w(θ)\, dθ, \end{aligned}$$

and where α contains the parameters of the distribution $w (θ)$ .

Under certain regularity conditions (e.g., the true values of $ξ$ should not lie on the boundary of the parameter space; Self and Liang 1987; Stram and Lee 1994, 1995), the large-sample distribution of $-2 \log Λ$ under the null hypothesis is $χ^{2}$ with degrees of freedom $df = r$, where $r$ is the dimension of $ξ_1$ (Wilks 1938). Thus, we can use this statistic to test, for example, the hypothesis $H_0: β = 0$ by setting $ξ^{*} = ξ$ (so that $ξ_1 = β$), $m = 0$, and $df = p + 1$. Consider a sequence of local alternatives $\{ξ_n^{*}\}$, with $ξ_n^{*} = (ξ_{1n}', ξ_2^{*\prime})'$, where the $i$th element of $ξ_{1n}$ is $m_i + δ_{in}/\sqrt{n}$ for $i = 1, \dots, r$ and $ξ_2^{*}$ is the vector of true values of $ξ_2$, so that the sequence converges to a point of the null hypothesis as $n \to \infty$. Under such local alternatives, the limiting distribution of $-2 \log Λ$ is noncentral $χ^{2}$ with $df = r$ and noncentrality parameter $λ^{2}$ equal to the limit of $n(ξ_{1n} - m)'Σ_{ξ_{1n}}^{-1}(ξ_{1n} - m)$, where $Σ_{ξ_{1n}}$ is the asymptotic covariance matrix of $n^{1/2}(\hat{ξ}_{1n} - ξ_{1n})$ (Davidson and Lever 1970; Feder 1968; Satorra and Saris 1985; Stroud 1972; Wald 1943). This limiting distribution can be used, for example, to derive the power of the test based on $Λ$.
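Once $λ^{2}$ is specified, the noncentral limit gives the asymptotic power directly. The following Python sketch (an illustration under stated assumptions, not the procedure of our power study) computes it with the standard library only: the central $χ^{2}$ upper tail uses the closed form of the regularized incomplete gamma function for integer $df$, and the noncentral tail is a Poisson($λ^{2}/2$) mixture of central tails. All function names are hypothetical; where SciPy is available, `scipy.stats.chi2.ppf` and `scipy.stats.ncx2.sf` give the same quantities.

```python
import math

def chi2_sf(x: float, k: int) -> float:
    """Pr(X > x) for a central chi-square with integer df k >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        # Q(k/2, y) = exp(-y) * sum_{i=0}^{k/2-1} y^i / i!
        return sum(math.exp(i * math.log(y) - y - math.lgamma(i + 1))
                   for i in range(k // 2))
    # Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{n} y^(i-1/2) / Gamma(i+1/2)
    tail = sum(math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
               for i in range(1, (k - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + tail

def chi2_critical(k: int, alpha: float = 0.05) -> float:
    """Critical value c with Pr(chi2_k > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, k) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, k) > alpha:
            lo = mid  # tail still too heavy: critical value lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ncx2_sf(x: float, k: int, lam: float, max_terms: int = 1000) -> float:
    """Upper tail of a noncentral chi-square (df k, noncentrality lam >= 0)
    as a Poisson(lam/2) mixture of central chi-squares with df k + 2j."""
    if lam < 0.0:
        raise ValueError("noncentrality must be nonnegative")
    if lam == 0.0:
        return chi2_sf(x, k)
    half = lam / 2.0
    total = 0.0
    for j in range(max_terms):
        weight = math.exp(j * math.log(half) - half - math.lgamma(j + 1))
        total += weight * chi2_sf(x, k + 2 * j)
        if j > half and weight < 1e-16:
            break
    return total

def asymptotic_power(lam: float, k: int, alpha: float = 0.05) -> float:
    """Asymptotic power of a chi-square test with df k at noncentrality lam."""
    return ncx2_sf(chi2_critical(k, alpha), k, lam)

if __name__ == "__main__":
    for lam in (0.0, 2.0, 5.0, 10.0):
        print(lam, round(asymptotic_power(lam, k=3), 4))
```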

The Wald statistic for inference about the fixed and random effects of the GLMM is based on the general linear hypothesis (Hsu 1991; Milliken and Graybill 1970; Olson 1975). As Stroup and Kachman (1994) and McCulloch et al. (2008) have pointed out, anyone using this test must assume that the estimable functions $L_β β$ or predictable functions $L_β β + L_θ θ$ exist, where $L_β$ and $L_θ$ contain the coefficients of these functions. If we let $L = (L_β, L_θ)$, the Wald statistic for the null hypothesis $H_0: Lξ = m$ is

$$W = (L\hat{ξ} - m)'(LΣ_{*}L')^{-1}(L\hat{ξ} - m),$$

where $Σ_{*}$ is the covariance matrix of $\hat{ξ}$, so that $(L\hat{ξ} - m) \sim AN(0, LΣ_{*}L')$ when $H_0$ is true. Like the LRT, this statistic is, under certain regularity conditions, approximately $χ^{2}$-distributed with $df = r(L)$, the rank of $L$, when $H_0: Lξ = m$ is true. Consequently, when, for example, the null hypothesis states that all fixed effects are zero (with $L_β = I$ and $L_θ = 0$), then $m = 0$ and $df = r(L) = p + 1$. The limiting distribution of the Wald statistic under local alternatives is noncentral $χ^{2}$ with the same noncentrality parameter $λ^{2}$ as the LRT (Shieh 2005).
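A minimal NumPy sketch of $W$ follows. It assumes that an estimate $\hat{ξ}$ and its estimated covariance matrix (playing the role of $Σ_{*}$) are already available from some GLMM fitting routine, and that $L$ has full row rank so that $LΣ_{*}L'$ is invertible; the function name is hypothetical. The $p$ value is the upper tail of $χ^{2}$ with $df = r(L)$, for example `scipy.stats.chi2.sf(W, df)` if SciPy is available.

```python
import numpy as np

def wald_statistic(L, xi_hat, cov_xi, m=None):
    """W = (L xi_hat - m)' (L Sigma L')^{-1} (L xi_hat - m) and df = rank(L).

    L:      r x q coefficient matrix (L_beta, L_theta), full row rank
    xi_hat: length-q estimate of xi = (beta', theta')'
    cov_xi: q x q (asymptotic) covariance matrix of xi_hat
    m:      length-r hypothesized value (defaults to zero)
    """
    L = np.atleast_2d(np.asarray(L, dtype=float))
    xi_hat = np.asarray(xi_hat, dtype=float)
    cov_xi = np.asarray(cov_xi, dtype=float)
    m = np.zeros(L.shape[0]) if m is None else np.asarray(m, dtype=float)
    diff = L @ xi_hat - m
    middle = L @ cov_xi @ L.T
    # solve the linear system instead of forming the inverse explicitly
    W = float(diff @ np.linalg.solve(middle, diff))
    return W, int(np.linalg.matrix_rank(L))

if __name__ == "__main__":
    # hypothetical H0: xi_1 - xi_2 = 0 with correlated estimates
    W, df = wald_statistic([[1.0, -1.0]], [3.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
    print(W, df)
```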

Several researchers have proposed the score test within the context of the GLMM to test the variance components given in $Σ_{θ}$ (Commenges and Jacqmin-Gadda 1997; Commenges, Letenneur, et al. 1994; Commenges, Olson, and Wijsman 1994; Lin 1997). However, as Verbeke and Molenberghs (2000, 2003) have shown, these tests may not be the right ones to use when the true parameter values of the variance components are on the boundary of the parameter space, that is, when the variance components (e.g., the diagonal elements of $Σ_{θ}$ ) are restricted to nonnegative values. Consequently, researchers typically need one-sided tests of the null hypothesis $H_{0} : σ_{θ}^{2} = 0$ versus $H_{1} : σ_{θ}^{2} > 0$ . Verbeke and Molenberghs (2003) offer researchers an alternative formula for the score test and a procedure for testing variance components, but because we do not focus in this article on inferences about the random components, we do not consider this alternative test here.

A comparison of the LRT and the Wald test makes clear that both tests are asymptotically equivalent under the null hypothesis $H_0: ξ = 0$ (Engle 1984). Surprisingly, although these tests have the same limiting distribution under local alternatives, which suggests that their asymptotic power might be the same, Peers (1971) showed, for a class of composite alternative hypotheses, that the power function of the likelihood ratio criterion could differ depending on the specific alternative hypothesis. As such, none of these tests seems to be uniformly superior to the others. In addition, the power of the tests may differ, especially in small samples. Sutradhar and Bartlett (1993) showed in a simulation study, which included one dependent variable, three predictor variables, and the linear regression model, that in samples with $n \leq 75$, the score test was uniformly superior to the other two tests for testing multidimensional composite hypotheses. When the authors tested simple as well as one-dimensional composite hypotheses (assuming one dependent variable, four probability distributions for this dependent variable with up to two parameters, and sample sizes of $n \leq 30$), they found that the relative power of the tests followed a systematic pattern: If the score test was more (size-adjusted) powerful in the lower side of the alternative space, then the Wald test was more powerful in the upper side, and the LRT always occupied the second position. However, the authors also found that the power differences between the tests seemed to dissipate in large samples. Bush (2015) compared all three tests in a simulation study that assumed a logistic regression with random effects. He found no power differences between the three tests for composite hypotheses concerning the fixed effects with a sample size of at least $n = 127$. Unfortunately, Bush (2015) did not consider sample sizes smaller than $n = 127$.
Further research comparing the power of these tests seems necessary, especially for small samples and for applications to the effects of the GLMM.

Irrespective of the power of the tests, we could argue that the LRT and the score test are generally superior to the Wald test because the confidence regions based on these tests are exactly invariant under reparameterization (Cox 1988). That said, the Wald statistic has practical utility because, unlike the LRT, it requires the likelihood to be calculated only once (for the unrestricted model). The Wald test is therefore useful when a study requires the researcher to fit many models and to test a null hypothesis for every model. In general, then, considering both tests seems useful for drawing inferences about the parameters of the GLMM.

Multiple Populations

Here and in the following, assume that we have independently sampled $i = 1, \dots, n_g$ observations from each of $g = 1, \dots, G$ mutually exclusive, independent populations. Assume also that we have observed comparable measurements on a dependent variable $y_g$, fixed predictor variables $X_g$, and random predictor variables $Z_g$. Now assume that we have fitted the GLMM independently to each data set $D_g = \{y_g, X_g, Z_g\}$, using the same distributional family $f(y_{ig} \mid θ_g)$, link function $g_g(\cdot)$, and random-effects distribution $w_g(θ_g)$ in each group $g$. Given the vectors $ξ_g = (β_g', θ_g')'$ of dimension $u \times 1$ ($u = p + 1 + tM$), our next step is to test hypotheses of the form $H_0: L_{⋄}(ξ_{+} - ξ_{*}) = m$ with the $(G - 1)u \times 1$ vectors $ξ_{+} = (ξ_1', \dots, ξ_1')'$ and $ξ_{*} = (ξ_2', \dots, ξ_G')'$.³ By properly specifying $L_{⋄}$ and $m$, we can formulate different hypotheses. For example, $L_{⋄} = (I \;\; 0)$, with $I$ of order $u \times u$ and $0$ of conformable order, yields the hypothesis $H_0: ξ_1 - ξ_2 = m$, and $L_{⋄} = I$ ($I$ of order $(G - 1)u \times (G - 1)u$) with $m = 0$ yields the hypothesis $H_0: ξ_1 = ξ_2 = \dots = ξ_G$.
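The stacked vectors $ξ_{+}$, $ξ_{*}$, and $ξ_d = ξ_{+} - ξ_{*}$, together with the selector $L_{⋄} = (I \;\; 0)$, can be built as in the following NumPy sketch. It assumes that all groups supply vectors of the same length $u$ (which, for the random effects, requires the same $t$ and $M$ in every group); the function names are hypothetical.

```python
import numpy as np

def stacked_differences(xis):
    """Given xi_1, ..., xi_G (each of length u), return xi_plus = (xi_1', ..., xi_1')',
    xi_star = (xi_2', ..., xi_G')' and xi_d = xi_plus - xi_star, each of length (G-1)u."""
    xis = [np.asarray(x, dtype=float) for x in xis]
    if len(xis) < 2:
        raise ValueError("need at least two groups")
    u = xis[0].shape[0]
    if any(x.shape != (u,) for x in xis):
        raise ValueError("all groups must supply vectors of the same length u")
    xi_plus = np.tile(xis[0], len(xis) - 1)  # xi_1 repeated G-1 times
    xi_star = np.concatenate(xis[1:])
    return xi_plus, xi_star, xi_plus - xi_star

def first_pair_selector(u, G):
    """L = (I 0) of order u x (G-1)u, giving H0: xi_1 - xi_2 = m."""
    return np.hstack([np.eye(u), np.zeros((u, (G - 2) * u))])

if __name__ == "__main__":
    xp, xs, xd = stacked_differences([[1.0, 2.0], [0.5, 2.0], [1.0, 1.0]])
    print(xd, first_pair_selector(2, 3) @ xd)
```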

The likelihood under the hypothesis $H_0: ξ_1 = ξ_2 = \dots = ξ_G$ can be expressed as $L_R(y, ξ, α) = \prod_{g=1}^{G} l_g(y_g, ξ, α)$, whereas the likelihood for the unrestricted case, that is, when the vectors $ξ_g$ are group specific, is $L_U(y, ξ, α) = \prod_{g=1}^{G} l_g(y_g, ξ_g, α)$. To test for parameter equality across groups, we can use the LRT based on the ratio of these quantities. However, this test requires estimating both the restricted model and the unrestricted models. Moreover, depending on the estimation algorithm, the number of random effects, and, in particular, the sample sizes, estimating $L_R$ within the context of the GLMM can pose a considerable challenge, even with today’s fast personal computers (Tuerlinckx et al. 2006). Test statistics that do not require the calculation of $L_R$ may therefore be an attractive alternative, and it is these tests that we consider here.

Lazar and Zerbe (2011) recommend use of the following F-statistic for testing the null hypothesis $H_{0} : L_{⋄} ξ_{d} = 0$ with $ξ_{d} = ξ_{+} - ξ_{*}$

$$F = \frac{(L_{⋄}\hat{ξ}_d)'(L_{⋄}\hat{Σ}_d L_{⋄}')^{-1}(L_{⋄}\hat{ξ}_d)}{r(L_{⋄})},$$

where $r (L_{⋄}) = s$ is the rank of $L_{⋄}$ and ${\hat{Σ}}_{d}$ is the empirical covariance matrix of $ξ_{d}$ . The authors borrowed this test from Littell et al. (2006), who proposed using it for drawing inferences about the fixed and random effects of the LMM. For an application of this test to the more general case of $H_{0} : L_{⋄} ξ_{d} = m$ , see Wendt, Kasper, and Trendtel (2017). The test that Littell et al. (2006) proposed assumes that the residuals e (given the LMM $y = X β + Z θ + e$ ) are normally distributed. The derivation of the F-statistic under this assumption is then straightforward. However, this assumption is generally not valid for the GLMM, and Lazar and Zerbe (2011) provided no evidence to justify the application of the F-test for residuals that are not normally distributed, nor did they provide information about the power of this statistic. Thus, almost nothing is known about the behavior of F when it is used in the context of the GLMM.

Another approach that avoids calculating L_R is the test statistic introduced by Liao (2004). Here, if we have the $(p + 1) \times 1$ fixed-effects vector $β_{g}$ with covariance matrix $Σ_{β_{g}}$ from the GLM (McCullagh and Nelder 1989; Nelder and Wedderburn 1972), the proposed Wald statistic is

$$W_L = (\hat{β}_{+} - \hat{β}_{*})'\,\hat{Σ}_{\hat{β}}^{-}\,(\hat{β}_{+} - \hat{β}_{*}),$$

with degrees of freedom equal to $b = (p + 1)(G - 1)$, the vectors $β_{+} = (β_1', \dots, β_1')'$ and $β_{*} = (β_2', \dots, β_G')'$, and $\hat{Σ}_{\hat{β}}^{-}$ a generalized inverse of the empirical covariance matrix $\hat{Σ}_{\hat{β}}$, whose expectation is

$$E(\hat{Σ}_{\hat{β}}) = Σ_β = \begin{pmatrix} Σ_{β_1} + Σ_{β_2} & Σ_{β_1} & \cdots & Σ_{β_1} \\ Σ_{β_1} & Σ_{β_1} + Σ_{β_3} & \cdots & Σ_{β_1} \\ \vdots & \vdots & \ddots & \vdots \\ Σ_{β_1} & Σ_{β_1} & \cdots & Σ_{β_1} + Σ_{β_G} \end{pmatrix}.$$

It is clear from equation (10) that Liao’s (2004) test can be applied only to multiple independent populations, because otherwise the covariances between the estimates from different populations would also have to be included in this covariance matrix. Although Liao (2004) presented examples of applications of this statistic, he provided neither a theoretical derivation of $W_L$ nor information about its power. In addition, the statistic in its current form is available only for the fixed effects of the GLM, and there may be situations in which testing the random effects of the GLMM is also desirable. For example, in longitudinal studies of organizations, organizational trends are of interest (Hochweber and Hartig 2017). In such studies, a random sample of organizations (e.g., schools) is usually selected, and then (cluster) samples of individuals (e.g., students) within these organizations are assessed at different time points (Feldman and McKinlay 1994). In cross-sectional designs, the samples are selected independently within each cluster at each time point (Feldman and McKinlay 1994:62), which means that the samples of individuals across clusters and time points can be considered multiple independent populations, given the organizations. A common statistical approach for analyzing the resulting data (e.g., students’ test scores) is the random intercept model, which, as shown above, is a special case of the GLMM. In this context, the random intercept effects can be regarded as the cluster-specific means of the organizations. Thus, when the model is fitted independently to the data set of each time point, $θ_{0Mt}$ reflects the cluster-specific mean of organization $M$ at time point $t$, and a comparison of $θ_{0Mt}$ across time points serves as a measure of organizational trend. One possibility for testing this trend measure statistically is to extend the testing procedure introduced by Liao (2004) to the random effects of the GLMM.
Moreover, the only hypotheses that can be tested with Liao’s (2004) approach are those of the form $H_0: β_{+} - β_{*} = 0$. Liao (2004) also did not consider the more general case of the GLMM and of $H_0: L_{⋄}(ξ_{+} - ξ_{*}) = m$, which is why we considered it necessary to introduce an extended version of $W_L$.
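The block structure of $Σ_β$ in Liao’s statistic, with $Σ_{β_1}$ in every off-diagonal block and $Σ_{β_1} + Σ_{β_g}$ on the diagonal, follows from the independence of the groups, because $\mathrm{Cov}(\hat{β}_1 - \hat{β}_g, \hat{β}_1 - \hat{β}_h) = Σ_{β_1}$ for $g \neq h$. The following NumPy sketch assembles this matrix from group-specific covariance estimates and computes $W_L$. Using the Moore-Penrose pseudoinverse as the generalized inverse $\hat{Σ}_{\hat{β}}^{-}$ is our choice for illustration, and the function names are hypothetical.

```python
import numpy as np

def stacked_difference_cov(covs):
    """Covariance of beta_+ - beta_* for G independent groups:
    Sigma_1 + Sigma_g on the diagonal (g = 2..G), Sigma_1 off the diagonal."""
    covs = [np.asarray(c, dtype=float) for c in covs]
    G, k = len(covs), covs[0].shape[0]
    out = np.tile(covs[0], (G - 1, G - 1))  # every block starts as Sigma_1
    for g in range(1, G):
        i = (g - 1) * k
        out[i:i + k, i:i + k] += covs[g]
    return out

def liao_wald(betas, covs):
    """Liao's W_L for H0: beta_1 = ... = beta_G; returns (W_L, b)."""
    betas = [np.asarray(b, dtype=float) for b in betas]
    d = np.tile(betas[0], len(betas) - 1) - np.concatenate(betas[1:])
    S = stacked_difference_cov(covs)
    W = float(d @ np.linalg.pinv(S) @ d)  # pseudoinverse as generalized inverse
    return W, d.shape[0]                  # b = (p + 1)(G - 1)

if __name__ == "__main__":
    W, b = liao_wald([[1.0, 0.2], [0.4, 0.1], [0.9, 0.3]],
                     [0.04 * np.eye(2), 0.05 * np.eye(2), 0.03 * np.eye(2)])
    print(W, b)
```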

Our extended version of $W_L$ includes the random effects, is applicable within the context of the GLMM, and allows testing of hypotheses of the form $H_0: L_{⋄}ξ_d = m$ (with $ξ_d = ξ_{+} - ξ_{*}$). It can be formulated as

$$W_G = (L_{⋄}\hat{ξ}_d - m)'(L_{⋄}\hat{Σ}_{\hat{ξ}_d}L_{⋄}')^{-1}(L_{⋄}\hat{ξ}_d - m),$$

with empirical covariance matrix ${\hat{Σ}}_{{\hat{ξ}}_{d}}$ , such that

$$E(\hat{Σ}_{\hat{ξ}_d}) = Σ_{ξ_d} = \begin{pmatrix} Σ_{ξ_1} + Σ_{ξ_2} & Σ_{ξ_1} & \cdots & Σ_{ξ_1} \\ Σ_{ξ_1} & Σ_{ξ_1} + Σ_{ξ_3} & \cdots & Σ_{ξ_1} \\ \vdots & \vdots & \ddots & \vdots \\ Σ_{ξ_1} & Σ_{ξ_1} & \cdots & Σ_{ξ_1} + Σ_{ξ_G} \end{pmatrix}.$$

Like Liao’s (2004) statistic, this test can be applied only to multiple independent populations. For a derivation of this statistic, see Online Appendix A (which can be found at https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182). When $H_0: L_{⋄}ξ_d = m$ is true, the Wald statistic $W_G$ is asymptotically $χ^{2}$-distributed with $df = r(L_{⋄})$. By properly specifying $L_{⋄}$ and $m$, we can use this statistic to test different hypotheses. For example, given two fixed effects and two random effects in each of three groups (so that $u = 4$ and $ξ_d$ is of dimension $8 \times 1$), we can test the hypothesis $H_0: β_1 = β_2 = β_3$ with $m = 0$ and

L_{⋄} = (\begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \end{matrix}) .
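This contrast matrix simply selects the fixed-effect entries of each difference block of $ξ_{d}$. The minimal Python sketch below builds such an $L_{⋄}$, assuming (as the example implies) that each $ξ_{g}$ stacks its fixed effects before its random effects. The function name is hypothetical.

```python
def fixed_effect_contrasts(n_groups, n_fixed, n_random):
    """Build L for H0: beta_1 = ... = beta_G from the stacked difference vector xi_d.

    Each of the G - 1 blocks of xi_d has n_fixed + n_random entries, with the
    fixed effects first (an assumed ordering). L gets one row per fixed effect
    and block, containing a single 1, so r(L) = (G - 1) * n_fixed.
    """
    q = n_fixed + n_random
    n_cols = (n_groups - 1) * q
    rows = []
    for block in range(n_groups - 1):
        for k in range(n_fixed):
            row = [0] * n_cols
            row[block * q + k] = 1
            rows.append(row)
    return rows


# Reproduces the 4 x 8 matrix above (three groups, two fixed and two random effects)
for row in fixed_effect_contrasts(n_groups=3, n_fixed=2, n_random=2):
    print(row)
# Fixed effects only, G = 4 groups, three fixed effects: r(L) = 9
print(len(fixed_effect_contrasts(n_groups=4, n_fixed=3, n_random=0)))  # 9
```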

The W_L statistic is therefore a special case of W_G. Moreover, for $H_{0} : L_{⋄} ξ_{d} = 0$ the numerators of F and W_G are identical, so F can also be regarded as a special case of W_G. If we want to compare only the fixed effects across groups, that is, if we want to test hypotheses of the form $H_{0} : L_{⋄} β_{d} = m$ with $β_{d} = β_{+} - β_{*}$, $β_{+} = (\begin{matrix} β_{1}^{'}, \dots, β_{1}^{'} \end{matrix})^{'}$, and $β_{*} = (\begin{matrix} β_{2}^{'}, \dots, β_{G}^{'} \end{matrix})^{'}$, then W_G reduces to

W_{B} = (L_{⋄} {\hat{β}}_{d} - m)^{'} {(L_{⋄} {\hat{Σ}}_{{\hat{β}}_{d}} {L^{'}}_{⋄})}^{- 1} (L_{⋄} {\hat{β}}_{d} - m),

where W_B is asymptotically central $χ^{2}$-distributed with $d f = r (L_{⋄})$ if $H_{0}$ is true (see Online Appendix A, which can be found at https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182). Because W_B involves only the fixed-effects estimates and their covariance matrix, comparisons of the fixed effects $β_{g}$ across groups are possible even when the random effects vectors $θ_{g}$ are not of the same order.⁴ The results of the power analyses that we performed for these new test statistics appear in the next section. We also provide examples of applications of the new test statistics.
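Computationally, W_G and W_B are the same quadratic form; only the estimate vector and its covariance matrix differ. The following stdlib-only Python sketch evaluates $(L_{⋄} \hat{ξ}_{d} - m)^{'} (L_{⋄} \hat{Σ} L_{⋄}^{'})^{-1} (L_{⋄} \hat{ξ}_{d} - m)$ and its asymptotic $χ^{2}$ p-value. It assumes that $L_{⋄}$ has full row rank (so $d f$ equals its number of rows) and that the estimates and their covariance matrix come from an already fitted model. The function names are our own, and a real analysis would normally rely on a numerical library.

```python
import math


def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series (used for x < a + 1)."""
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) of a central chi-square distribution."""
    if x <= 0.0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    if half < a + 1.0:
        return 1.0 - _lower_gamma_series(a, half)
    return _upper_gamma_cf(a, half)


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("L Sigma L' is singular; drop linearly dependent rows of L")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def wald_test(L, est_d, cov_d, m):
    """Return (W, df, p) for H0: L est_d = m, where cov_d estimates Cov(est_d)."""
    resid = [sum(l * e for l, e in zip(row, est_d)) - mi for row, mi in zip(L, m)]
    l_cov = [[sum(row[k] * cov_d[k][j] for k in range(len(row))) for j in range(len(cov_d))]
             for row in L]
    middle = [[sum(lc[k] * row[k] for k in range(len(row))) for row in L] for lc in l_cov]
    z = solve(middle, resid)
    w = sum(r * zi for r, zi in zip(resid, z))
    df = len(L)  # equals r(L) under the full-row-rank assumption
    return w, df, chi2_sf(w, df)


w, df, p = wald_test([[1, 0], [0, 1]], [1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(round(w, 2), df, round(p, 4))  # 5.0 2 0.0821
```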

Power Analyses

Design of the Power Study

During our power analyses, we examined two combinations of link function and conditional distribution $f (y_{i g} | θ_{g})$. In the first setting, we assumed a normal conditional distribution for $y_{i g}$ together with the identity link function (i.e., an LMM); in the second, we assumed a Bernoulli distribution together with the logit link function (i.e., a logistic regression model with random effects). We also assumed a three-stage cluster sampling design in which units $k = 1, \dots, K_{g}$ were randomly sampled at Stage 1 (Level 3), units $j = 1, \dots, J_{k_{g}}$ were randomly sampled at Stage 2 (Level 2), and observations $i = 1, \dots, n_{k j_{g}}$ were randomly sampled at Stage 3 (Level 1). In the simulation, we used five values for $K_{g} \in \{10, 20, 40, 80, 100\}$, three values for $J_{k_{g}} \in \{1, 2, 3\}$, and four values for $n_{k j_{g}} \in \{6, 10, 20, 30\}$, yielding a total of $5 \times 3 \times 4 = 60$ sample size combinations for each g. Our design thus corresponded to the typical structure of educational achievement data sets, in which, for example, schools are sampled at Stage 1, classes within the selected schools at Stage 2, and students within the selected classes at Stage 3. Throughout the simulation study, we assumed that $θ_{g}$ was multivariate normally distributed with $θ_{g} \sim N (0, Σ_{θ_{g}})$. We also assumed that the design matrix X_g, the matrices Z_g and $Σ_{θ_{g}}$, and (in the case of the LMM) the residual variance matrix $R_{g} = σ_{ε_{g}}^{2} I_{N}$ were given.
We further assumed that $K_{g} = K_{g^{'}}$ , $J_{k_{g}} = J_{k_{g^{'}}}$ , and $n_{k j_{g}} = n_{k j_{g^{'}}}$ for all $g, g^{'} = 1, \dots,4$ , and that $Σ_{θ_{g}} = Σ_{θ}$ , $X_{g} = X$ , $Z_{g} = Z$ , and $R_{g} = R$ (for details about $Σ_{θ}$ , X , Z , and R , see Online Appendix B, which can be found at https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182).
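The crossed design can be enumerated directly. This small Python sketch (our illustration; the constant names are hypothetical) lists the 60 sample size combinations with the resulting Level 1 sample size per group, and crossing them with the five noncentrality parameters of Table 1 at $α = 0.05$ gives the 300 design cells referred to in the Results.

```python
from itertools import product

K_VALUES = (10, 20, 40, 80, 100)  # Level 3 units sampled at Stage 1
J_VALUES = (1, 2, 3)              # Level 2 units per Level 3 unit (Stage 2)
N_VALUES = (6, 10, 20, 30)        # Level 1 units per Level 2 unit (Stage 3)
LAMBDAS = (1.41, 5.34, 10.71, 19.85, 23.59)  # noncentrality parameters (Table 1)


def sample_size_grid():
    """All (K, J, n) combinations with the Level 1 sample size K * J * n per group."""
    return [(K, J, n, K * J * n) for K, J, n in product(K_VALUES, J_VALUES, N_VALUES)]


def simulation_cells():
    """Every (lambda, K, J, n) cell of the design at alpha = 0.05."""
    return list(product(LAMBDAS, K_VALUES, J_VALUES, N_VALUES))


grid = sample_size_grid()
print(len(grid), len(simulation_cells()))                # 60 300
print(min(g[3] for g in grid), max(g[3] for g in grid))  # 60 9000
```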

In total, we assumed three fixed effects (one intercept and two slope parameters) and two random effects (one random intercept effect due to Level 3 and one random intercept effect due to Level 2). Thus:

β_{g} = (\begin{matrix} β_{0 g} & β_{1 g} & β_{2 g} \end{matrix})^{'} \quad \text{and} \quad θ_{g} = (\begin{matrix} θ_{01 g} & \dots & θ_{0 K g} & θ_{11 g} & \dots & θ_{1 J_{K} g} \end{matrix})^{'},

where the order of $θ_{g}$ depends on K and J_K. We fixed the hypothesis for the power analyses to $H_{0} : β_{1} = β_{2} = β_{3} = β_{4}$, which implies, among other things, that $m = 0$ and $r (L_{⋄}) = 9$ (three fixed effects in each of the three group differences). Because this hypothesis involves only the fixed effects, W_G and W_B are equal, so we had to investigate only the behavior of W_G in our power analyses. For comparative purposes, we also studied the behavior of F, even though we did not develop our study design explicitly for this test.
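For a balanced version of this design ($J_{k} = J$ and $n_{k j} = n$ for all units, an assumption made for illustration), $θ_{g}$ has order $K + K J$, and each row of Z contains exactly two ones: one for the Level 3 intercept and one for the Level 2 intercept of the unit to which the observation belongs. A minimal Python sketch with a hypothetical function name follows; the Z actually used in the study is given in Online Appendix B.

```python
def random_intercept_design(K, J, n):
    """Z for a balanced three-level random-intercept model, as nested 0/1 lists.

    Columns 0..K-1 hold theta_{0k} (Level 3); columns K..K+K*J-1 hold theta_{1kj}
    (Level 2). Rows are ordered by Level 3 unit, Level 2 unit, then observation.
    """
    n_cols = K + K * J
    Z = []
    for k in range(K):
        for j in range(J):
            for _ in range(n):
                row = [0] * n_cols
                row[k] = 1              # Level 3 random intercept of unit k
                row[K + k * J + j] = 1  # Level 2 random intercept of unit (k, j)
                Z.append(row)
    return Z


Z = random_intercept_design(K=10, J=3, n=6)
print(len(Z), len(Z[0]))  # 180 40
```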

The power of the test W_G can be expressed as

1 - β = P (χ_{d f, λ}^{2} > χ_{d f, α}^{2}),

where $χ_{d f, λ}^{2}$ follows a noncentral $χ^{2}$ distribution with $d f$ degrees of freedom and noncentrality parameter $λ$, and $χ_{d f, α}^{2}$ is the upper $α$ point (critical value) of the central $χ^{2}$ distribution with $d f$ degrees of freedom. In the simulation study, we fixed $α = 0.05$ and assumed five values for the power $1 - β$ ($0.1, 0.3, 0.6, 0.9, 0.95$). Table 1 shows the noncentrality parameters that yield these power values for $d f = r (L_{⋄}) = 9$.
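The power formula can be checked by simulation: a $χ^{2}_{d f, λ}$ variable is the sum of $d f$ squared independent normal variables with unit variance whose squared means sum to λ. The Python sketch below (our illustration, not part of the authors' SAS implementation) estimates power as the share of T draws exceeding $χ^{2}_{9, 0.05} \approx 16.919$, which mirrors how the study estimates power as a proportion of significant replications. The function and constant names are hypothetical.

```python
import random

CRITICAL_9_005 = 16.919  # upper 5% point of the central chi-square with 9 df


def mc_power(lam, df=9, crit=CRITICAL_9_005, T=1000, seed=1):
    """Monte Carlo power: share of T noncentral chi-square draws above crit.

    A chi^2_{df, lam} draw is built as the sum of df squared N(mu_i, 1) variables
    with sum(mu_i^2) = lam; here the whole shift sits on the first coordinate.
    """
    rng = random.Random(seed)
    shift = lam ** 0.5
    significant = 0
    for _ in range(T):
        w = (rng.gauss(0.0, 1.0) + shift) ** 2
        w += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
        if w > crit:
            significant += 1
    return significant / T


# Near .05 under H0 and near .90 for lambda = 19.85, up to Monte Carlo error
print(mc_power(0.0), mc_power(19.85))
```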

Table 1.

Noncentrality Parameter for $χ_{d f, λ}^{2}$ when $d f = 9$ and $α = 0.05$.

1 − β	λ
.10	1.41
.30	5.34
.60	10.71
.90	19.85
.95	23.59
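The entries of Table 1 can be reproduced by inverting the power formula numerically. The stdlib-only Python sketch below is our own illustration: it computes the noncentral $χ^{2}$ tail probability as a Poisson($λ/2$) mixture of central $χ^{2}$ tails (with λ defined as the sum of squared means) and finds the critical value and λ by bisection. The function names and search brackets are assumptions chosen for this setting.

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df > x) via the power series of the regularized lower incomplete gamma.

    Adequate for the moderate arguments used here (x of at most a few hundred).
    """
    if x <= 0.0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= half / (a + n)
        total += term
    return 1.0 - total * math.exp(-half + a * math.log(half) - math.lgamma(a))


def ncx2_sf(x, df, lam, max_terms=1000):
    """P(chi^2_{df, lam} > x) as a Poisson(lam / 2) mixture of central chi-square tails."""
    half_lam = lam / 2.0
    weight = math.exp(-half_lam)
    total = weight * chi2_sf(x, df)
    for j in range(1, max_terms):
        weight *= half_lam / j
        total += weight * chi2_sf(x, df + 2 * j)
        if j > half_lam and weight < 1e-17:
            break
    return total


def bisect_increasing(f, lo, hi, tol=1e-9):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_value(df, alpha):
    """Upper-alpha point of the central chi-square distribution."""
    return bisect_increasing(lambda x: alpha - chi2_sf(x, df), 0.0, 200.0)


def noncentrality_for_power(power, df, alpha):
    """Noncentrality parameter at which the level-alpha chi-square test has the given power."""
    crit = critical_value(df, alpha)
    return bisect_increasing(lambda lam: ncx2_sf(crit, df, lam) - power, 0.0, 100.0)


for target in (0.10, 0.30, 0.60, 0.90, 0.95):
    print(target, round(noncentrality_for_power(target, df=9, alpha=0.05), 2))
```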

Because our hypothesis related only to the fixed effects, we chose the random effects such that $θ_{d} = 0$, so the random effects did not influence power in our study. The values of $β_{d}$, however, did determine the noncentrality parameters. For a given power (and fixed $Σ_{θ}$ and R), the values of $β_{d}$, and hence of $β_{g}$, depended on $α$, K, J, and n, whereas for fixed values of $α$, K, J, n, $Σ_{θ}$, and R, the power of W_G depended on the magnitude of $β_{d}$. Table 2 gives an example of $β_{g}$, and hence of $β_{d}$, when $Σ_{θ}$ and R are fixed to their values in the simulation design (see Online Appendix B, which can be found at https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182), $α = 0.05$, $λ = 23.59$, $K = 100$, $J = 3$, and $n = 30$.⁵ Of course, many other (possibly infinitely many) parameter values would also yield the desired noncentrality parameters, but for a given power and significance level, we considered these solutions to be equivalent. We therefore used the vectors $β_{g}$ in Table 2, together with the remaining parameter values in our syntax, for the subsequent analyses.
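Why infinitely many solutions exist can be made concrete. For the fixed-effects hypothesis used here, $L_{⋄}$ is the identity on $β_{d}$, and for a fixed covariance matrix of $\hat{β}_{d}$ (which depends on K, J, n, $Σ_{θ}$, and R) the noncentrality parameter is the quadratic form $λ = β_{d}^{'} Σ_{\hat{β}_{d}}^{-1} β_{d}$. Scaling $β_{d}$ by c therefore multiplies λ by $c^{2}$. The Python sketch below uses this to obtain one of the many $β_{d}$ vectors that produce a target λ. It is our illustration, not the procedure the authors used to obtain Table 2, and the covariance matrix in the example is hypothetical.

```python
import math


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def noncentrality(beta_d, cov_beta_d):
    """lambda = beta_d' Cov(beta_d_hat)^{-1} beta_d, i.e., W_B's quadratic form with L = I."""
    z = solve(cov_beta_d, beta_d)
    return sum(b * zi for b, zi in zip(beta_d, z))


def scale_to_noncentrality(direction, cov_beta_d, target):
    """Multiply a chosen pattern of group differences by c so that lambda equals target.

    lambda is quadratic in beta_d, so scaling beta_d by c scales lambda by c**2.
    """
    c = math.sqrt(target / noncentrality(direction, cov_beta_d))
    return [c * d for d in direction]


cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical Cov(beta_d_hat)
beta_d = scale_to_noncentrality([1.0, 2.0], cov, target=23.59)
print(round(noncentrality(beta_d, cov), 2))  # 23.59
```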

Table 2.

Values for $β_{g}$ when $α = 0.05$ , $λ = 23.59$ , $K = 100$ , $J = 3$ , and $n = 30$ .

Parameter	g = 1^a	g = 2	g = 3	g = 4
$β_{0}$	10	6.30	6.85	7.70
$β_{1}$	7	6.92	6.93	6.95
$β_{2}$	4	3.96	3.97	3.98

^a Regardless of $α$, $λ$, K, J, and n, the values for Group 1 are fixed to $β_{1} = (\begin{matrix} 10 & 7 & 4 \end{matrix})^{'}$.

For given sample sizes (i.e., values of K, J, and n), X, Z, $β_{g}$, $Σ_{θ}$, and (in the case of the LMM) R, we generated values of y_g that followed the desired model. For the LMM, we formulated this model as $y_{g} = X β_{g} + Z θ_{g} + e_{g}$, with $e_{g} \sim N (0, 1)$ and $θ_{g} \sim N (0, Σ_{θ})$. For the binary logistic regression model with random effects, we formulated the model as $y_{g, *} = g^{- 1} (X β_{g} + Z θ_{g})$, where $g^{- 1} (\cdot)$ is the inverse of the logit link function (i.e., the logistic function). We generated the binary responses y_g by using the elements of $y_{g, *}$ as the success probabilities of Bernoulli distributions and drawing a random sample from these distributions. By sampling T times from $θ_{g} \sim N (0, Σ_{θ})$, from $e_{g} \sim N (0, 1)$ (in the case of the LMM), and from the Bernoulli distribution (in the case of the GLMM), we obtained T simulated data sets y_g. We fitted the specified (G)LMM to each simulated data set and then, using the estimates of $β_{d}$ and $Σ_{θ}$, computed the W_G (F) statistic for the assumed null hypothesis. The proportion of significant W_G (F) values, that is, the number of significant W_G (F) values divided by T, served as our estimate of the power of the W_G (F) statistic. In the simulation study, we set $T = 1{,}000$ and used SAS/STAT® Version 14.1 and SAS/IML® Version 14.1 of the SAS System for Windows⁶ (SAS Institute Inc. 2015a, 2015b) to implement our simulation design.
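The data-generating step for one group of a balanced design ($J_{k} = J$, $n_{k j} = n$) can be sketched as follows. This is a stdlib-only Python illustration, not the authors' SAS/IML code: the covariate values in X and the variance components `var_l3` and `var_l2` are hypothetical placeholders for the values given in Online Appendix B, and the two random intercepts follow the structure of $θ_{g}$ described above.

```python
import math
import random


def logistic(eta):
    """Inverse logit g^{-1}(eta) = 1 / (1 + exp(-eta)), written to avoid overflow."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def simulate_group(beta, X, K, J, n, var_l3, var_l2, family, rng):
    """Draw y_g for one group; X rows are ordered by Level 3 unit, Level 2 unit, observation.

    family="gaussian":  y = X beta + Z theta + e, e ~ N(0, 1)    (LMM setting)
    family="bernoulli": y ~ Bernoulli(g^{-1}(X beta + Z theta))  (logistic GLMM setting)
    """
    if len(X) != K * J * n:
        raise ValueError("X must have K * J * n rows")
    y, row = [], 0
    for _ in range(K):
        theta_l3 = rng.gauss(0.0, math.sqrt(var_l3))  # Level 3 random intercept
        for _ in range(J):
            theta_l2 = rng.gauss(0.0, math.sqrt(var_l2))  # Level 2 random intercept
            for _ in range(n):
                eta = sum(b * x for b, x in zip(beta, X[row])) + theta_l3 + theta_l2
                if family == "gaussian":
                    y.append(eta + rng.gauss(0.0, 1.0))
                elif family == "bernoulli":
                    y.append(1 if rng.random() < logistic(eta) else 0)
                else:
                    raise ValueError(f"unknown family: {family}")
                row += 1
    return y


rng = random.Random(2021)
K, J, n = 10, 2, 6
# Hypothetical fixed design: intercept plus two standard normal covariates
X = [[1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(K * J * n)]
y = simulate_group([10.0, 7.0, 4.0], X, K, J, n, var_l3=1.0, var_l2=0.5,
                   family="bernoulli", rng=rng)
print(len(y), sum(y))
```

Repeating the draw T times with the same X, and fitting the model to each replicate, yields the T simulated data sets described above.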

Evaluation Criteria

To check the implementation of our simulation design, we examined several difference measures. These measures compare the given values (i.e., the input values) of the fixed effects, the random components, and (in the case of the LMM) the residual variance with the estimated values (i.e., the output values) of these quantities. For the fixed effects, we defined the matrix ${\hat{B}}_{d} = (B - \hat{\bar{B}})$ with $B = (\begin{matrix} β_{1} & β_{2} & β_{3} & β_{4} \end{matrix})$, $\hat{\bar{B}} = (\begin{matrix} {\hat{\bar{β}}}_{1} & {\hat{\bar{β}}}_{2} & {\hat{\bar{β}}}_{3} & {\hat{\bar{β}}}_{4} \end{matrix})$, ${\hat{\bar{β}}}_{g} = \sum_{t = 1}^{T} {\hat{β}}_{g, t} / T$, and elements ${\hat{β}}_{d, v g} = β_{v, g} - {\hat{\bar{β}}}_{v, g}$ ($v = 0, 1, 2$). From this matrix, we calculated the row averages ${\bar{b}}_{d, v .} = \sum_{g = 1}^{G} {\hat{β}}_{d, v g} / G$ and row variances ${\hat{σ}}_{d, v .}^{2} = var ({\hat{β}}_{d, v g})$, as well as the total average

{\bar{b}}_{d,..} = \mathbf{1}^{'} {\hat{B}}_{d} \mathbf{1} / (3 G),

and the total variance ${\hat{σ}}_{d,..}^{2} = var (vec ({\hat{B}}_{d}))$, where $vec ({\hat{B}}_{d})$ is the vectorization of ${\hat{B}}_{d}$. Thus, ${\bar{b}}_{d,..}$ is the average difference between the vectors $β_{g}$ and ${\hat{β}}_{g}$ (averaged over groups g and parameters v), and ${\hat{σ}}_{d,..}^{2}$ is the corresponding variance. We also used the absolute values $| {\hat{β}}_{d, v g} |$ to obtain the minimum and maximum absolute differences between the simulated elements $β_{v, g}$ and the estimated components ${\hat{\bar{β}}}_{v, g}$.
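A minimal Python sketch of these fixed-effects criteria for one design cell follows. It is our illustration: the function name and data layout are hypothetical, and because the text does not specify the divisor of var, the population variance (divisor equal to the number of values) is an assumption.

```python
from statistics import mean, pvariance


def fixed_effect_criteria(B_true, estimates):
    """Difference measures between simulated and estimated fixed effects.

    B_true[g][v]: simulated beta_{v,g}; estimates[t][g][v]: estimate in replication t.
    Returns row means/variances of B_d (one row per parameter v), the total mean
    1' B_d 1 / (V G), the variance of vec(B_d), and the min/max of |B_d|.
    """
    T, G, V = len(estimates), len(B_true), len(B_true[0])
    # Average estimate over the T replications for each group g and parameter v
    avg = [[sum(est[g][v] for est in estimates) / T for v in range(V)] for g in range(G)]
    # B_d has rows v and columns g: simulated value minus average estimate
    B_d = [[B_true[g][v] - avg[g][v] for g in range(G)] for v in range(V)]
    flat = [x for row in B_d for x in row]
    return {
        "row_mean": [mean(row) for row in B_d],
        "row_var": [pvariance(row) for row in B_d],  # assumed divisor G
        "total_mean": mean(flat),
        "total_var": pvariance(flat),
        "min_abs": min(abs(x) for x in flat),
        "max_abs": max(abs(x) for x in flat),
    }


# Two groups, two parameters, two identical replications (toy numbers)
crit = fixed_effect_criteria([[10, 7], [6, 7]], [[[9, 7], [6, 8]]] * 2)
print(crit["row_mean"], crit["total_mean"], crit["max_abs"])  # [0.5, -0.5] 0.0 1.0
```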

For the random components, we defined ${\hat{σ}}_{θ_{w, g}}^{2} = \sum_{t = 1}^{T} {\hat{σ}}_{θ_{w, g, t}}^{2} / T$ and ${\hat{σ}}_{d, w g}^{2} = σ_{θ_{w}}^{2} - {\hat{σ}}_{θ_{w, g}}^{2}$, where ${\hat{σ}}_{θ_{w, g, t}}^{2}$ is the estimate of variance component w ($w = 0, 1$) for group g in replication t and $σ_{θ_{w}}^{2}$ is the given value of that component. Recall that we assumed equal random components across groups g. On the basis of these definitions, we calculated ${\bar{d}}_{d, w .} = \sum_{g = 1}^{G} {\hat{σ}}_{d, w g}^{2} / G$ and ${\hat{σ}}_{d, w .}^{2} = var ({\hat{σ}}_{d, w g}^{2})$, where ${\hat{σ}}_{d, w .}^{2}$ is the variance of ${\hat{σ}}_{d, w g}^{2}$ across groups g. We also calculated the total average

{\bar{d}}_{d,..} = \sum_{w = 0}^{W - 1} \sum_{g = 1}^{G} \sum_{t = 1}^{T} \frac{σ_{θ_{w}}^{2} - {\hat{σ}}_{θ_{w, g, t}}^{2}}{T G W},

and the corresponding total variance ${\hat{σ}}_{d,..}^{2} = var ({\hat{σ}}_{d, w g t}^{2})$, where ${\bar{d}}_{d,..}$ is the average difference between $σ_{θ}^{2}$ and ${\hat{σ}}_{θ}^{2}$ and ${\hat{σ}}_{d,..}^{2}$ is the variance of $σ_{θ_{w}}^{2} - {\hat{σ}}_{θ_{w, g, t}}^{2}$ across groups g, replications t, and components w. In addition, we used the absolute values $| {\hat{σ}}_{d, w g}^{2} |$ to obtain the minimum and maximum absolute differences between the simulated values $σ_{θ_{w}}^{2}$ and the estimated components ${\hat{σ}}_{θ_{w, g}}^{2}$.
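The variance-component criteria follow the same pattern, except that the total average and total variance run over all replications, groups, and components. A minimal Python sketch under the same assumptions as before (hypothetical names, population variances):

```python
from statistics import mean, pvariance


def variance_component_criteria(true_vars, estimates):
    """Difference measures for the variance components sigma^2_{theta_w}.

    true_vars[w]: simulated variance of component w (equal across groups).
    estimates[t][g][w]: estimate of component w for group g in replication t.
    """
    T, G, W = len(estimates), len(estimates[0]), len(true_vars)
    # sigma^2_{d,wg}: simulated value minus the estimate averaged over replications
    d_wg = [[true_vars[w] - sum(est[g][w] for est in estimates) / T for g in range(G)]
            for w in range(W)]
    # Differences for every replication, group, and component (used for the totals)
    d_all = [true_vars[w] - est[g][w] for est in estimates for g in range(G) for w in range(W)]
    flat = [x for row in d_wg for x in row]
    return {
        "component_mean": [mean(row) for row in d_wg],      # d-bar_{d,w.}
        "component_var": [pvariance(row) for row in d_wg],  # variance across groups g
        "total_mean": mean(d_all),                          # d-bar_{d,..}
        "total_var": pvariance(d_all),
        "min_abs": min(abs(x) for x in flat),
        "max_abs": max(abs(x) for x in flat),
    }


# Two components (Level 3 and Level 2), two groups, three replications (toy numbers)
est = [[[0.9, 0.45], [0.9, 0.45]] for _ in range(3)]
print(variance_component_criteria([1.0, 0.5], est)["total_mean"])  # about 0.075
```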

For the LMM, we also defined difference measures for the residual term. Given the simulated residual variance $σ_{e}^{2}$ (recall that we assumed equal residual variances across groups g) and its estimator ${\hat{σ}}_{e_{g, t}}^{2}$ in group g and replication t, we defined ${\hat{σ}}_{θ_{e, g}}^{2} = \sum_{t = 1}^{T} {\hat{σ}}_{e_{g, t}}^{2} / T$ and ${\hat{σ}}_{d, e g}^{2} = σ_{e}^{2} - {\hat{σ}}_{θ_{e, g}}^{2}$. Accordingly, we calculated ${\bar{e}}_{d, e .} = \sum_{g = 1}^{G} {\hat{σ}}_{d, e g}^{2} / G$ and ${\hat{σ}}_{d, e .}^{2} = var ({\hat{σ}}_{d, e g}^{2})$, where ${\hat{σ}}_{d, e .}^{2}$ is the variance of ${\hat{σ}}_{d, e g}^{2}$ across groups g. We also calculated the total average

{\bar{e}}_{d,..} = \sum_{g = 1}^{G} \sum_{t = 1}^{T} \frac{σ_{e}^{2} - {\hat{σ}}_{e_{g, t}}^{2}}{T G},

and the corresponding total variance ${\hat{σ}}_{d,..}^{2} = var ({\hat{σ}}_{d, e g t}^{2})$. The absolute values $| {\hat{σ}}_{d, e g}^{2} |$ allowed us to report the minimum and maximum absolute differences between the simulated residual variance $σ_{e}^{2}$ and the estimated residual variances ${\hat{σ}}_{θ_{e, g}}^{2}$. If our simulation design was implemented correctly, all evaluation criteria should be approximately zero. In addition to these evaluation criteria, we recorded the convergence rates of the LMM and the GLMM.

Results

Evaluation Criteria

We obtained $| α | \times | λ | \times | K | \times | J | \times | n | = 300$ values for the convergence rate of the LMM and another 300 values for the GLMM. The convergence rate for the LMM was always 100 percent (i.e., no convergence problems occurred), so all $T = 1{,}000$ replications per combination could be used to estimate the power of W_G and F. For the GLMM, the average convergence rate across the 300 combinations was 41.26 percent, the variance of the convergence rate across combinations was 0.15, and the minimum convergence rate was 0.1 percent, which means that at least one replication per combination converged. Because of the low convergence rate for some sample size combinations, however, readers should interpret the overall power estimates of W_G and F for the GLMM with caution. The following results are based solely on the successful replications per combination.

For the fixed-effects criteria ${\bar{b}}_{d,..}$, ${\hat{σ}}_{d,..}^{2}$, $min ({\hat{β}}_{d, v g})$, and $max ({\hat{β}}_{d, v g})$, we obtained 300 values for the LMM and another 300 values for the GLMM. Averaged across the 300 combinations, the LMM values were ${\bar{\bar{b}}}_{d,..} = 0.00$ and ${\hat{\bar{σ}}}_{d,..}^{2} = 0.00$, and the largest values for the minimum and the maximum were $min ({\hat{β}}_{d, v g}) = - 0.09$ and $max ({\hat{β}}_{d, v g}) = 0.10$. The corresponding values for the GLMM were ${\bar{\bar{b}}}_{d,..} = 2.85$, ${\hat{\bar{σ}}}_{d,..}^{2} = 0.72$, $min ({\hat{β}}_{d, v g}) = 1.13$, and $max ({\hat{β}}_{d, v g}) = 4.82$. For the LMM, these measures, like those for the individual sample size combinations (e.g., see Table 3), were zero or close to zero. We therefore considered our implementation of the simulation study with respect to the fixed effects $β_{g}$ to be accurate.

Table 3.

Values for ${\bar{b}}_{d, v .}$ , ${\hat{σ}}_{d, v .}^{2}$ , ${\bar{b}}_{d,..}$ , ${\hat{σ}}_{d,..}^{2}$ , $min (| {\hat{β}}_{d, v g} |)$ , and $max (| {\hat{β}}_{d, v g} |)$ when $α = 0.05$ and $λ = 1.41$ (Linear Mixed Model).

K	J	n	${\bar{b}}_{d,0.}$	${\hat{σ}}_{d,0.}^{2}$	${\bar{b}}_{d,1.}$	${\hat{σ}}_{d,1.}^{2}$	${\bar{b}}_{d,2.}$	${\hat{σ}}_{d,2.}^{2}$	${\bar{b}}_{d,..}$	${\hat{σ}}_{d,..}^{2}$	$min (| {\hat{β}}_{d, v g} |)$	$max (| {\hat{β}}_{d, v g} |)$
10	1	6	−.14	.00	.03	.00	.00	.00	−.04	.01	.00	.14
		10	.00	.00	.02	.00	.00	.00	.01	.00	.01	.03
		20	−.01	.00	.00	.00	.00	.00	.00	.00	.00	.01
		30	−.04	.00	.00	.00	.00	.00	−.01	.00	.00	.04
	2	6	.03	.00	.02	.00	.00	.00	.02	.00	.00	.04
		10	.00	.00	−.01	.00	.00	.00	−.01	.00	.00	.02
		20	.06	.00	.00	.00	.00	.00	.02	.00	.00	.06
		30	−.16	.00	.00	.00	.00	.00	−.05	.01	.00	.16
	3	6	−.25	.00	.00	.00	.01	.00	−.08	.02	.01	.26
		10	−.07	.00	.00	.00	.00	.00	−.02	.00	.00	.07
		20	.00	.00	.00	.00	.00	.00	.00	.00	.00	.01
		30	−.03	.00	.00	.00	.00	.00	−.01	.00	.00	.04
20	1	6	−.14	.00	.02	.00	−.01	.00	−.04	.01	.00	.15
		10	.02	.00	−.01	.00	.00	.00	.00	.00	.00	.03
		20	.09	.00	.00	.00	.00	.00	.03	.00	.00	.09
		30	.00	.00	.00	.00	.00	.00	.00	.00	.00	.01
	2	6	.06	.00	−.01	.00	.00	.00	.02	.00	.00	.07
		10	−.01	.00	.00	.00	.00	.00	.00	.00	.01	.01
		20	−.08	.00	.00	.00	.00	.00	−.03	.00	.00	.08
		30	−.04	.00	.00	.00	.00	.00	−.01	.00	.00	.04
	3	6	.08	.00	.00	.00	.00	.00	.02	.00	.00	.08
		10	−.04	.00	.00	.00	.00	.00	−.02	.00	.00	.04
		20	.04	.00	.00	.00	.00	.00	−.01	.00	.00	.04
		30	.05	.00	.00	.00	.00	.00	.02	.00	.00	.05
40	1	6	−.02	.00	.01	.00	.00	.00	.00	.00	.00	.02
		10	.07	.00	.00	.00	.00	.00	.02	.00	.00	.07
		20	−.01	.00	.00	.00	.00	.00	.00	.00	.00	.01
		30	−.02	.00	.00	.00	.00	.00	−.01	.00	.00	.02
	2	6	.04	.00	.00	.00	.00	.00	.02	.00	.00	.05
		10	.00	.00	.00	.00	.00	.00	.00	.00	.00	.01
		20	.05	.00	.00	.00	.00	.00	.02	.00	.00	.05
		30	.05	.00	.00	.00	.00	.00	.02	.00	.00	.06
	3	6	.06	.00	.00	.00	.00	.00	.02	.00	.00	.06
		10	.06	.00	.00	.00	.00	.00	.02	.00	.00	.06
		20	.00	.00	.00	.00	.00	.00	.00	.00	.00	.00
		30	.06	.00	.00	.00	.00	.00	.02	.00	.00	.06
80	1	6	.01	.00	.00	.00	.00	.00	.00	.00	.00	.01
		10	−.10	.00	.00	.00	.00	.00	−.03	.00	.00	.10
		20	−.03	.00	.00	.00	.00	.00	−.01	.00	.00	.03
		30	.03	.00	.00	.00	.00	.00	.01	.00	.00	.03
	2	6	.02	.00	.00	.00	.00	.00	.01	.00	.00	.02
		10	.01	.00	.00	.00	.00	.00	.00	.00	.00	.02
		20	.04	.00	.00	.00	.00	.00	.01	.00	.00	.04
		30	.00	.00	.00	.00	.00	.00	.02	.00	.00	.00
	3	6	−.05	.00	.00	.00	.00	.00	−.02	.00	.00	.05
		10	.02	.00	.00	.00	.00	.00	.01	.00	.00	.02
		20	.03	.00	.00	.00	.00	.00	.01	.00	.00	.03
		30	.02	.00	.00	.00	.00	.00	.01	.00	.00	.02
100	1	6	.01	.00	.00	.00	.00	.00	.00	.00	.00	.01
		10	.02	.00	.00	.00	.00	.00	.00	.00	.00	.02
		20	−.03	.00	.00	.00	.00	.00	−.01	.00	.00	.03
		30	.00	.00	.00	.00	.00	.00	.00	.00	.00	.00
	2	6	.00	.00	.00	.00	.00	.00	.00	.00	.00	.01
		10	.01	.00	.00	.00	.00	.00	.00	.00	.00	.01
		20	.04	.00	.00	.00	.00	.00	.01	.00	.00	.04
		30	−.02	.00	.00	.00	.00	.00	−.01	.00	.00	.02
	3	6	−.03	.00	.00	.00	.00	.00	−.01	.00	.00	.03
		10	.00	.00	.00	.00	.00	.00	.00	.00	.00	.00
		20	−.04	.00	.00	.00	.00	.00	−.01	.00	.00	.04
		30	.07	.00	.00	.00	.00	.00	.02	.00	.00	.07

For the evaluation criteria of the random components, we obtained $| α | \times | λ | \times | K | \times | J | = 75$ values for ${\bar{d}}_{d,..}$, ${\hat{σ}}_{d,..}^{2}$, $min (| {\hat{σ}}_{d, w g}^{2} |)$, and $max (| {\hat{σ}}_{d, w g}^{2} |)$ for the LMM, and another 75 values for the GLMM. Averaged across the 75 combinations, the LMM values were ${\bar{\bar{d}}}_{d,..} = 0.01$ and ${\hat{\bar{σ}}}_{d,..}^{2} = 0.04$; the largest values for the minimum and the maximum were $min ({\hat{σ}}_{d, w g}^{2}) = - 0.50$ and $max ({\hat{σ}}_{d, w g}^{2}) = 0.71$. The corresponding values for the GLMM were ${\bar{\bar{d}}}_{d,..} = 44.98$, ${\hat{\bar{σ}}}_{d,..}^{2} = 5.05$, $min ({\hat{σ}}_{d, w g}^{2}) = 39.19$, and $max ({\hat{σ}}_{d, w g}^{2}) = 49.56$. Although these values, like the measures for the individual sample size combinations (e.g., see Table 4), were somewhat higher than the corresponding values for the fixed effects, they were still very small (LMM) or moderately small (GLMM), and hence we considered our implementation of the simulation study with respect to the random components $σ_{θ_{w}}^{2}$ to be accurate. Note that our preliminary analyses, not presented here, suggested that these measures tended toward zero as T increased.

Table 4.

Values for ${\bar{d}}_{d, w .}$ , ${\hat{σ}}_{d, w .}^{2}$ , ${\bar{d}}_{d,..}$ , ${\hat{σ}}_{d,..}^{2}$ , $min (| {\hat{σ}}_{d, w g}^{2} |)$ , and $max (| {\hat{σ}}_{d, w g}^{2} |)$ when $α = 0.05$ and $λ = 1.41$ (Linear Mixed Model).

K	J	${\bar{d}}_{d,0.}$	${\hat{σ}}_{d,0.}^{2}$	${\bar{d}}_{d,1.}$	${\hat{σ}}_{d,1.}^{2}$	${\bar{d}}_{d,..}$	${\hat{σ}}_{d,..}^{2}$	$min (| {\hat{σ}}_{d, w g}^{2} |)$	$max (| {\hat{σ}}_{d, w g}^{2} |)$
10	1	5.23	.01	−5.95	.01	−.36	35.74	5.15	6.07
	2	0.70	.00	0.03	.00	.37	0.13	0.02	0.71
	3	0.51	.00	−0.02	.00	.24	0.08	0.01	0.52
20	1	3.00	.00	−3.21	.00	−.11	11.00	2.91	3.25
	2	0.18	.00	−0.03	.00	.07	0.01	0.02	0.20
	3	0.71	.00	−0.02	.00	.34	0.15	0.02	0.71
40	1	1.37	.00	−1.51	.00	−.07	2.38	1.32	1.59
	2	0.62	.00	0.00	.00	.31	0.11	0.00	0.63
	3	0.48	.00	−0.04	.00	.22	0.08	0.03	0.50
80	1	0.82	.00	−0.19	.00	.32	0.29	0.13	0.86
	2	0.31	.00	−0.02	.00	.14	0.03	0.02	0.32
	3	0.19	.00	0.02	.00	.11	0.01	0.02	0.20
100	1	0.48	.00	−0.11	.00	.18	0.10	0.09	0.52
	2	0.07	.00	−0.02	.00	.03	0.00	0.01	0.08
	3	0.19	.00	−0.02	.00	.08	0.01	0.02	0.19

For the evaluation criteria of the residual variance, we obtained 300 values for the LMM. The average values of these criteria over these combinations for the LMM were ${\bar{\bar{e}}}_{d,..} = 0.00$ and ${\hat{\bar{σ}}}_{d,..}^{2} = 0.00$ , with the largest values for the minimum and the maximum being $min ({\hat{σ}}_{d, e g}^{2}) = - 0.01$ and $max ({\hat{σ}}_{d, e g}^{2}) = 0.01$ . Because these values, as well as the values for each combination (e.g., see Table 5), were all close to zero, we considered the implementation of our simulation study with respect to $σ_{e}^{2}$ (and thus in total) to be accurate.

Table 5.

Values for ${\bar{e}}_{d,..}$ , ${\hat{σ}}_{d,..}^{2}$ , $min (| {\hat{σ}}_{d, e g}^{2} |)$ , and $max (| {\hat{σ}}_{d, e g}^{2} |)$ when $α = 0.05$ and $λ = 1.41$ (Linear Mixed Model).

K	J	n	${\bar{e}}_{d,..}$	${\hat{σ}}_{d,..}^{2}$	$min (| {\hat{σ}}_{d, e g}^{2} |)$	$max (| {\hat{σ}}_{d, e g}^{2} |)$
10	1	6	.01	.00	.00	.01
		10	.00	.00	.00	.01
		20	.00	.00	.00	.01
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.01
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
20	1	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
40	1	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
80	1	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
100	1	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00

Power of W_G and F

The average proportion of significant W_G values for the LMM across the 300 sample size combinations was $49.27$ percent, and the corresponding value for F was $48.71$ percent. On average, then, W_G proved to be more powerful than F ($Z = 4.38$; $p < 0.001$). As usual, the power of both tests depended on $α$, the effect size (i.e., increasing values of $λ$) or, equivalently, the expected power (i.e., increasing values of $1 - β$), and the sample size. For an example of the power results, see Table 6. In general, for small effect sizes (e.g., if $1 - β \leq 0.60$), both tests showed a negative bias; that is, the observed power fell below the expected power. If $1 - β = 0.90$, however, whether a negative bias occurred for both tests seemed to depend on sample size: negative biases were observed more often for large samples ($K > 40$) than for $K \leq 40$, whereas almost no bias was observed when $1 - β > 0.90$.
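This section does not state which procedure produced the Z values. Assuming a standard normal reference distribution, the corresponding p-values follow directly, as in this short Python sketch (the function name is hypothetical):

```python
import math


def normal_p_values(z):
    """Return (one-sided upper-tail, two-sided) p-values for a standard normal statistic z."""
    one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return one_sided, two_sided


for z in (4.38, 2.21):
    one, two = normal_p_values(z)
    print(z, f"{one:.4f}", f"{two:.4f}")
```

For $Z = 4.38$ both p-values are far below .001; for $Z = 2.21$ they are about .014 (one-sided) and .027 (two-sided).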

Table 6.

Power of W_G and F for Different Levels of $1 - β$ and $α = 0.05$ (Linear Mixed Model).

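K	J	n	W_G (1 − β = .10)	F (1 − β = .10)	W_G (1 − β = .30)	F (1 − β = .30)	W_G (1 − β = .60)	F (1 − β = .60)	W_G (1 − β = .90)	F (1 − β = .90)	W_G (1 − β = .95)	F (1 − β = .95)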
10	1	6	.08	.06	.19	.14	.53	.45	.92	.87	.96	.93
		10	.03	.02	.21	.19	.56	.52	.94	.92	.95	.93
		20	.03	.02	.15	.14	.45	.43	.88	.87	.94	.94
		30	.03	.03	.18	.17	.49	.47	.89	.88	.95	.95
	2	6	.05	.03	.28	.25	.60	.56	.94	.92	.97	.96
		10	.04	.03	.12	.11	.44	.42	.86	.84	.94	.93
		20	.03	.02	.15	.15	.48	.47	.90	.89	.95	.95
		30	.03	.03	.14	.14	.48	.47	.87	.87	.95	.94
	3	6	.06	.05	.22	.18	.58	.55	.90	.89	.97	.96
		10	.03	.03	.19	.17	.51	.50	.92	.91	.96	.96
		20	.04	.04	.22	.22	.55	.55	.91	.90	.96	.96
		30	.03	.03	.19	.18	.51	.50	.89	.89	.96	.96
20	1	6	.03	.03	.16	.14	.49	.43	.88	.85	.96	.93
		10	.02	.01	.01	.09	.35	.32	.82	.79	.93	.91
		20	.04	.03	.17	.16	.44	.43	.91	.90	.96	.96
		30	.02	.02	.15	.15	.48	.47	.89	.88	.95	.94
	2	6	.03	.03	.16	.14	.42	.40	.89	.88	.96	.95
		10	.03	.02	.15	.14	.46	.44	.86	.86	.95	.94
		20	.05	.04	.14	.14	.45	.44	.90	.90	.95	.95
		30	.04	.03	.14	.14	.45	.44	.91	.90	.95	.95
	3	6	.02	.02	.15	.14	.44	.43	.86	.85	.95	.95
		10	.02	.02	.11	.11	.40	.39	.87	.86	.94	.94
		20	.03	.03	.15	.15	.45	.45	.87	.87	.95	.95
		30	.03	.03	.14	.14	.44	.44	.87	.87	.95	.94
40	1	6	.03	.03	.19	.18	.44	.42	.91	.90	.95	.94
		10	.03	.03	.14	.14	.41	.40	.90	.90	.95	.95
		20	.03	.03	.19	.19	.51	.50	.91	.91	.97	.96
		30	.02	.02	.14	.14	.44	.44	.89	.89	.95	.95
	2	6	.05	.05	.17	.16	.47	.46	.90	.90	.96	.96
		10	.02	.02	.11	.11	.42	.41	.88	.87	.95	.95
		20	.02	.02	.13	.13	.45	.44	.88	.88	.95	.95
		30	.03	.03	.14	.14	.45	.45	.90	.90	.95	.95
	3	6	.02	.02	.13	.13	.40	.40	.88	.87	.94	.94
		10	.03	.03	.15	.15	.47	.46	.90	.90	.95	.95
		20	.03	.03	.14	.14	.40	.40	.87	.86	.96	.96
		30	.02	.02	.14	.14	.43	.43	.90	.90	.96	.96
80	1	6	.03	.02	.15	.14	.41	.40	.89	.89	.95	.95
		10	.03	.03	.14	.14	.38	.37	.84	.84	.96	.95
		20	.03	.03	.15	.15	.42	.42	.89	.89	.96	.96
		30	.03	.03	.14	.14	.46	.46	.88	.88	.95	.95
	2	6	.03	.03	.13	.13	.40	.40	.89	.89	.94	.94
		10	.03	.03	.16	.16	.44	.43	.88	.88	.95	.95
		20	.02	.02	.13	.13	.44	.44	.90	.90	.95	.94
		30	.02	.02	.15	.15	.47	.46	.88	.88	.97	.97
	3	6	.03	.03	.14	.13	.43	.42	.88	.88	.96	.96
		10	.02	.02	.11	.11	.44	.43	.87	.86	.96	.96
		20	.03	.03	.15	.15	.43	.43	.87	.87	.94	.94
		30	.03	.03	.13	.13	.43	.43	.89	.89	.95	.95
100	1	6	.02	.02	.10	.10	.39	.38	.85	.84	.94	.94
		10	.02	.01	.12	.12	.39	.38	.84	.83	.94	.94
		20	.03	.02	.13	.13	.42	.42	.86	.86	.95	.95
		30	.03	.03	.14	.14	.39	.39	.86	.86	.94	.94
	2	6	.03	.03	.15	.15	.47	.47	.89	.89	.95	.95
		10	.04	.04	.14	.14	.43	.43	.91	.91	.96	.96
		20	.03	.03	.13	.13	.43	.43	.87	.87	.95	.95
		30	.02	.02	.14	.14	.43	.43	.88	.87	.95	.95
	3	6	.02	.02	.14	.14	.41	.40	.88	.88	.96	.96
		10	.03	.03	.14	.14	.42	.42	.86	.86	.96	.96
		20	.03	.02	.15	.15	.43	.43	.87	.87	.95	.95
		30	.02	.02	.13	.13	.45	.45	.88	.88	.96	.96

For the GLMM, the average proportions of significant W_G and F values across the 300 sample size combinations were $97.76$ percent and $97.63$ percent, respectively. Thus, in terms of average power, W_G again outperformed F ($Z = 2.21$; $p < .05$). Although the power of both statistics was much higher for the GLMM than for the LMM, this difference should not be overinterpreted. As stated above, the GLMM had a much lower average convergence rate than the LMM (41.26 percent vs. 100 percent), with convergence sometimes occurring in only one out of $1{,}000$ replications. Consequently, any comparison of the power rates of the LMM and the GLMM must take the unequal convergence rates into account (see below). It seems, however, that for the GLMM, mainly those models converged for which we assumed group differences in $β_{g}$. Table 7 provides an example of the power results for the different sample size combinations. It shows that the power of W_G and F depends mainly on the Level 1 sample size, n, and somewhat less on $1 - β$. Thus, for the GLMM, high power was reached even for small expected effect sizes once the Level 1 sample size was sufficiently large.

Table 7.

Power of W_G and F for Different Levels of $1 - β$ and $α = 0.05$ (Generalized Linear Mixed Model).

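K	J	n	W_G (1 − β = .10)	F (1 − β = .10)	W_G (1 − β = .30)	F (1 − β = .30)	W_G (1 − β = .60)	F (1 − β = .60)	W_G (1 − β = .90)	F (1 − β = .90)	W_G (1 − β = .95)	F (1 − β = .95)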
10	1	6	0.85	0.82	0.91	0.89	0.93	0.92	0.97	0.95	0.98	0.97
		10	0.98	0.98	0.99	0.99	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	2	6	0.97	0.97	0.99	0.99	0.99	0.99	1.00	1.00	1.00	1.00
		10	0.99	0.99	0.99	0.99	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	3	6	0.99	0.99	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	0.99	0.99	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
20	1	6	0.83	0.81	0.89	0.89	0.93	0.92	0.94	0.93	0.96	0.95
		10	0.98	0.98	0.98	0.98	0.99	0.99	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	2	6	0.96	0.96	0.98	0.98	0.98	0.98	1.00	1.00	0.99	0.99
		10	1.00	1.00	0.99	0.99	0.99	0.99	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	3	6	0.97	0.97	0.99	0.99	0.99	0.98	0.99	0.99	0.99	0.99
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.99	0.99
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
40	1	6	0.81	0.81	0.88	0.87	0.90	0.90	0.95	0.94	0.96	0.95
		10	0.98	0.98	0.98	0.98	0.98	0.98	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	2	6	0.98	0.98	0.98	0.98	0.98	0.98	0.99	0.99	0.99	0.99
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.99	0.99
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	3	6	0.97	0.97	0.99	0.99	0.99	0.99	1.00	0.99	0.99	0.99
		10	0.99	0.99	1.00	1.00	1.00	1.00	0.99	0.99	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
80	1	6	0.84	0.84	0.87	0.87	0.89	0.89	0.93	0.93	0.95	0.94
		10	0.97	0.97	0.98	0.98	0.99	0.99	0.99	0.99	0.99	0.99
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	2	6	0.96	0.96	0.98	0.98	0.96	0.96	1.00	1.00	0.97	0.97
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	3	6	1.00	1.00	0.98	0.98	0.95	0.95	0.99	0.99	0.99	0.99
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
100	1	6	0.84	0.84	0.85	0.85	0.87	0.87	0.92	0.92	0.94	0.94
		10	0.97	0.97	0.98	0.98	0.97	0.97	0.99	0.99	0.99	0.99
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	2	6	0.96	0.96	0.94	0.94	1.00	1.00	1.00	1.00	0.98	0.98
		10	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
	3	6	0.96	0.96	0.97	0.97	1.00	1.00	1.00	1.00	0.98	0.98
		10	0.99	0.99	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		20	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
		30	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00

To examine the relationship between the observed power of W_G and F, the expected power, the model (LMM vs. GLMM), and the sample size more formally, we fitted a GLM (Nelder and Wedderburn 1972) in which the number of significant W_G (F) values relative to the number of successful replications per combination was the dependent variable, and the levels of $λ$, K, J, and n as well as the model type (LMM vs. GLMM) and the test statistic (W_G vs. F) were the independent factors (i.e., a GLM with a binomial distribution for the dependent variable and a logit link). We introduced these factors stepwise, thereby forming a sequence of nested models. To assess the overall fit of each model, we used Akaike’s (1974) information criterion (AIC) and the Bayesian information criterion (BIC; Schwarz 1978). We also used likelihood ratio tests (LRTs) to compare the nested models. For these analyses, we used PROC GENMOD from SAS/STAT® Version 14.1 of the SAS System for Windows. Table 8 presents the fit statistics for the sequence of GLMs.

Table 8.

Overall Measures of Fit for the Generalized Linear Model Analyses Where the Number of Significant Values for W_G and F per Successful Replication Is the Dependent Variable and the Test (W_G or F) as well as $λ$ , K, J, n, and Model Type (Linear Mixed Model [LMM] vs. Generalized Linear Mixed Model) Are the Independent Factors.

Model	Np^a	Akaike Information Criterion	Bayesian Information Criterion	LogLike^b	LRT^c	df	p
0	1	651,284.93	651,290.02	−325,641.46
1	2	422,042.87	422,053.05	−211,019.44	229,242.05	1	<.001
2	3	422,022.63	422,037.90	−211,008.31	20.25	1	<.001
3	7	30,472.12	30,507.76	−15,229.06	391,550.50	4	<.001
4	11	29,534.88	29,590.87	−14,756.44	937.25	4	<.001
5	13	29,378.66	29,444.83	−14,676.33	156.22	2	<.001
6	16	28,783.91	28,865.35	−14,375.95	594.75	3	<.001
7	42	26,455.99	26,669.77	−13,185.99	2,327.92	26	<.001
8	66	26,037.30	26,373.24	−12,952.65	418.69	24	<.001
9	302	24,736.22	26,273.42	−12,066.11	1,301.08	236	<.001

Note: Model 0: intercept-only model; Model 1: factor for model added (LMM is the reference category); Model 2: factor for test added (F is the reference category); Model 3: factor for $λ$ added ($λ = 1.41$ is the reference category); Model 4: factor for K added ($K = 10$ is the reference category); Model 5: factor for J added ($J = 1$ is the reference category); Model 6: factor for n added ($n = 6$ is the reference category); Model 7: two-way interactions for K, J, and n added; Model 8: three-way interaction for K, J, and n added; Model 9: all interactions of $λ$ with K, J, and n, up to the four-way interaction, added.

^a Number of parameters in the model.

^b Log likelihood of the model.

^c Likelihood ratio test.

As is evident from Table 8, all factors contributed significantly to explaining the power of W_G and F. Hence, the power of W_G and F depended strongly on the effect size, as might be expected, but also on the model, which might not necessarily be expected.⁷ The sample size and the test (W_G vs. F) also explained the power differences. As for sample size, the significant LRTs for Models 7 and 8 indicated that we needed to take into account the interactions between the sample sizes at the different levels, and the significant LRT for Model 9 suggested that the effect of sample size on power also depended on the magnitude of $λ$. Whether we used the AIC or the BIC, Model 9 was the model that best fitted the data.
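The AIC and BIC columns of Table 8 follow directly from the log likelihoods and parameter counts via $\text{AIC} = -2\ell + 2k$ and $\text{BIC} = -2\ell + k \ln N$. The following Python sketch recomputes both columns and checks which model minimizes each criterion. It assumes N = 1,200 observations. This value is our inference and is not stated in the text: 300 design cells (5 levels of $λ$ × 5 of K × 3 of J × 4 of n, per the note to Table 9) times 2 model types times 2 tests. With this N, the recomputed values match the table to within rounding.

```python
import math

# (model, number of parameters k, log likelihood) transcribed from Table 8
TABLE8 = [
    (0, 1, -325641.46), (1, 2, -211019.44), (2, 3, -211008.31),
    (3, 7, -15229.06), (4, 11, -14756.44), (5, 13, -14676.33),
    (6, 16, -14375.95), (7, 42, -13185.99), (8, 66, -12952.65),
    (9, 302, -12066.11),
]

# Assumption (inferred, not stated): 300 design cells x 2 model types x 2 tests
N_OBS = 1200


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 log L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


fits = {m: (aic(ll, k), bic(ll, k, N_OBS)) for m, k, ll in TABLE8}
best_by_aic = min(fits, key=lambda m: fits[m][0])
best_by_bic = min(fits, key=lambda m: fits[m][1])

for m, (a, b) in fits.items():
    print(f"Model {m}: AIC = {a:,.2f}, BIC = {b:,.2f}")
print("Best by AIC:", best_by_aic, "| best by BIC:", best_by_bic)
```

Both criteria select Model 9. The BIC penalty of $k \ln N$ per model is much larger than the AIC penalty of $2k$, which is why the two criteria diverge most for the parameter-rich Model 9.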

For a closer look at the effects of the different factors associated with power, consider the estimated parameters for the different models in Table 9.⁸ Both statistics were more powerful for the GLMM than for the LMM, even when we took the different numbers of successful replications into account (Model 1; $b = 3.79$). This effect became even larger when we entered factors such as sample size or expected power as control variables into the GLM (see, e.g., Model 9; $b = 5.45$). Table 9 also shows that the W_G statistic was more powerful than the F test (see, e.g., Model 9, where $b = 0.05$), although this advantage was relatively small. As expected, the magnitude of the expected power, that is, the magnitude of $λ$, had a positive effect on the power of both tests. As $λ$ increased, the probability of a significant W_G or F statistic also increased (see, e.g., Model 9, where $b = 1.44$ for the dummy of $λ_{1}$ and $b = 6.02$ for the dummy of $λ_{4}$). Somewhat unexpectedly, the sample size at Level 3 (Stage 1) had an inverse effect on the power of both statistics. The likelihood of a significant W_G or F statistic decreased when the sample size at Stage 1 was $K = 20$, $K = 40$, $K = 80$, or $K = 100$ compared to $K = 10$ (see, e.g., Model 9, where $b = -0.50$ for $K = 20$, $b = -0.55$ for $K = 40$, $b = -0.38$ for $K = 80$, and $b = -0.38$ for $K = 100$). In contrast, increasing the sample size at Level 2 (Stage 2) or Level 1 (Stage 3) had a positive effect on the estimated power of the test statistics. In Model 9, $b = 0.65$ for $J = 2$ and $b = 0.90$ for $J = 3$, and $b = 0.69$, $0.73$, and $0.66$ for $n = 10$, $20$, and $30$, respectively. Hence, for the models considered in this article, increasing the sample size at Levels 1 and 2 seemed to be more useful than increasing the sample size at Level 3.

Table 9.

Parameter Estimates for the Generalized Linear Model Analyses Where the Number of Significant Values for W_G and F per Successful Replication Is the Dependent Variable and the Test (W_G or F) as well as $λ$ , K, J, n, and Model Type (Linear Mixed Model [LMM] vs. Generalized Linear Mixed Model) Are the Independent Factors (With All Presented Effects Significant at $p < .001$ ).

	Model
	0		1		2		3		4		5		6		7		8		9
Effect	b	SE	b	SE	b	SE	b	SE	b	SE	b	SE	b	SE	b	SE	b	SE	b	SE
Constant	0.54	0.00	−0.04	.00	−0.05	.00	−3.05	.01	−2.84	.01	−2.91	.02	−3.05	.02	−3.19	.03	−3.19	.03	−3.41	.07
Model¹			3.79	.01	3.79	.01	5.41	.02	5.41	.02	5.46	.02	5.47	.02	5.50	.02	5.51	.02	5.45	.02
Test²					0.02	.01	0.05	.01	0.05	.01	0.05	.01	0.05	.01	0.05	.01	0.05	.01	0.05	.01
$λ_{1}$ ³							1.26	.01	1.26	.01	1.26	.01	1.26	.01	1.26	.01	1.27	.01	1.44	.10
$λ_{2}$ ³							2.77	.01	2.77	.01	2.77	.01	2.78	.01	2.78	.01	2.79	.01	3.17	.09
$λ_{3}$ ³							4.98	.02	4.99	.02	4.99	.02	5.00	.02	5.02	.02	5.02	.02	5.23	.10
$λ_{4}$ ³							5.88	.02	5.89	.02	5.89	.02	5.90	.02	5.92	.02	5.93	.02	6.02	.12
K ₁ ⁴									−0.26	.01	−0.26	.01	−0.26	.01	−0.48	.03	−0.38	.04	−0.50	.09
K ₂ ⁴									−0.21	.01	−0.21	.01	−0.21	.01	−0.38	.03	−0.40	.04	−0.55	.09
K ₃ ⁴									−0.27	.01	−0.27	.01	−0.27	.01	−0.54	.03	−0.52	.04	−0.38	.09
K ₄ ⁴									−0.33	.01	−0.32	.01	−0.33	.01	−0.70	.03	−0.78	.04	−0.38	.09
J ₁ ⁵											0.10	.01	0.11	.01	0.59	.03	0.70	.04	0.65	.10
J ₂ ⁵											0.10	.01	0.10	.01	0.67	.03	0.58	.04	0.90	.10
n ₁ ⁶													0.11	.01	0.27	.03	0.50	.04	0.69	.10
n ₂ ⁶													0.23	.01	0.41	.03	0.21	.04	0.73	.10
n ₃ ⁶													0.21	.01	0.33	.03	0.32	.04	0.66	.10

Note: [1] LMM is the reference category; [2] F is the reference category; [3] $λ = 1.41$ is the reference category ( $λ_{1} = 5.34$ ; $λ_{2} = 10.71$ ; $λ_{3} = 19.85$ ; $λ_{4} = 23.59$ ); [4] $K = 10$ is the reference category ( $K_{1} = 20$ ; $K_{2} = 40$ ; $K_{3} = 80$ ; $K_{4} = 100$ ); [5] $J = 1$ is the reference category ( $J_{1} = 2$ ; $J_{2} = 3$ ); [6] $n = 6$ is the reference category ( $n_{1} = 10$ ; $n_{2} = 20$ ; $n_{3} = 30$ ).
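Because the GLM uses a logit link, the coefficients in Table 9 are log odds ratios. The following Python sketch uses only the Model 9 estimates reported in Table 9. It converts the model-type and test coefficients into odds ratios and computes model-implied probabilities of a significant test for the reference cell ($λ = 1.41$, $K = 10$, $J = 1$, $n = 6$). It is an illustration only. Model 9 includes the interactions among $λ$, K, J, and n, so the main-effect coefficients for those factors are conditional on the reference levels and are not converted here.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Model 9 estimates from Table 9 (reference cell: LMM, F test,
# lambda = 1.41, K = 10, J = 1, n = 6)
b_const = -3.41
b_glmm = 5.45   # GLMM vs. LMM
b_wg = 0.05     # W_G vs. F

or_glmm = math.exp(b_glmm)   # odds ratio, GLMM vs. LMM
or_wg = math.exp(b_wg)       # odds ratio, W_G vs. F

# Model-implied probabilities of a significant test in the reference cell
p_lmm_f = inv_logit(b_const)
p_glmm_f = inv_logit(b_const + b_glmm)
p_glmm_wg = inv_logit(b_const + b_glmm + b_wg)

print(f"OR(GLMM vs. LMM) = {or_glmm:.1f}, OR(W_G vs. F) = {or_wg:.3f}")
print(f"P(significant): LMM/F = {p_lmm_f:.3f}, GLMM/F = {p_glmm_f:.3f}, "
      f"GLMM/W_G = {p_glmm_wg:.3f}")
```

On this scale, a coefficient of 0.05 corresponds to odds about 5% higher for W_G than for F, which is consistent with the small advantage described above.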

Discussion

Our investigation of the power of the test statistics W_G and F via a simulation study involved two special cases of the GLMM: a three-level hierarchical model with either a normally distributed dependent variable (LMM) or a Bernoulli-distributed dependent variable (GLMM). In the simulation study, we varied the expected power and the sample sizes at all three levels. Specifying the expected power allowed us to calculate the corresponding values for $β_{g}$. From these values and the given values for g, X, Z, and $θ$, we generated 1,000 replications of the dependent variable for each combination using the appropriate model. For the LMM, we drew 1,000 random samples for e, and for the GLMM, we sampled from the Bernoulli distribution. We then analyzed each replication with either an LMM or a GLMM, which yielded estimates for $β_{g}$. Finally, we applied W_G and F to each replication that converged. The percentage of significant W_G and F values per combination served as the estimated power, which we compared with the expected power.
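The logic of estimating power as the share of significant tests across replications can be illustrated in a much simpler setting than the three-level models used here. The following Python sketch is a toy example under stated assumptions. It uses two independent groups whose coefficient estimates are drawn directly from normal distributions with known standard errors, a one-degree-of-freedom Wald test of equal coefficients at α = .05, and a noncentrality parameter λ that fixes the expected power. It does not simulate or fit GLMMs, and the function names are hypothetical.

```python
import math
import random

CHI2_CRIT_1DF = 3.841458820694124  # 0.95 quantile of chi-square with 1 df


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_power(lam):
    """Analytic power of a 1-df Wald test with noncentrality lam."""
    z = math.sqrt(CHI2_CRIT_1DF)  # about 1.96
    d = math.sqrt(lam)
    return normal_cdf(d - z) + normal_cdf(-d - z)


def simulated_power(beta1, beta2, se1, se2, n_rep=20000, seed=1):
    """Share of replications in which the Wald test rejects H0: beta1 = beta2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        b1 = rng.gauss(beta1, se1)  # simulated estimate, group 1
        b2 = rng.gauss(beta2, se2)  # simulated estimate, group 2
        w = (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)
        if w > CHI2_CRIT_1DF:
            hits += 1
    return hits / n_rep


# Choose the group difference so that diff**2 / (se**2 + se**2) equals lam
lam = 5.34
se = 1.0
diff = math.sqrt(lam * 2 * se ** 2)
expected = expected_power(lam)
est = simulated_power(diff, 0.0, se, se)
print(f"expected = {expected:.3f}, simulated = {est:.3f}, bias = {est - expected:+.3f}")
```

With λ = 5.34 (the value of $λ_{1}$ in Table 9), the simulated and expected power in this 1-df toy case agree to about two decimals (both roughly 0.64). The expected power that the study associates with a given λ may differ, because its tests and models are more complex than this sketch.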

Overall, we found that both statistics showed a negative bias (i.e., the estimated power was smaller than the expected power) when the expected power was small. This finding implies that, for small effect sizes, both tests have a higher probability than expected of failing to reject the null hypothesis when it is in fact false (i.e., an inflated Type II error rate). In our simulation study, increasing the sample size did not compensate for this behavior. Consequently, researchers expecting small effect sizes in studies comparing GLMM effects across multiple groups need to be more careful about accepting the null hypothesis. Further research is necessary to explore the magnitude of this negative bias when, for example, other GLMMs are used.

We also found that the W_G statistic outperformed the F statistic with respect to estimated power. Although this advantage was small in magnitude, it may have practical relevance because it implies that the negative bias of W_G is smaller than that of F. As such, researchers may prefer to use W_G instead of F, especially if they expect the effect sizes to be small. We also found that both tests had a higher estimated power when applied within the context of the GLMM than within the context of the LMM. Results not presented here (but included in this article’s Supplemental Material at https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182) showed this effect to be a general one, because the interaction between model type and test statistic was not significant.

In general, increasing the sample size at Level 1 or Level 2 appears to increase the estimated power of W_G and F. Somewhat unexpectedly, however, the sample size at Level 3 was inversely related to the estimated power, with both statistics appearing to be more powerful when the Level 3 sample size was relatively small. Thus, from a power perspective, researchers using a three-level hierarchical model should consider increasing the Level 2 or Level 1 sample sizes rather than the Level 3 sample size.

Example of Application

Introduction

In 2011, the IEA conducted the PIRLS and the TIMSS jointly for the first time. PIRLS has assessed the reading comprehension achievement of Grade 4 students every five years since 2001 (Mullis et al. 2012), while TIMSS has assessed the mathematics and science achievement of Grades 4 and 8 students every four years since 1995 (Martin et al. 2012). In 2011, 34 countries and three benchmark participants (hereafter, all 37 participants are referred to as countries) collected data on Grade 4 students’ educational achievement in three competence domains: reading comprehension, mathematics, and science (Martin and Mullis 2013).

Martin et al. (2013) performed a “school-effectiveness” analysis using the TIMSS/PIRLS 2011 combined data set. For their analysis, Martin and his colleagues used five school-effectiveness variables and two student home-background variables as predictors in country-specific HLMs. In other words, the authors applied the HLM independently to each country data set. One of the predictors is an index for home resources for learning (HRL), and in line with Bourdieu’s (1986) work on cultural capital, the HRL index can be interpreted as a measure of students’ socioeconomic and cultural home learning environments (Smith, Wendt, and Kasper 2017). Martin and colleagues used students’ achievement scores (reading comprehension, mathematics achievement, and science achievement) as dependent variables. Because the goal of the study was to “present an analytic framework that could provide an overview of how these relationships vary across countries” (Martin et al. 2013:110), it is reasonable to assume that the results from the hierarchical linear modeling would have been comparable across the participating countries.

One of Martin et al.’s (2013) major findings was that the strength of the relationships between the school-effectiveness variables and the student achievement scores decreased substantially in nearly all 37 countries when the authors included home-background control variables in their models; country-specific effects were also apparent. For example, in 15 countries, only one of the five effectiveness indicators still had a statistically significant coefficient after Martin and his colleagues controlled for students’ home background. In four countries, three coefficients remained significant. If the results of these analyses had, in fact, been comparable across the participating countries, then we could assume that in most of these countries, the relationships between school-effectiveness variables and student achievement would be relatively weak once student home background was controlled for.

If we agree that the predictors used by Martin et al. (2013) can be interpreted as measures of students’ socioeconomic and cultural home learning environments, then we can view their results as being in line with current results in educational research. Within empirical educational research, operationalizing social background is often driven by interpretations of Bourdieu’s (1986) theory of capital, wherein economic capital is the most obvious form of power (Bourdieu 1986: 242-43). Bourdieu describes the members of a society as primarily arranged in a hierarchical order. In simplified terms, each hierarchical grouping has different volumes of certain capital resources (economic, cultural, social, and symbolic). With regard to human action, thought, and feeling, the forms of capital become embodied within the habitus (Bourdieu 1998), a shared practical sense of social group membership. However, it is difficult to gain empirical access to this practical sense or to identify the resources associated with the forms of capital.

It thus makes sense that the social-class models of students’ social background used in large-scale assessments such as TIMSS and PIRLS seek to bring order to the complexity of social structure by simplifying and reducing the original theoretical complexity. This is why certain single social-background indicators carry more weight than others. For example, considerable emphasis is placed on parents’ main occupation, on the assumption that knowing the parents’ occupational level automatically tells us the family’s social position. Established standards in large-scale surveys have this kind of focus, for example, the Standard International Occupational Prestige Scale or the International Socio-Economic Index of Occupational Status. These standards capture characteristics such as the social prestige associated with occupations or the average income of occupational groups (Ganzeboom et al. 1992). According to the aforementioned theories, the items of the HRL scale, for example, can be classified as indicators of the economic and cultural capital of students’ families (Mullis et al. 2012).

The impact of student social background on achievement is one of the most consistently observed phenomena in educational research (see, e.g., Jehangir, Glas, and van den Berg 2015; Lavrijsen 2015; Martin and Mullis 2012; Martin et al. 2008; Mullis et al. 2007, 2008, 2012; Organization for Economic Cooperation and Development 2014; Pokropek, Borgonovi, and Jakubowski 2015; Sirin 2005). In a general sense, the observed achievement differences associated with student social background can be interpreted as educational inequalities (Walzebug and Kasper 2016). In the study reported by Martin et al. (2013), however, the relationship between the social-background variables and the achievement scores varies considerably across the participating countries, suggesting different degrees of educational inequality within the countries. Yet it is not obvious from the results that Martin et al. presented which procedure they used for the cross-national comparisons of the fixed effects from the HLM. Among other concerns, we simply do not yet know whether the different degrees of educational inequality observed in the study can be considered statistically significant. This is why, in the present example, we decided to reapply the country-specific HLMs.

However, because the effect of the school-effectiveness variables on achievement scores nearly vanished once social-background variables were included in the HLMs, and because we were mainly interested in whether the different degrees of educational inequality observed in Martin et al.’s (2013) study could be deemed statistically significant, we investigated only the relationship between Grade 4 students’ mathematics achievement and the home-background control variables. Our analyses therefore did not include reading or science competencies or the school-effectiveness variables. We compared the fixed effects from this analysis cross-nationally by applying the W_B statistic that we developed and describe in this article. Because our analyses here are mainly intended to demonstrate the application of the W_B statistic, we discourage readers from interpreting the results with respect to any sophisticated theory or hypothesis.

Data and Variables

The TIMSS/PIRLS 2011 combined data sets that we used are freely available at https://timss.bc.edu/timsspirls2011/international-database.html. They contain the responses of 183,475 Grade 4 students across 37 countries. In addition to students’ achievement in mathematics, science, and reading, the data sets include students’ responses to questionnaire items capturing student-background variables as well as parents’ responses to items on parent-background variables. We used the country-specific data sets named ASG***B1 and ASH***B1. Here, *** stands for a country-specific code, ASG denotes the Grade 4 student-background data files, and ASH denotes the corresponding home-background data files.⁹ These data sets were concatenated across countries and then merged across the two data sources. However, analyzing data from 183,475 students in 37 countries was well beyond the scope of this article. We therefore restricted our analysis to Grade 4 students from those countries that participated in TIMSS/PIRLS 2011 combined and were members of the European Union. This restriction resulted in a sample of 15 countries (see Table 10) and a total sample size of $n = 69{,}674$ students.

When constructing their home-background control model, Martin et al. (2013) used the variables ASBGHRL and ASBHAVG at the student level and ASBCHRL and ASBCAVG at the school level. ASBHAVG is the average of the variables ASBHELT and ASBHENT, and ASBCHRL and ASBCAVG are the school averages of ASBGHRL and ASBHAVG, respectively. ASBGHRL, ASBHELT, and ASBHENT are contextual scales derived, via the partial credit model, from responses to specific sets of questions. The scales are described in the context questionnaire scales section of Methods and Procedures in TIMSS and PIRLS 2011 (Martin and Mullis 2012). ASBGHRL is an index for HRL and is based on several questionnaire items, including “number of books in the home” and “highest level of education of either parent.” ASBHELT is an index for early literacy tasks. It is based on an item that asked parents how often they or someone else did various activities in the home (e.g., read books, told stories) before the child began primary/elementary school. The index for early numeracy tasks, ASBHENT, is based on an item that asked parents to indicate which of a variety of numeracy-related activities (e.g., counting different things, playing with number toys) they or someone else did in the home before the child began primary/elementary school. ASBGHRL, ASBHELT, and ASBHENT are included in the data sets that we used, whereas we calculated ASBHAVG, ASBCHRL, and ASBCAVG ourselves. Table 10 includes descriptive statistics for these variables and for mathematics achievement. Because the means for ASBGHRL, ASBHAVG, ASBCHRL, and ASBCAVG in the table are very similar to the values presented by Foy and O’Dwyer (2013), we assumed that, in general, we were using the same data as Martin et al. (2013).¹⁰

Table 10.

Descriptive Statistics for Mathematics Achievement and the Indicators of the Home-background Control Variables.

Country	N		MAT		ASBGHRL		ASBHAVG		ASBCHRL		ASBCAVG
	ST	SL	M	SE	M	SE	M	SE	M	SE	M	SE
Austria	4,587	158	505.19	2.65	10.44	.06	9.32	.03	10.43	.06	9.30	.03
Czech Republic	4,433	177	508.28	2.48	10.52	.05	9.89	.03	10.48	.05	9.88	.03
Germany	3,928	197	524.82	2.21	10.61	.06	9.46	.03	10.58	.06	9.46	.04
Hungary	5,149	149	512.36	3.40	9.97	.09	9.30	.03	9.90	.10	9.28	.04
Ireland	4,383	150	525.82	2.82	10.82	.06	9.38	.03	10.85	.06	9.36	.03
Italy	4,125	202	503.84	2.71	9.63	.05	9.23	.02	9.69	.06	9.24	.02
Lithuania	4,584	154	531.46	2.63	9.83	.05	10.07	.04	9.71	.05	10.01	.04
Malta	3,492	96	491.76	1.27	10.30	.02	10.25	.03	10.30	.01	10.25	.01
Poland	4,962	150	476.95	2.26	9.97	.06	9.89	.04	9.85	.06	9.85	.04
Portugal	3,991	147	529.74	3.44	9.85	.07	9.44	.04	9.82	.07	9.42	.04
Romania	4,643	148	476.37	6.01	8.67	.09	9.59	.10	8.51	.08	9.53	.09
Slovak Republic	5,561	197	503.12	3.93	9.93	.07	8.98	.04	9.87	.07	8.96	.04
Slovenia	4,433	195	509.81	1.99	10.43	.04	9.33	.03	10.40	.03	9.29	.03
Spain	4,105	151	478.79	2.81	10.26	.06	10.66	.04	10.20	.07	10.65	.04
Sweden	4,482	152	501.95	2.20	11.40	.05	10.29	.04	11.37	.05	10.27	.04

Note: ST = student sample size; SL = school sample size; MAT = mathematics achievement of fourth graders; ASBGHRL = home resources for learning; ASBHAVG = early literacy/numeracy tasks; ASBCHRL = school average of home resources for learning; ASBCAVG = school average of early literacy/numeracy tasks.

Prediction Model

The prediction model we used was very similar to the one Martin et al. (2013) used for their home-background control model. Our country-specific HLM included ASBGHRL and ASBHAVG as student-level predictors and ASBCHRL and ASBCAVG as school-level predictors. We also included a random intercept term in our model, but unlike Martin et al. (2013), we did not include random effects for the slope coefficients of ASBGHRL and ASBHAVG. For a given country g, let $y_{i s}$ be the achievement value of student i in school s ($i = 1, \dots, n_{l_{g}}$; $s = 1, \dots, l_{g}$), $x_{i s, H}$ the corresponding student-level value on the HRL scale, $x_{i s, E}$ the student-level value on ASBHAVG, ${\bar{x}}_{s, H}$ the school average of $x_{i s, H}$, and ${\bar{x}}_{s, E}$ the school average of $x_{i s, E}$. The combined model for explaining mathematics achievement can then be expressed as

y_{i s} = γ_{00} + γ_{10} x_{i s, H}^{*} + γ_{20} x_{i s, E}^{*} + γ_{01} {\bar{x}}_{s, H} + γ_{02} {\bar{x}}_{s, E} + u_{0 s} + r_{i s},

where the $γ$s represent the overall intercept and the fixed effects of the predictors, the $x^{*}$s are the school mean-centered values of x (e.g., $x_{i s, H}^{*} = x_{i s, H} - {\bar{x}}_{s, H}$), and $\bar{x}$ is the school average of the respective values of x. Furthermore, $u_{0 s}$ is the random intercept effect of school s, and $r_{i s}$ is the residual term at Level 1. In keeping with the approach of Martin et al. (2013), we assumed that u and r were mutually independent and multivariate normally distributed.
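The school mean-centering and school averaging in this model can be sketched as follows. This minimal Python example assumes unweighted within-school means, because the text does not say whether the school averages were weighted, and it uses hypothetical values. The name `x_star` corresponds to $x_{i s}^{*}$, and `x_bar` to ${\bar{x}}_{s}$ repeated for every student in the school.

```python
from collections import defaultdict


def school_means(values, schools):
    """Unweighted mean of `values` within each school (assumption: no weights)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, schools):
        sums[s] += v
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


def center_within_school(values, schools):
    """Return (x_star, x_bar): school mean-centered values and school averages."""
    means = school_means(values, schools)
    x_star = [v - means[s] for v, s in zip(values, schools)]
    x_bar = [means[s] for s in schools]  # school average repeated per student
    return x_star, x_bar


# Hypothetical HRL scores for five students in two schools
hrl = [8.0, 10.0, 12.0, 9.0, 13.0]
school = ["A", "A", "A", "B", "B"]
x_star, x_bar = center_within_school(hrl, school)
print(x_star)  # [-2.0, 0.0, 2.0, -2.0, 2.0]
print(x_bar)   # [10.0, 10.0, 10.0, 11.0, 11.0]
```

Because the centered values sum to zero within each school, the student-level slopes ($γ_{10}$, $γ_{20}$) reflect within-school differences, while the school averages carry the between-school information.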

Through a slight reformulation, the prediction model could also be expressed as a GLMM with an identity link function:

\begin{pmatrix} y_{11} \\ \vdots \\ y_{n_{l_{g}} 1} \\ y_{12} \\ \vdots \\ y_{n_{l_{g}} l_{g}} \end{pmatrix} = \begin{pmatrix} 1 & x_{11, H}^{*} & x_{11, E}^{*} & {\bar{x}}_{1, H} & {\bar{x}}_{1, E} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n_{l_{g}} 1, H}^{*} & x_{n_{l_{g}} 1, E}^{*} & {\bar{x}}_{1, H} & {\bar{x}}_{1, E} \\ 1 & x_{12, H}^{*} & x_{12, E}^{*} & {\bar{x}}_{2, H} & {\bar{x}}_{2, E} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n_{l_{g}} l_{g}, H}^{*} & x_{n_{l_{g}} l_{g}, E}^{*} & {\bar{x}}_{l_{g}, H} & {\bar{x}}_{l_{g}, E} \end{pmatrix} \begin{pmatrix} β_{0 g} \\ β_{1 g} \\ β_{2 g} \\ β_{3 g} \\ β_{4 g} \end{pmatrix} + \begin{pmatrix} 1 & & \\ \vdots & & \\ 1 & & \\ & \ddots & \\ & & 1 \\ & & \vdots \\ & & 1 \end{pmatrix} \begin{pmatrix} θ_{01} \\ \vdots \\ θ_{0 l_{g}} \end{pmatrix} + \begin{pmatrix} e_{11} \\ \vdots \\ e_{n_{l_{g}} 1} \\ e_{12} \\ \vdots \\ e_{n_{l_{g}} l_{g}} \end{pmatrix} .

Here, $γ_{00} = β_{0 g}$, $γ_{10} = β_{1 g}$, $γ_{20} = β_{2 g}$, $γ_{01} = β_{3 g}$, $γ_{02} = β_{4 g}$, $u_{0 s} = θ_{0 s}$, and $r_{i s} = e_{i s}$. We fitted this GLMM separately for each country and each plausible value¹¹. We then combined the results per country across plausible values using Rubin’s (1987) rules.
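For a single coefficient, Rubin’s (1987) rules average the estimates across the M plausible values. They then add the between-plausible-value variance, inflated by $1 + 1/M$, to the average sampling variance. The following Python sketch applies these rules to one coefficient. The five estimates and variances are made up for illustration. For a full coefficient vector, the same rules apply with covariance matrices in place of variances.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine one coefficient across M plausible values (Rubin 1987).

    estimates: point estimates, one per plausible value
    variances: squared standard errors, one per plausible value
    Returns the combined estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    u_bar = mean(variances)          # average within-plausible-value variance
    b = variance(estimates)          # between-plausible-value variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, total


# Hypothetical results for one coefficient from five plausible values
est_pv = [15.0, 16.0, 15.5, 16.5, 17.0]
var_pv = [0.7, 0.7, 0.6, 0.8, 0.7]
q, t = rubin_combine(est_pv, var_pv)
print(f"estimate = {q:.2f}, SE = {t ** 0.5:.3f}")
```

In this example, the between-plausible-value component (0.75 after inflation) is larger than the average sampling variance (0.70). This illustrates why ignoring the plausible-value variation would understate the standard errors.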

Weighting

The sampling procedure applied in TIMSS/PIRLS 2011 combined can be considered a stratified two-stage cluster sample in which schools are selected systematically with probability proportional to size. This procedure was applied separately in each participating country. In practice, in most countries, it meant selecting a random sample of schools and then randomly selecting one class of Grade 4 students within each of those schools. Usually, all students within the selected classes participated in TIMSS/PIRLS 2011, so a third sampling stage involving the sampling of students within classes was not needed. However, under this sampling procedure, the Grade 4 students in the participating schools did not have an equal chance of being selected. To compensate for these unequal selection probabilities, the TIMSS and PIRLS analysts weighted the data sets. The weights were basically the inverse of the selection probability, whether for the school, the class, or the student. One of these weights, TOTWGT, is scaled so that the weights of all students within a country sum to the population size. TIMSS and PIRLS typically use this weight for presenting results at the student level (Martin et al. 2012; Mullis et al. 2012). The descriptive statistics for mathematics achievement, ASBGHRL, and ASBHAVG presented in Table 10 were thus based on TOTWGT, whereas a Level 2 weight was used for ASBCHRL and ASBCAVG. Accounts of the construction of level-specific weights in TIMSS and PIRLS can be found in Kasper, Schulz-Heidorf, and Schwippert (2018) and Rutkowski et al. (2010). We also used level-specific weights in our GLMM analyses because various authors have recommended such weights for HLMs (Asparouhov 2006; Pfeffermann et al. 1998; Rabe-Hesketh and Skrondal 2006; Rutkowski et al. 2010).¹²
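The idea that the weights are inverse selection probabilities can be sketched for a single sampled school. This Python example is deliberately simplified and hypothetical. It ignores stratification, the details of systematic selection, and nonresponse adjustments. Schools are taken with a probability proportional to a measure of size, capped at 1 for very large schools. One of the school’s Grade 4 classes is then selected at random, and all students in that class are included. The Level 2 (school) weight is the inverse school selection probability, and the Level 1 (within-school) weight is the inverse class selection probability. Their product plays the role of an overall student weight, such as TOTWGT before any adjustments.

```python
def base_weights(school_mos, total_mos, n_schools, n_classes):
    """Simplified design weights for students in one sampled school.

    school_mos: measure of size of the school (e.g., Grade 4 enrollment)
    total_mos: sum of the measures of size over all schools in the frame
    n_schools: number of schools to be sampled
    n_classes: number of Grade 4 classes in the sampled school
    """
    # Stage 1: probability proportional to size, capped at 1 (certainty schools)
    p_school = min(1.0, n_schools * school_mos / total_mos)
    # Stage 2: one intact class drawn at random; all its students participate
    p_class = 1.0 / n_classes
    w_school = 1.0 / p_school   # Level 2 weight
    w_within = 1.0 / p_class    # Level 1 (within-school) weight
    return w_school, w_within, w_school * w_within  # last: overall student weight


# Hypothetical school: 60 Grade 4 students in 3 classes, 150 schools drawn
# from a frame with 60,000 Grade 4 students in total
w2, w1, w_total = base_weights(60, 60_000, 150, 3)
print(round(w2, 3), w1, round(w_total, 3))
```

Keeping the two components separate, rather than only their product, is what allows level-specific weights to be passed to a multilevel model.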

Outcome

The outcome of interest was the estimated fixed-effect vector ${\hat{β}}_{g}$. We used ${\hat{β}}_{g}$ to apply W_B to test the hypothesis $H_{0} : β_{d} = 0$, that is, that the fixed-effect vectors are equal across all countries. We then applied W_B to all pairwise comparisons between the countries’ $β_{g}$ to test the hypotheses $H_{0} : β_{v} = β_{i}$ with $v, i \in G$ and $v \neq i$.
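A generic version of such an equality test can be sketched as follows. The sketch rests on two hypothetical assumptions. First, W_B is assumed to have the standard general-linear-hypothesis Wald form $W = (C\hat{β})^{\top}(C\hat{Σ}C^{\top})^{-1}(C\hat{β})$. Second, countries are assumed to be sampled independently, so the covariance matrix of the stacked coefficient vector is block diagonal. The helper names are hypothetical, and the dependency-free linear algebra is meant only for small examples. With G countries and p coefficients, the default contrasts give $(G - 1)p$ degrees of freedom, that is, 14 × 5 = 70 for the 15 countries and five fixed effects used here. Any full-rank set of equality contrasts yields the same W, and the statistic is referred to a $χ^{2}$ distribution with that many degrees of freedom.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def block_diag(blocks):
    """Stack per-group covariance matrices into one block-diagonal matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[off + i][off + j] = v
        off += len(b)
    return out


def contrasts_vs_last(G, p):
    """Rows encoding beta_g - beta_G = 0 for g = 1..G-1: (G-1)*p restrictions."""
    C = []
    for g in range(G - 1):
        for k in range(p):
            row = [0.0] * (G * p)
            row[g * p + k] = 1.0
            row[(G - 1) * p + k] = -1.0
            C.append(row)
    return C


def wald_equality(betas, covs, C=None):
    """Wald statistic for H0: beta_1 = ... = beta_G with independent groups."""
    G, p = len(betas), len(betas[0])
    if C is None:
        C = contrasts_vs_last(G, p)
    beta = [v for b in betas for v in b]   # stacked (G*p) vector
    sigma = block_diag(covs)               # no covariance between groups
    cb = [sum(c * v for c, v in zip(row, beta)) for row in C]
    cov_cb = matmul(matmul(C, sigma), transpose(C))
    w = sum(u * v for u, v in zip(cb, solve(cov_cb, cb)))
    return w, len(C)                       # statistic and df


# Hypothetical example: three groups, two coefficients, identity covariances
betas3 = [[1.0, 2.0], [0.0, 1.0], [0.5, 0.5]]
covs3 = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
w3, df3 = wald_equality(betas3, covs3)
print(w3, df3)  # W is 5/3 (about 1.667) with df = 4
```

For a single pair of countries ($G = 2$), the same function gives the pairwise statistic $(\hat{β}_{v} - \hat{β}_{i})^{\top}(\hat{Σ}_{v} + \hat{Σ}_{i})^{-1}(\hat{β}_{v} - \hat{β}_{i})$.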

Missing Values and Software

Martin et al. (2013) used single imputation methods to impute the missing values in the HLMs’ predictor variables. However, they neither reported the details of their imputation strategy nor provided readers with the imputed data set. Despite these omissions, we sought to approximate Martin et al.’s (2013) results as closely as possible by performing a single imputation for the missing values in the predictor variables. We used the Markov chain Monte Carlo method (Schafer 1997), with all predictors and plausible values serving as conditioning variables. All of our analyses in this section are based on this imputed data set.

We conducted our analyses with SAS/STAT® and SAS/IML®, Version 9.4 (TS1M1) of the SAS System for Windows¹³ (SAS Institute Inc. 2015a, 2015b). We used the GLIMMIX procedure for the GLMM analyses and PROC IML to implement the derived test statistics for the outcomes. Readers who wish to reproduce our analyses in full can refer to the syntax and data sets that we used, which are available in the Supplemental Material to this article (https://journals.sagepub.com/doi/suppl/10.1177/0049124120986182).

Results

Table 11 displays the fixed-effect vectors for the 15 TIMSS/PIRLS countries. The fixed-effect coefficients and the corresponding standard errors for the Level 1 variables ASBGHRL and ASBHAVG are very similar to the results that Martin et al. (2013) reported for their home-background control model. The small deviations for these Level 1 effects can be attributed to differences between Martin and colleagues’ (2013) model and the imputation model we used. However, the fixed effects for the Level 2 variables ASBCHRL and ASBCAVG, as well as the intercept, differ considerably from Martin and colleagues’ results. These differences are expected because we used level-specific weights in our analysis, whereas Martin et al. (2013) used a combined weight. As Rabe-Hesketh and Skrondal (2006) showed, level-specific weights affect the estimation of the random components and, in turn, the estimation of the corresponding fixed-effect terms. As Martin et al. (2013) also found, the fixed-effect vectors vary considerably across countries. Thus, although every country shows a relationship between students’ social background and their observed mathematics achievement, the nature and strength of this relationship depend on the country in which the students were taught. In Portugal, for example, most effects are considerably smaller than those in Slovenia. In Slovenia, the effect of student-level home resources on mathematics achievement (${\hat{β}}_{1} = 18.77$) is the largest among the 15 countries.

Table 11.

Estimated Fixed Effect Vectors of the Applied Generalized Linear Mixed Model.

Country	${\hat{β}}_{0}$	${\hat{σ}}_{β_{0}}^{2}$	${\hat{β}}_{1}$	${\hat{σ}}_{β_{1}}^{2}$	${\hat{β}}_{2}$	${\hat{σ}}_{β_{2}}^{2}$	${\hat{β}}_{3}$	${\hat{σ}}_{β_{3}}^{2}$	${\hat{β}}_{4}$	${\hat{σ}}_{β_{4}}^{2}$
Austria	215.66	58.54	15.84	0.69	7.87	0.70	27.41	2.71	0.35	5.52
Czech Republic	−161.04	73.58	15.85	1.10	9.18	0.82	34.60	3.35	30.86	5.32
Germany	40.51	75.80	12.57	0.66	7.84	0.83	31.07	3.01	16.35	6.04
Hungary	171.15	48.58	17.42	0.75	8.48	0.72	33.68	1.77	0.58	5.80
Ireland	257.27	64.94	17.10	0.95	9.74	1.10	28.49	2.71	−4.24	6.36
Italy	177.20	74.24	11.25	0.96	9.14	0.83	22.24	4.42	12.18	5.95
Lithuania	30.55	35.63	11.45	1.00	17.27	0.98	20.89	2.98	29.36	4.48
Malta	22.92	69.92	16.36	1.25	9.04	1.05	35.20	3.97	10.37	6.86
Poland	187.77	37.20	15.59	0.83	13.13	0.77	21.94	2.05	7.13	4.11
Portugal	437.54	62.01	11.47	0.95	8.18	0.78	17.16	3.00	−8.14	6.83
Romania	187.38	56.80	14.91	1.87	11.10	1.94	23.48	4.58	8.89	6.96
Slovak Republic	378.28	73.09	17.51	0.91	8.21	0.89	28.05	5.32	−17.11	9.51
Slovenia	164.24	44.96	18.77	1.20	11.28	0.70	25.75	2.49	8.25	4.46
Spain	55.93	39.85	10.57	0.79	12.00	0.94	22.04	2.48	18.47	4.43
Sweden	68.68	32.88	10.86	0.84	14.41	1.02	24.09	1.73	15.42	3.48

Note: ${\hat{β}}_{0}$ = intercept; ${\hat{β}}_{1}$ = home resources for learning; ${\hat{β}}_{2}$ = early literacy/numeracy tasks; ${\hat{β}}_{3}$ = school average of home resources for learning; ${\hat{β}}_{4}$ = school average of early literacy/numeracy tasks.

To determine whether the observed differences in the fixed-effect vectors across countries were statistically significant, we calculated W_B for the hypothesis $H_{0} : β_{d} = 0$. The value $W_{B} = 1{,}853.50$ with $df = 70$ was statistically significant ($p < .001$), so we rejected $H_{0}$. In general, then, it seems reasonable to assume country-specific GLMMs for the home-background control variables. To determine whether this premise also held for all pairwise comparisons across the countries, we applied W_B to all pairs of $β_{g}$ to test the hypotheses $H_{0} : β_{v} = β_{i}$ with $v, i \in G$ and $v \neq i$. Each comparison involves the five fixed effects, so $df = 5$. Table 12 presents the results of this analysis.

Table 12.

Pairwise Comparisons Between Countries’ Fixed-Effect Vectors From the Generalized Linear Mixed Model (W_B Values Above the Diagonal and p Values Below the Diagonal).

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
1		20.19	49.11	79.64	14.80	44.55	135.40	11.84	57.42	103.49	13.23	12.54	24.17	87.82	152.10
2	0		70.83	99.42	33.57	63.88	216.64	39.28	103.56	119.85	24.17	44.13	35.31	160.77	242.02
3	0	0		38.00	31.67	8.78	103.76	65.96	167.71	39.83	11.27	30.30	61.45	141.37	279.10
4	0	0	0		25.47	32.25	108.18	47.98	155.75	56.10	5.98	16.76	54.66	172.68	361.79
5	0.01	0	0	0		28.85	77.35	17.92	58.44	47.97	5.07	3.08	6.99	103.27	193.28
6	0	0	0.12	0	0		53.41	66.33	108.71	33.75	7.91	29.05	50.11	114.63	171.16
7	0	0	0	0	0	0		395.47	450.24	91.23	55.13	95.03	99.47	579.44	433.67
8	0.04	0	0	0	0	0	0		23.00	93.20	52.33	16.93	39.31	50.54	57.52
9	0	0	0	0	0	0	0	0		202.83	53.64	55.03	84.06	48.72	44.21
10	0	0	0	0	0	0	0	0	0		25.33	47.94	100.30	186.11	309.87
11	0.02	0	0.05	0.31	0.41	0.16	0	0	0	0		8.10	5.72	110.65	93.05
12	0.03	0	0	0.01	0.69	0	0	0	0	0	0.15		16.51	106.75	166.39
13	0	0	0	0	0.22	0	0	0	0	0	0.33	0.01		134.60	222.15
14	0	0	0	0	0	0	0	0	0	0	0	0	0		5.21
15	0	0	0	0	0	0	0	0	0	0	0	0	0	0.39

Note: 1 = Austria; 2 = Czech Republic; 3 = Germany; 4 = Hungary; 5 = Ireland; 6 = Italy; 7 = Lithuania; 8 = Malta; 9 = Poland; 10 = Portugal; 11 = Romania; 12 = Slovak Republic; 13 = Slovenia; 14 = Spain; 15 = Sweden.

As is evident from the table, most of the pairwise comparisons indicated highly statistically significant differences between the countries’ fixed-effect vectors. However, in 16 cases, $p \geq .01$. This result implies that we can consider the fixed-effect vectors of, for example, Ireland and the Slovak Republic as similar ($p = .69$). The same holds for the fixed-effect vectors of Ireland and Slovenia ($p = .22$) and of the Slovak Republic and Slovenia ($p = .01$). Based on this result, we can therefore group these three countries together. In the same way, we can consider the fixed-effect vectors of Germany, Italy, and Romania as similar. Because the hypothesis $H_{0} : β_{v} = β_{i}$ could not be rejected for every pair $v, i \in G$ with $v \neq i$, we can group countries on the basis of the equality of their fixed-effect vectors $β_{g}$. Thus, under the assumption that the GLMM results are comparable across countries, we conclude that this GLMM is country specific in general but not for all countries (at least with respect to the fixed effects of the home-background control variables).
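The grouping argument can be made explicit by checking whether any pairwise test within a candidate group rejects $H_{0}$ at the .01 level. The sketch below assumes the following:

- The p values are transcribed from the lower triangle of Table 12 and are rounded to two decimals as published.
- An entry of 0 therefore means a p value that rounds to .00.
- A value of .01 is counted as $p \geq .01$, as in the text.

The function names are illustrative only. Passing this check means only that no pairwise test rejects within the group. It screens candidate groups rather than proving equality.

```python
from itertools import combinations

COUNTRIES = [
    "Austria", "Czech Republic", "Germany", "Hungary", "Ireland",
    "Italy", "Lithuania", "Malta", "Poland", "Portugal",
    "Romania", "Slovak Republic", "Slovenia", "Spain", "Sweden",
]

# Lower triangle of Table 12 (rounded p values). LOWER[i - 1][j] holds the p value
# for COUNTRIES[i] versus COUNTRIES[j], with j < i.
LOWER = [
    [0],                                               # Czech Republic
    [0, 0],                                            # Germany
    [0, 0, 0],                                         # Hungary
    [0.01, 0, 0, 0],                                   # Ireland
    [0, 0, 0.12, 0, 0],                                # Italy
    [0, 0, 0, 0, 0, 0],                                # Lithuania
    [0.04, 0, 0, 0, 0, 0, 0],                          # Malta
    [0] * 8,                                           # Poland
    [0] * 9,                                           # Portugal
    [0.02, 0, 0.05, 0.31, 0.41, 0.16, 0, 0, 0, 0],     # Romania
    [0.03, 0, 0, 0.01, 0.69, 0, 0, 0, 0, 0, 0.15],     # Slovak Republic
    [0, 0, 0, 0, 0.22, 0, 0, 0, 0, 0, 0.33, 0.01],     # Slovenia
    [0] * 13,                                          # Spain
    [0] * 13 + [0.39],                                 # Sweden
]


def p_value(a: str, b: str) -> float:
    """Look up the (rounded) pairwise p value for two distinct countries."""
    i, j = COUNTRIES.index(a), COUNTRIES.index(b)
    if i == j:
        raise ValueError("a country is not compared with itself")
    return LOWER[max(i, j) - 1][min(i, j)]


def non_rejected_pairs(alpha: float = 0.01):
    """All pairs whose p value is at least alpha, i.e., H0 is not rejected."""
    return [(COUNTRIES[r + 1], COUNTRIES[c], p)
            for r, row in enumerate(LOWER)
            for c, p in enumerate(row)
            if p >= alpha]


def is_homogeneous(group, alpha: float = 0.01) -> bool:
    """True if no pairwise comparison within the group rejects H0 at level alpha."""
    return all(p_value(a, b) >= alpha for a, b in combinations(group, 2))


print(len(non_rejected_pairs()))                                  # 16
print(is_homogeneous(["Ireland", "Slovak Republic", "Slovenia"]))  # True
print(is_homogeneous(["Germany", "Italy", "Romania"]))             # True
```

The same check can be applied to any other candidate grouping of the 15 countries.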

Discussion

After Martin et al. (2013) performed their “school-effectiveness” analysis using the TIMSS/PIRLS 2011 combined data set, they compared the results across the participating countries. During this process, they referred to the countries’ fixed-effect vectors $β_{g}$ in order to group the countries according to the similarity of their fixed effects. However, Martin and colleagues did not document the procedure they used to conduct the cross-national comparisons of the fixed effects from their HLMs. We therefore applied the $W_{B}$ statistic to (almost) the same data set. Using a conventional significance-testing approach, we showed that some of the fixed-effect vectors for the home-background control variables could in fact be considered similar across countries. Some of the participating countries can therefore be grouped.

In line with Bourdieu’s (1986) work on cultural capital, TIMSS and PIRLS derive variables for the home-background control model (ASBGHRL and ASBHAVG) that serve as indicators of students’ social background. Like Martin et al. (2013), we found a strong association between these indicators and mathematics achievement in every country considered here. Hence, we must assume that strong educational inequalities exist in all these countries. These inequalities are present at two levels:

- At the student level, students with lower values on the social background indicators show lower mathematics performance than students with higher values.
- At the class level, classes with lower average values on these indicators show a lower average mathematics achievement score (with the average taken over the students within a class).

Also like Martin et al. (2013), we have shown that countries can be grouped according to the strength of these educational inequalities. Furthermore, whereas the overall $W_{B}$ statistic was highly statistically significant, the $W_{B}$ statistics for the pairwise comparisons were not always statistically significant. Taking these results into account, we conclude that the degree of educational inequality varies considerably across the countries considered here but is of similar size in several of them.

Although our results indicate that the grouping made by Martin et al. (2013) needs closer consideration, our example study has some limitations.

First, because we did not know which procedure Martin et al. (2013) used to compare the fixed effects from their HLM analyses, we could not assess the reliability of their approach, particularly relative to the procedure we used.

Second, Martin et al. (2013) based their comparisons on the full HLM, that is, the model that included not only the home-background control variables but also the school-effectiveness variables. Our models, by contrast, included only the home-background control variables. We expect that including the additional variables would not alter our basic finding that the fixed-effect vectors $β_{g}$ differ across countries. A full reanalysis of the data sets that Martin et al. (2013) used should nevertheless include these variables as well.

Third, our study is restricted in sample size. Whereas Martin et al. (2013) analyzed the data sets from 37 countries, we analyzed only those from the 15 participating countries that were members of the European Union in 2011. Consequently, any full reanalysis of the data that Martin et al. (2013) used should include all of the countries that these authors analyzed.

Fourth, Martin et al. (2013) used a single imputation procedure for missing values in the independent variables. Because Martin and his colleagues did not report which imputation model they used and did not provide the fully imputed data sets, we had to apply our own imputation method. Our descriptive statistics for the fully imputed variables closely matched the values presented by Foy and O’Dwyer (2013), so we assume that, in principle, we used the same data as Martin et al. (2013). However, any closer examination of Martin and his colleagues’ comparisons should take this difference into account.

Finally, we used level-specific weights in our HLM analyses, whereas the results that Martin et al. (2013) generated were based on a combined weight. A reconstruction of these authors’ results should also take this difference into account.

General Discussion and Conclusions

Basing our work on the Wald statistic, we developed an asymptotic test statistic for across-group comparisons of the fixed and random effects from the GLMM. The test can be used when the GLMM is fitted independently to each group’s data set. It requires that the following are the same across groups: the conditional distributional family of the dependent variable, the link function, and the distributional family of the random effects. We also need to assume that the fixed effects are independent of the random effects and that both are asymptotically normally distributed. Under these conditions, our statistic can be used to test hypotheses of the form $H_{0} : L ξ_{d} = m$ with $ξ_{d} = ξ_{+} - ξ_{*}$, $ξ_{+} = (ξ_{1}', \dots, ξ_{1}')'$, $ξ_{*} = (ξ_{2}', \dots, ξ_{G}')'$, and $ξ_{g} = (β_{g}', θ_{g}')'$.
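The general linear hypothesis above can be turned into a generic Wald computation. The sketch below is a minimal, stdlib-only illustration, not the authors' implementation. It rests on three assumptions:

- Groups are independent.
- Each $\hat{ξ}_{g}$ comes with an estimated covariance matrix $\hat{Σ}_{g}$.
- $L$ has full row rank.

Because $ξ_{d}$ stacks the differences $ξ_{1} - ξ_{g}$, every block of $\mathrm{Cov}(\hat{ξ}_{d})$ contains $\hat{Σ}_{1}$, and the diagonal block for group $g$ additionally contains $\hat{Σ}_{g}$. The statistic is $W = (L\hat{ξ}_{d} - m)'(L\,\mathrm{Cov}(\hat{ξ}_{d})\,L')^{-1}(L\hat{ξ}_{d} - m)$. The function names `wald_multigroup` and `solve` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("L Sigma_d L' is singular; check that L has full row rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wald_multigroup(xi_hat: Sequence[Vector], cov: Sequence[Matrix],
                    L: Optional[Matrix] = None,
                    m: Optional[Vector] = None) -> Tuple[float, int]:
    """Wald statistic for H0: L xi_d = m with xi_d = (xi_1 - xi_2, ..., xi_1 - xi_G).

    xi_hat[g] is the estimated parameter vector of group g and cov[g] its estimated
    covariance matrix. Returns (W, df) with df = number of rows of L.
    """
    G, p = len(xi_hat), len(xi_hat[0])
    n = p * (G - 1)
    # Stack the differences xi_1 - xi_g for g = 2, ..., G.
    xi_d = [xi_hat[0][k] - xi_hat[g][k] for g in range(1, G) for k in range(p)]
    # Cov(xi_d): Sigma_1 in every block (shared reference group),
    # plus Sigma_g on the diagonal block belonging to comparison g.
    S = [[0.0] * n for _ in range(n)]
    for bg in range(G - 1):
        for bh in range(G - 1):
            for k in range(p):
                for j in range(p):
                    extra = cov[bg + 1][k][j] if bg == bh else 0.0
                    S[bg * p + k][bh * p + j] = cov[0][k][j] + extra
    if L is None:  # default hypothesis: xi_d = 0, i.e., all groups equal
        L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    q = len(L)
    if m is None:
        m = [0.0] * q
    r = [sum(L[i][j] * xi_d[j] for j in range(n)) - m[i] for i in range(q)]
    LS = [[sum(L[i][k] * S[k][j] for k in range(n)) for j in range(n)] for i in range(q)]
    V = [[sum(LS[i][k] * L[j][k] for k in range(n)) for j in range(q)] for i in range(q)]
    W = sum(ri * xi for ri, xi in zip(r, solve(V, r)))
    return W, q


W, df = wald_multigroup([[1.0], [0.0], [0.0]], [[[1.0]], [[1.0]], [[1.0]]])
print(round(W, 4), df)  # 0.6667 2
```

In the article's notation, `xi_hat[g]` would hold $(\hat{β}_{g}', \hat{θ}_{g}')'$. The p value is the upper tail of $\chi^{2}_{df}$, available for example from `scipy.stats.chi2.sf`. Passing a contrast matrix as `L` restricts the test to selected differences.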

Although the assumptions under which we developed our test statistic are very common in the context of the GLMM, a test statistic that is also valid under more general assumptions (e.g., non-normally distributed random effects) would be desirable. Also, it is usually not the random effects themselves but the variance components that are of interest. A valid test that simultaneously tests the fixed effects and the variance components in the setting considered here is therefore desirable. Developing such a test, however, needs to take into account the findings of Verbeke and Molenberghs (2000, 2003). They emphasized the need to address invalid tests that arise when parameters lie on the boundary of the parameter space. Finally, our theoretical derivation of the test is very general, so we can assume that the test is applicable to a broad range of GLMM subtypes. It is therefore worthwhile to investigate in more detail its applicability to special versions of this model (e.g., the HLM with cross-level interactions).
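To illustrate why the boundary matters, consider the classical textbook case, which is not the $W_{G}$ setting of this article: a likelihood ratio test of $H_{0} : σ^{2} = 0$ for a single random-intercept variance. Under standard regularity conditions, the null distribution is a 50:50 mixture of $\chi^{2}_{0}$ (a point mass at zero) and $\chi^{2}_{1}$, not $\chi^{2}_{1}$ alone. Referring the statistic naively to $\chi^{2}_{1}$ therefore doubles the p value. The test statistic value below is hypothetical.

```python
import math


def chi2_1_sf(t: float) -> float:
    """P(chi^2_1 > t) = erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2.0)) if t > 0 else 1.0


def boundary_lrt_pvalue(t: float) -> float:
    """p value for H0: sigma^2 = 0 (one variance component), using the
    50:50 mixture of chi^2_0 and chi^2_1 as the null distribution."""
    # chi^2_0 is a point mass at 0, so it contributes nothing to P(T > t) for t > 0.
    return 0.5 * chi2_1_sf(t) if t > 0 else 1.0


t = 3.0  # hypothetical likelihood ratio statistic
print(f"naive chi^2_1 p = {chi2_1_sf(t):.4f}")          # 0.0833
print(f"mixture p       = {boundary_lrt_pvalue(t):.4f}")  # 0.0416
```

At the .05 level the two reference distributions lead to opposite decisions for the same statistic. A joint test of fixed effects and variance components would have to handle this non-standard null distribution.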

By using power studies, we showed that, when a hierarchical data structure is assumed, our test statistic $W_{G}$ outperformed the F test statistic proposed by Lazar and Zerbe (2011) in terms of estimated power. We also showed that both tests had a negative bias (i.e., they underestimated the expected power) when the expected power was small. $W_{G}$ outperformed the F test only slightly, but this advantage can still be relevant: in situations where we expect the power to be relatively small, the negative bias of $W_{G}$ is smaller. We furthermore showed that increasing the sample size at Level 1 and at Level 2 had a positive effect on the estimated power of both statistics, whereas the sample size at Level 3 was negatively related to the estimated power. Hence, with regard to power, increasing the sample sizes at Levels 1 and 2 seems reasonable, but the same cannot necessarily be said for Level 3.
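The comparison of estimated and expected power can be illustrated with a deliberately idealized sketch. It is not the authors' simulation design and has no GLMM, no multilevel structure, and no F test. The setup is as follows:

- Two independent groups are compared on two coefficients.
- The estimates are exactly normal with known, equal standard errors.
- The values of `delta` and `se` are hypothetical.

Estimated power is the rejection rate of the 2-df Wald test over replications. Expected power comes from the noncentral $\chi^{2}_{2}$ distribution of the statistic, written as a Poisson mixture of central chi-square distributions. Bias is their difference. Because the idealized assumptions hold exactly, the two should agree up to Monte Carlo error, so this sketch does not reproduce the negative bias reported above.

```python
import math
import random
from typing import Sequence


def chi2_sf_even(x: float, df: int) -> float:
    """P(chi^2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def expected_power_df2(lam: float, alpha: float = 0.05, terms: int = 200) -> float:
    """Power when the 2-df Wald statistic is noncentral chi^2_2(lam): a Poisson(lam/2)
    mixture of central chi-square variables with 2 + 2j degrees of freedom."""
    crit = -2.0 * math.log(alpha)  # exact critical value of chi^2_2
    mu = lam / 2.0
    weight = math.exp(-mu)         # Poisson probability of j = 0
    power = 0.0
    for j in range(terms):
        power += weight * chi2_sf_even(crit, 2 + 2 * j)
        weight *= mu / (j + 1)     # Poisson probability of j + 1
    return power


def simulated_power(delta: Sequence[float], se: float, alpha: float = 0.05,
                    reps: int = 20000, seed: int = 2021) -> float:
    """Share of replications in which the 2-df Wald test rejects H0: beta_1 = beta_2."""
    if len(delta) != 2:
        raise ValueError("this sketch is hard-wired to two coefficients (df = 2)")
    rng = random.Random(seed)
    crit = -2.0 * math.log(alpha)
    var_diff = 2.0 * se ** 2       # Var(b1 - b2) for independent groups with equal SEs
    rejections = 0
    for _ in range(reps):
        w = 0.0
        for d in delta:
            b1 = d + rng.gauss(0.0, se)  # group 1 estimate, true value d
            b2 = rng.gauss(0.0, se)      # group 2 estimate, true value 0
            w += (b1 - b2) ** 2 / var_diff
        rejections += w > crit
    return rejections / reps


delta, se = (1.0, 0.5), 0.5
lam = sum(d * d for d in delta) / (2.0 * se ** 2)  # noncentrality parameter, 2.5
expected = expected_power_df2(lam)
estimated = simulated_power(delta, se)
print(f"expected power:  {expected:.3f}")
print(f"estimated power: {estimated:.3f}")
print(f"bias:            {estimated - expected:+.3f}")
```

In a realistic power study, the estimates would come from fitted GLMMs in each replication, and any systematic gap between estimated and expected power would show up in the bias line.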

We found the power studies very helpful for gaining an impression of how $W_{G}$ and F perform within the context of the GLMM. However, the GLMM is a very general model, and examining all of its possible specific formulations was beyond the scope of this article. We therefore had to restrict our power studies to a concrete formulation of the GLMM. We accordingly assumed a hierarchical data structure and two kinds of conditional distributions for the dependent variable, each with a corresponding link function. Although hierarchical data structures are very common in, for example, educational psychology, power studies featuring other formulations of the GLMM would provide further information about the performance of $W_{G}$. The negative relationship between the sample size at Level 3 and the estimated power of $W_{G}$ and F also merits further consideration.

We also provided a practical example of the application of the $W_{G}$ statistic. In it, we reanalyzed part of the TIMSS/PIRLS 2011 combined data sets that Martin et al. (2013) used for their “school-effectiveness” analysis. In that analysis, Martin and his colleagues used five school-effectiveness variables and two student home-background variables as predictors in country-specific HLMs. In general, once the home-background variables had been controlled for, the school-effectiveness variables no longer showed associations with students’ achievement. For this reason, we confined our reanalysis to the home-background control variables. In line with Bourdieu’s (1986) work on cultural capital, these variables can be interpreted as measures of students’ socioeconomic and cultural home learning environments. In a general sense, associations between these variables and students’ achievement can be considered educational inequalities (Walzebug and Kasper 2016). The associations found in Martin et al.’s study therefore show that educational inequalities exist in all of the participating countries.

However, the relationships between the social background variables and the achievement scores varied considerably across the participating countries in Martin et al.’s (2013) study, suggesting different degrees of educational inequality within the countries. Unfortunately, it is not clear from the results that Martin and his colleagues presented which procedure they used for the cross-national comparisons of these effects. Accordingly, we do not know whether the differences in the degree of educational inequality across countries are statistically significant. For this reason, we re-estimated the country-specific HLMs using only the social background variables. We then compared the fixed effects cross-nationally by applying the test statistic we developed. Our results suggest that, in general, the observed cross-national differences in educational inequality are statistically significant. Moreover, not all of the pairwise comparisons of the fixed effects across countries were statistically significant. We can therefore assume that, although educational inequalities exist in all participating countries, their extent is of similar size in some countries. This finding gives educational researchers the opportunity to compare countries with a lesser degree of educational inequality, such as Portugal, with countries with a higher degree of educational inequality, such as Slovenia.

Although we provided a practical example of the application of the $W_{G}$ statistic in this article, our example-related work has some limitations.

First, we examined only one of the three achievement domains: mathematics, but not reading or science. Consequently, it is not clear whether our results would also hold for other educational outcomes.

Second, as Walzebug and Kasper (2016) discuss, the indicators used in our study measured only the students’ social background, and did so in a very simplified manner. These measures are typically not measurement invariant across countries (Wendt et al. 2017), yet Martin et al. (2013) assumed that they are, as did we in our reanalysis. Measurement invariance is a necessary requirement for valid across-group comparisons. It therefore remains unclear what results would be observed with social background indicators that had been shown to be measurement invariant.

A third limitation of our reanalysis is the sample size. Whereas Martin et al. (2013) analyzed the data sets from 37 countries, we analyzed only those from the 15 participating countries that were members of the European Union in 2011. Hence, our results apply only to these 15 countries. For international comparative educational research, it therefore seems worthwhile to extend our analyses to all countries that participated in TIMSS and PIRLS.

Supplemental Material

Supplemental Material, sj-pdf-1-smr-10.1177_0049124120986182 for Multiple Group Comparisons of the Fixed and Random Effects From the Generalized Linear Mixed Model by Daniel Kasper, Katrin Schulz-Heidorf and Knut Schwippert in Sociological Methods & Research

Further supplemental files: sj-pdf-2-smr-10.1177_0049124120986182, sj-zip-1-smr-10.1177_0049124120986182, sj-zip-2-smr-10.1177_0049124120986182, sj-tex-1-smr-10.1177_0049124120986182, and sj-tex-2-smr-10.1177_0049124120986182.

Footnotes

Authors’ Note

All authors made substantial contributions to the conception and design of the study and to the interpretation of its results. D.K. and K.S. read and approved the manuscript. In addition, D.K. conducted the analysis and was responsible for the statistical explanations. D.K. and K.S.H. drafted the manuscript. This article analyzes data sets from the International Association for the Evaluation of Educational Achievement (IEA) study Trends in International Mathematics and Science Study (TIMSS) 2011. These data sets and the corresponding documentation are freely available under the URL ().

Acknowledgments

The authors acknowledge the anonymous reviewers for the attention and expertise they generously shared to support the production of this article. We further thank Paula Wagemaker for pre-submission English editing support.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Daniel Kasper

Supplemental Material

Supplemental material for this article is available online.

Notes

References

Norton

E. C.

. 2003. “Interaction Terms in Logit and Probit Models.” Economics Letters 80:123–29.

Akaike

1974. “A New Look at the Statistical Model Identification.” I.E.E.E. Transactions on Automatic Control AC-19(6): 716–23.

Asparouhov

2006. “General Multi-level Modeling with Sampling Weights.” Communications in Statistics–Theory and Methods 35:439–60.

Boeck

P. D.

Bakker

Zwitser

Nivard

Hofman

Tuerlinckx

Partchev

. 2011. “The Estimation of Item Response Models with the lmer Function from the lme4 Package in R.” Journal of Statistical Software 39(12): 1–28. doi: 10.18637/jss.v039.i12.

Booth

J. G.

Hobert

J. P.

. 1999. “Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm.” Journal of the Royal Statistical Society: Series B. Methodological 61(1): 265–85. doi: 10.1111/1467-9868.00176.

Bourdieu

1986. “The Forms of Capital.” Pp. 241-58 in Handbook of Theory and Research for the Sociology of Education, edited by Richardson

. New York: Greenwood.

Bourdieu

1998. Practical Reason. On the Theory of Action. Stanford, CA: Stanford University Press.

Brame

Paternoster

Mazerolle

Piquero

. 1998. “Testing for the Equality of Maximum-likelihood Regression Coefficients between Two Independent Equations.” Journal of Quantitative Criminology 14(3): 245–61. doi: 10.1023/A:1023030312801.

Breslow

N. E.

Clayton

D. G.

. 1993. “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88(421): 9–25. doi: 10.1080/01621459.1993.10594284.

10.

Breslow

N. E.

Lin

. 1995. “Bias Correction in Generalised Linear Mixed Models with a Single Component of Dispersion.” Biometrika 82(1): 81–91. doi: 10.1093/biomet/82.1.81.

11.

Bush

2015. “Sample Size Determination for Logistic Regression: A Simulation Study.” Communications in Statistics—Simulation and Computation 44(2): 360–73.

12.

Cai

T. T.

Xia

. 2014. “High-dimensional Sparse MANOVA.” Journal of Multivariate Analysis 131:174–96. doi: 10.1016/j.jmva.2014.07.002.

13.

Commenges

Jacqmin-Gadda

. 1997. “Generalized Score Test of Homogeneity based on Correlated Random Effects Models.” Journal of the Royal Statistical Society. Series B. Methodological 59(1): 157–71. doi: 10.1111/1467-9868.00061.

14.

Commenges

Letenneur

Jacqmin

Moreau

Dartigues

J.-F.

. 1994. “Test of Homogeneity of Binary Data with Explanatory Variables.” Biometrics 50(3):613–20. doi: 10.2307/2532776.

15.

Commenges

Olson

Wijsman

. 1994. “The Weighted Rank Pairwise Correlation Statistic for Linkage Analysis: Simulation Study and Application to the Alzheimer’s Disease.” Genetic Epidemiology 11:201–12.

16.

Cox

D. R.

1988. “Some Aspects of Conditional and Asymptotic Inference: A Review.” Sankhyā: The Indian Journal of Statistics, Series A 50 (3): 314–37.

17.

Daniels

M. J.

Zhao

Y. D.

. 2003. “Modelling the Random Effects Covariance Matrix in Longitudinal Data.” Statistics in Medicine 22:1631–47.

18.

Davidson

R. R.

Lever

W. E.

. 1970. “The Limiting Distribution of the Likelihood Ratio Statistic under a Class of Local Alternatives.” Sankhyā: The Indian Journal of Statistics, Series A 32(2):209–24.

19.

DeShon

R. P.

Alexander

R. A.

. 1994. “A Generalization of James’s Second-order Approximation to the Test for Regression Slope Equality.” Educational and Psychological Measurement 54(2):328–35. doi: 10.1177/0013164494054002007.

20.

DeShon

R. P.

Alexander

R. A.

. 1996. “Alternative Procedures for Testing Regression Slope Homogeneity When Group Error Variances are Unequal.” Psychological Methods 1(3):261–77. doi: 10.1037/1082-989X.1.3.261.

21.

Ene

Leighton

E. A.

Blue

G. L.

Bell

B. A.

. 2015. “Multilevel Models for Categorical Data Using SAS PROC GLIMMIX: The Basics.” Retrieved January 19, 2021 (https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf).

22.

Engle

R. F.

1984. “Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics.” Pp. 775-826 in Handbook of Econometrics. Volume II, Chap. 13, edited by Griliches

Intriligator

M. D.

. Amsterdam, the Netherlands: Elsevier Science.

23.

Feder

P. I.

1968. “On the Distribution of the Log Likelihood Ratio Test Statistic When the True Parameter Is “Near” the Boundaries of the Hypothesis Regions.” The Annals of Mathematical Statistics 39 (6): 2044–55.

24.

Feldman

H. A.

McKinlay

S. M.

. 1994. “Cohort Versus Cross-sectional Design in Large Field Trials: Precision, Sample Size, and a Unifying Model.” Statistics in Medicine 13:61–78.

25.

Foy

2013. TIMSS and PIRLS 2011 User Guide for the Fourth Grade Combined International Database. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

26.

Foy

O’Dwyer

L. M.

. 2013. “Technical Appendix B: School Effectiveness Models and Analyses.” Retrieved 19 January, 2021 (https://timssandpirls.bc.edu/timsspirls2011/downloads/TP11_Technical_AppendixB.pdf)

27.

Gamerman

1997. “Sampling from the Posterior Distribution in Generalized Linear Mixed Models.” Statistics and Computing 7(1):57–68. doi: 10.1023/A:1018509429360.

28.

Ganzeboom

H. B. G.

de Graaf

P. M.

Treiman

D. J.

de Leeuw

. 1992. “A Standard International Socio-economic Index of Occupational Status.” Social Science Research 21:1–56.

29.

Gentle

J. E.

2017. Matrix Algebra: Theory, Computations and Applications in Statistics. New York: Springer.

30.

Gujarati

1970a. “Use of Dummy Variables in Testing for Equality between Sets of Coefficients in Linear Regressions: A Generalization.” The American Statistician 24(5):18–22. doi: 10.1080/00031305.1970.10477220.

31.

Gujarati

1970b. “Use of Dummy Variables in Testing for Equality between Sets of Coefficients in Two Linear Regressions: A Note.” The American Statistician 24(1):50–52. doi: 10.1080/00031305.1970.10477181.

32.

Heagerty

P. J.

Zeger

S. L.

. 2000. “Marginalized Multilevel Models and Likelihood Inference.” Statistical Science 15:1–26.

33.

Hochweber

Hartig

. 2017. “Analyzing Organizational Growth in Repeated Cross-sectional Designs Using Multilevel Structural Equation Modeling.” Methodology 13:83–97.

34.

Hofmann

D. A.

1997. “An Overview of the Logic and Rationale of Hierarchical Linear Models.” Journal of Management 23:723–44.

35.

Hsu

Y.-S.

1991. “General Linear Hypotheses in a Two-stage Least Squares Estimation Model.” Economics Letters 36(3):275–79. doi: 10.1016/0165-1765(91)90032-G.

36.

Jehangir

Glas

van den Berg

. 2015. “Exploring the Relation between Socio-economic Status and Reading Achievement in PISA 2009 through an Intercepts-and-slopes-as-outcomes Paradigm.” International Journal of Educational Research 71:1–15.

37.

Karim

M. R.

Zeger

S. L.

. 1992. “Generalized Linear Models with Random Effects: Salamander Mating Revisited.” Biometrics 48 (2): 631–44. doi: 10.2307/2532317.

38.

Kasper

Schulz-Heidorf

Schwippert

. 2018. “%SURVEYHLM: A SAS Macro for Multilevel Analysis with Large-scale Educational Assessment Data.” Journal of Statistical Software. Retrieved (https://www.jstatsoft.org/index).

39.

Lavrijsen

2015. “New Empirical Evidence on the Effect of Educational Tracking on Social Inequalities in Reading Achievement.” European Educational Research Journal 14(3-4):206–21.

40.

Lazar

A. A.

Zerbe

G. O.

. 2011. “Solutions for Determining the Significance Region Using the Johnson-Neyman Type Procedure in Generalized Linear (Mixed) Models.” Journal of Educational and Behavioral Statistics 36(6):699–719.

41.

Liao

T. F.

2004. “Comparing Social Groups: Wald Statistics for Testing Equality among Multiple Logit Models.” International Journal of Comparative Sociology 45(1-2):3–16. doi: 10.1177/0020715204048308.

42.

Lin

1997. “Variance Component Testing in Generalised Linear Models with Random Effects.” Biometrika 84(2):309–26. doi: 10.1093/biomet/84.2.309.

43.

Lin

Breslow

N. E.

. 1996. “Bias Correction in Generalized Linear Mixed Models with Multiple Components of Dispersion.” Journal of the American Statistical Association 91(435):1007–1016. doi: 10.1080/01621459.1996.10476971.

44.

Littell

R. C.

Milliken

G. A.

Stroup

W. W.

Wolfinger

R. D.

Schabenberger

. 2006. “Appendix 1: Linear Mixed Model Theory.” Pp. 733-56 in SAS for Mixed Models, 2nd ed. Cary, NC: SAS Institute.

45.

Lui

K.-J.

Cumberland

W. G.

Chang

K.-C.

. 2014. “Notes on Testing Equality in Binary Data under a Three Period Crossover Design.” Computational Statistics and Data Analysis 80:89–98. doi:10.1016/j.csda.2014.06.015.

46.

Maity

2012. “A Powerful Test for Comparing Multiple Regression Functions.” Journal of Nonparametric Statistics 24(3):563–76. doi: 10.1080/10485252.2012.677842.

47.

Martin

M. O.

Foy

Mullis

I. V. S.

O’Dwyer

L. M.

. 2013. “Effective Schools in Reading, Mathematics, and Science at the Fourth Grade.” Pp. 109–78 in TIMSS and PIRLS 2011: Relationships among Reading, Mathematics, and Science Achievement at the Fourth Grade: Implications for Early Learning, edited by Martin

M. O.

Mullis

I. V. S.

. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

48.

Martin

M. O.

Mullis

I. V. S.

, Eds. 2012. Methods and Procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

49.

Martin

M. O.

Mullis

I. V. S.

. 2013. TIMSS and PIRLS 2011: Relationships among Reading, Mathematics, and Science Achievement at the Fourth Grade: Implications for Early Learning. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

50.

Martin

M. O.

Mullis

I. V. S.

Foy

Olson

J. F.

Erbeber

Preuschoff

Galia

. 2008. TIMSS 2007 International Science Report: Findings from IEA’s Trends in International Mathematics and Science Study at the Fourth and Eighth Grades. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

51.

Martin

M. O.

Mullis

I. V. S.

Foy

Stanco

G. M.

. 2012. TIMSS 2011 International Results in Science. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

52.

McCullagh

Nelder

J. A.

. 1989. Generalized Linear Models. London, UK: Chapman & Hall.

53.

McCulloch

C. E.

Searle

S. R.

Neuhaus

J. M.

. 2008. Generalized, Linear, and Mixed Models. Hoboken, NJ: Wiley.

54.

Milliken

G. A.

Graybill

F. A.

. 1970. “Extensions of the General Linear Hypothesis Model.” Journal of the American Statistical Association 65(330):797–807. doi: 10.1080/01621459.1970.10481125.

55.

Moreno

Torres

Casella

. 2005. “Testing Equality of Regression Coefficients in Heteroscedastic Normal Regression Models.” Journal of Statistical Planning and Inference 131(1):117–34. doi: 10.1016/j.jspi.2003.12.016.

56.

Mullis

I. V. S.

Martin

M. O.

Foy

Arora

. 2012. TIMSS 2011 International Results in Mathematics. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

57.

Mullis

I. V. S.

Martin

M. O.

Foy

Olson

J. F.

Preuschoff

Erbeber

Arora

Galia

. 2008. TIMSS 2007 International Mathematics Report: Findings from IEA’s Trends in International Mathematics and Science Study at the Fourth and Eighth Grades. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

58.

Mullis

I. V. S.

Martin

M. O.

Kennedy

A. M.

Foy

. 2007. PIRLS 2006 International Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

59.

Natarajan

Kass

R. E.

. 2000. “Reference Bayesian Methods for Generalized Linear Mixed Models.” Journal of the American Statistical Association 95(449):227–337. doi: 10.1080/01621459.2000.10473916.

60.

Nelder

Wedderburn

R. W. M.

. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society: Series A. Statistics in Society 135(3):370–84. doi: 10.2307/2344614.

61. Neumeyer and Sperlich. 2006. “Comparison of Separable Components in Different Samples.” Scandinavian Journal of Statistics 33(3):477–501. doi: 10.1111/j.1467-9469.2006.00509.x.

62. Olson, G. H. 1975. “Applications of the Multivariate General Linear Hypothesis in Educational Research and Evaluation.” Paper presented at the Annual Meeting of the American Educational Research Association, 30 March–3 April 1975. Retrieved 19 January 2021 (https://files.eric.ed.gov/fulltext/ED117118.pdf).

63. Organization for Economic Cooperation and Development. 2014. PISA 2012 Results: What Students Know and Can Do—Student Performance in Mathematics, Reading and Science (Volume I, Rev. ed., February 2014). Paris, France: PISA, OECD Publishing.

64. Park, Hannig, and K.-H. Kang. 2014. “Nonparametric Comparison of Multiple Regression Curves in Scale-space.” Journal of Computational and Graphical Statistics 23(3):657–77. doi: 10.1080/10618600.2013.822816.

65. Peers, H. W. 1971. “Likelihood Ratio and Associated Test Criteria.” Biometrika 58(3):577–87. doi: 10.1093/biomet/58.3.577.

66. Pfeffermann, C. J. Skinner, D. J. Holmes, Goldstein, and Rasbash. 1998. “Weighting for Unequal Selection Probabilities in Multilevel Models.” Journal of the Royal Statistical Society B 60(1):23–40.

67. Pinheiro, J. C., and D. M. Bates. 1995. “Approximations to the Log-likelihood Function in the Nonlinear Mixed-effects Model.” Journal of Computational and Graphical Statistics 4(1):12–35. doi: 10.1080/10618600.1995.10474663.

68. Pokropek, Borgonovi, and Jakubowski. 2015. “Socio-economic Disparities in Academic Achievement: A Comparative Analysis of Mechanisms and Pathways.” Learning and Individual Differences 42:10–18.

69. Powers, D. A., and Xie. 2008. Statistical Methods for Categorical Data Analysis, 2nd ed. Bingley, UK: Emerald Group.

70. Rabe-Hesketh and Skrondal. 2006. “Multilevel Modelling of Complex Survey Data.” Journal of the Royal Statistical Society A 169(4):805–27.

71. Rabe-Hesketh and Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata: Volume II. Categorical Responses, Counts, and Survival. College Station, TX: Stata Press.

72. Radhakrishnan and D. R. Robinson. 1996. “Testing the Equivalence of Multiple Regression Models with Additional Data.” International Journal of Mathematical Education in Science and Technology 27(3):387–95. doi: 10.1080/0020739960270309.

73. Raudenbush, S. W., and A. S. Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods. London, UK: Sage.

74. Raudenbush, S. W., M. L. Yang, and Yosef. 2000. “Maximum Likelihood for Generalized Linear Models with Nested Random Effects via High-order, Multivariate Laplace Approximation.” Journal of Computational and Graphical Statistics 9(1):141–57.

75. Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.

76. Rutkowski, Gonzalez, Joncas, and von Davier. 2010. “International Large-scale Assessment Data: Issues in Secondary Analysis and Reporting.” Educational Researcher 39(2):142–51. doi: 10.3102/0013189X10363170.

77. SAS Institute Inc. 2015a. SAS/IML 14.1 User’s Guide. Retrieved 7 August 2020 (http://support.sas.com/documentation/cdl/en/imlug/68150/PDF/default/imlug.pdf).

78. SAS Institute Inc. 2015b. SAS/STAT 14.1 User’s Guide. Cary, NC. Retrieved 7 August 2020 (https://support.sas.com/documentation/cdl/en/statug/68162/PDF/default/statug.pdf).

79. Satorra and W. E. Saris. 1985. “Power of the Likelihood Ratio Test in Covariance Structure Analysis.” Psychometrika 50(1):83–90. doi: 10.1007/BF02294150.

80. Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. New York: Chapman & Hall.

81. Schwarz. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6(2):461–64.

82. Searle, S. R. 1971. Linear Models. New York: Wiley.

83. Self, S. G., and K.-Y. Liang. 1987. “Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions.” Journal of the American Statistical Association 82(398):605–10. doi: 10.1080/01621459.1987.10478472.

84. Shieh. 2005. “On Power and Sample Size Calculations for Wald Tests in Generalized Linear Models.” Journal of Statistical Planning and Inference 128(1):43–59. doi: 10.1016/j.jspi.2003.09.017.

85. Shun. 1997. “Another Look at the Salamander Mating Data: A Modified Laplace Approximation Approach.” Journal of the American Statistical Association 92(437):341–49. doi: 10.1080/01621459.1997.10473632.

86. Shun and McCullagh. 1995. “Laplace Approximation of High-dimensional Integrals.” Journal of the Royal Statistical Society: Series B (Methodological) 57(4):749–60.

87. Sirin, S. R. 2005. “Socioeconomic Status and Academic Achievement: A Meta-analytic Review of Research.” Review of Educational Research 75(3):417–53.

88. Skrondal and Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.

89. Skvarcius and Cromer. 1971. “A Note on the Use of Categorical Vectors in Testing for Equality of Two Regression Equations.” The American Statistician 25(3):27–29. doi: 10.1080/00031305.1971.10478906.

90. Smith, D. S., Wendt, and Kasper. 2017. “Social Reproduction and Sex in German Primary Schools.” Compare: A Journal of Comparative and International Education 47(2):240–56.

91. Stram, D. O., and J. W. Lee. 1994. “Variance Components Testing in the Longitudinal Mixed Effects Model.” Biometrics 50:1171–77.

92. Stram, D. O., and J. W. Lee. 1995. “Correction to ‘Variance Components Testing in the Longitudinal Mixed Effects Model’.” Biometrics 51:1196.

93. Stroud, T. W. F. 1972. “Fixed Alternatives and Wald’s Formulation of the Noncentral Asymptotic Behavior of the Likelihood Ratio Statistic.” The Annals of Mathematical Statistics 43(2):447–54.

94. Stroud, T. W. F. 1974. “Comparing Regressions When Measurement Error Variances Are Known.” Psychometrika 39(1):53–68. doi: 10.1007/BF02291577.

95. Stroup, W. W., and S. D. Kachman. 1994. “Generalized Linear Mixed Models—An Overview.” Pp. 82–98 in Applied Statistics in Agriculture: Proceedings of Conference on Applied Statistics in Agriculture, Kansas State University, edited by J. R. Schwank. Manhattan, KS: New Prairie Press. Retrieved 19 January 2021 (http://newprairiepress.org/agstatconference/1994/proceedings/7).

96. Sutradhar, B. C., and R. F. Bartlett. 1993. “Monte Carlo Comparison of Wald’s, Likelihood Ratio and Rao’s Tests.” Journal of Statistical Computation and Simulation 46(1–2):23–33. doi: 10.1080/00949659308811490.

97. Tonggumnead, Thongteerparp, Chomtee, and Leerawat. 2010. “Testing for Comparison of Two Expectation Functions of Non-parametric Regression.” Scientia Magna 6(4):92–101.

98. Tuerlinckx, Rijmen, Verbeke, and De Boeck. 2006. “Statistical Inference in Generalized Linear Mixed Models: A Review.” British Journal of Mathematical and Statistical Psychology 59(2):225–55.

99. Verbeke and Molenberghs. 2000. Linear Mixed Models for Longitudinal Data. Berlin, Germany: Springer-Verlag.

100. Verbeke and Molenberghs. 2003. “The Use of Score Tests for Inference on Variance Components.” Biometrics 59(2):254–62.

101. Wald. 1943. “Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large.” Transactions of the American Mathematical Society 54(3):426–82. doi: 10.1090/S0002-9947-1943-0012401-3.

102. Walzebug and Kasper. 2016. “Distributional Properties of the PIRLS-home Resource for Learning Scale and Observed Effects on Reading Achievement: Are Measures of Educational Inequalities by Latent Indices without Bias?” Assessment in Education: Principles, Policy & Practice 25:28–51.

103. Weerahandi. 1987. “Testing Regression Equality with Unequal Variances.” Econometrica 55(5):1211–15. doi: 10.2307/1911268.

104. Wendt, Kasper, and Trendtel. 2017. “Assuming Measurement Invariance of Background Indicators in International Comparative Educational Achievement Studies: A Challenge for the Interpretation of Achievement Differences.” Large-scale Assessments in Education 5(10):1–34.

105. Werts, C. E., D. A. Rock, R. L. Linn, and K. G. Joreskog. 1976. “Comparison of Correlations, Variances, Covariances, and Regression Weights with or without Measurement Error.” Psychological Bulletin 83(6):1007–13. doi: 10.1037/0033-2909.83.6.1007.

106. Wilks, S. S. 1938. “The Large-sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses.” The Annals of Mathematical Statistics 9(1):60–62.

107. Wolfinger and O’Connell. 1993. “Generalized Linear Mixed Models: A Pseudo-likelihood Approach.” Journal of Statistical Computation and Simulation 48(3–4):233–43. doi: 10.1080/00949659308811554.

108. Woltman, Feldstain, J. C. MacKay, and Rocchi. 2012. “An Introduction to Hierarchical Linear Modeling.” Tutorials in Quantitative Methods for Psychology 8(1):52–69.

109. Zeger, S. L., and M. R. Karim. 1991. “Generalized Linear Models with Random Effects: A Gibbs Sampling Approach.” Journal of the American Statistical Association 86(413):79–86. doi: 10.1080/01621459.1991.10475006.

Supplementary Material

K	J	n	$\bar{e}_{d,..}$	$\hat{\sigma}^{2}_{d,..}$	$\min(\|\hat{\sigma}^{2}_{d,eg}\|)$	$\max(\|\hat{\sigma}^{2}_{d,eg}\|)$
10	1	6	.01	.00	.00	.01
		10	.00	.00	.00	.01
		20	.00	.00	.00	.01
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.01
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
20	1	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
40	1	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
80	1	6	.00	.00	.00	.01
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
100	1	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	2	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00
	3	6	.00	.00	.00	.00
		10	.00	.00	.00	.00
		20	.00	.00	.00	.00
		30	.00	.00	.00	.00

Multiple Group Comparisons of the Fixed and Random Effects From the Generalized Linear Mixed Model

Supplemental Material

Files accompanying “Multiple Group Comparisons of the Fixed and Random Effects From the Generalized Linear Mixed Model”:

sj-pdf-1-smr-10.1177_0049124120986182
sj-pdf-2-smr-10.1177_0049124120986182
sj-zip-1-smr-10.1177_0049124120986182
sj-zip-2-smr-10.1177_0049124120986182
sj-tex-1-smr-10.1177_0049124120986182
sj-tex-2-smr-10.1177_0049124120986182
