Sage Journals: Discover world-class research

Abstract

Three-level multisite individual randomized trials (MIRTs), in which individuals are nested within teachers and schools and randomly assigned to treatment or control conditions, provide a robust framework for assessing overall intervention effects and moderation at multiple levels. This study develops a statistical framework for designing three-level MIRTs to evaluate moderated treatment effects and examines the impact of ignoring one level of nesting on Type I error rates and statistical power. We illustrate the framework with an example of an online tutoring program and derive formulas for statistical power and the minimum detectable effect size difference. These formulas are validated through Monte Carlo simulations, which also demonstrate the risks of ignoring nesting. Finally, we introduce a software tool to facilitate power analysis for moderation in three-level MIRTs and summarize key findings.

Keywords

minimum detectable effect size difference; moderator; multisite individual randomized trials; statistical power

Multisite randomized trials (MRTs), including multisite individual randomized trials (MIRTs) and multisite cluster randomized trials (MCRTs), are widely used in program evaluation, particularly in educational and social science research (e.g., Bai et al., 2025a, 2025b; Cox et al., 2024; Shen & Kelcey, 2022; Spybrook et al., 2020). These designs are especially valuable for assessing the effectiveness of interventions across diverse settings. In educational contexts, it is common to encounter a three-level nested data structure, where students (Level 1) are nested within teachers or classrooms (Level 2), which are further nested within schools or sites (Level 3). Three-level MIRTs, where individuals within classrooms within schools are randomly assigned to treatment or control conditions, provide a powerful framework for examining both overall treatment effects and heterogeneous effects across levels.

Treatment effects may vary across teachers and schools, and these heterogeneous effects may be explained by moderator variables. Moderators are variables that influence the strength or direction of a treatment effect (Baron & Kenny, 1986), allowing researchers to explore for whom or under what conditions an intervention is effective (Spybrook et al., 2020). In educational research, moderators can be measured at different levels (student, teacher, or school) and on different scales (binary or continuous). For instance, student characteristics (e.g., prior achievement and socioeconomic status), teacher attributes (e.g., experience), or school contexts (e.g., urban vs. rural) often serve as moderators (Raudenbush, 1988). Moreover, moderation effects can themselves vary, randomly or nonrandomly, across higher levels, so it is essential to properly account for the hierarchical structure of the data (e.g., Cox et al., 2025; Dong et al., 2021a).

Power analysis is a critical step in designing three-level MIRTs because it helps ensure that studies are adequately powered to detect meaningful effects. Power calculations for moderators are essential because moderation effects are often central to theory and practice. For example, moderator analyses can determine whether a tutoring program benefits students with lower baseline math scores more than those with higher scores, or whether a professional development initiative is more effective in schools with more or less experienced teachers. By formally integrating power analyses for moderation effects into study planning, researchers can ensure that their MIRTs are designed to detect meaningful variation in treatment impacts across contextual and individual levels. Although statistical tools for power analysis of moderation effects in two-level MIRTs and three-level MCRTs have been developed (Dong et al., 2021a, 2021b, 2024a, 2024b; Snijders, 2001, 2005), such tools are notably absent for three-level MIRTs. This gap leaves researchers without methods for planning three-level MIRTs that are adequately powered to detect moderation effects.

Additionally, the consequences of ignoring one level of nesting have been examined in three-level cluster randomized trials (CRTs) and regression discontinuity designs (Bulus & Dong, 2022; Moerbeek, 2004; Opdenakker & Van Damme, 2000; Van den Noortgate et al., 2005; Zhu et al., 2011); however, how ignoring a level affects the testing of moderation effects in three-level MIRTs has not been well studied. Prior research suggests that in some instances a level can be safely ignored (e.g., Bloom et al., 2008). In other cases, ignoring a level of nesting can lead to underestimated standard errors, inflated Type I error rates, or overstated statistical power, all of which undermine the validity of results (Moerbeek, 2004; Van Landeghem et al., 2005). Ignoring a level of nesting also limits the ability to examine effect heterogeneity and moderator effects in MRTs. For example, removing the classroom level eliminates the possibility of assessing whether effects vary across classrooms, in addition to variation across schools. Clear guidance for addressing these issues in the design and analysis of moderation in three-level MIRTs remains limited.

By addressing these gaps, this study aims to enhance the rigor and applicability of three-level MIRTs in educational and social science research, enabling researchers to better understand and account for the complexities of hierarchical data structures. Specifically, the purpose of this study is to develop a statistical framework for designing three-level MIRTs to investigate a moderated treatment effect, explore the consequences of ignoring one level of nesting on statistical power for moderation effects, and provide researchers with a practical tool for designing such studies. In the following, we first provide an illustrative example of an online tutoring program for investigating moderation effects in three-level MIRTs. We then present the formulas to calculate the statistical power and the minimum detectable effect size difference (MDESD) for moderator effects in three-level MIRTs. We validate these formulas and investigate the consequences of ignoring one level of nesting using Monte Carlo simulation, offering insights into the potential risks of model misspecification. We demonstrate how to conduct power analyses for moderation effects in three-level MIRTs using a user-friendly software tool we developed. Finally, we conclude by summarizing key findings.

An Illustrative Example for Investigating Moderation Effects in Three-Level MIRTs

Our working example is based on a recent MIRT evaluating the impact of an online tutoring program on students’ math achievement (Gortazar et al., 2024). In the study, students were randomly assigned within each classroom, within each school, to either receive an online tutoring intervention or continue with business as usual. This design represents a three-level MIRT, where students (Level 1) are nested within teachers/classrooms (Level 2), which are in turn nested within schools (Level 3), with treatment occurring at the student level.

In addition to estimating the average treatment effect in this experimental evaluation, the researchers probed the degree to which the treatment effects were moderated by characteristics at the student, classroom, or school level, and whether these moderation effects varied across sites. Moderators can be either continuous (e.g., a student's pretest score, a teacher's years of teaching experience, or school size) or binary (e.g., a student's sex, whether a teacher holds a master's degree, or school urbanicity).

Using this example, we present and explain formulas for calculating the statistical power and the MDESD for different moderated treatment effects. We also examine how ignoring one level of nesting, that is, simplifying the model to two levels, affects the Type I error rate and statistical power of moderation effect estimates.

Statistical Framework

The statistical models used to investigate moderator effects in three-level MIRTs depend on three key factors: the level of the moderator, whether the treatment or moderation slope varies across Level 2 units (teachers/classrooms), and whether it varies across Level 3 units (schools). Table 1 presents 12 model variations based on different combinations of these factors; the last column describes each model's structure and the assumptions needed to interpret it. For example, Model MRT3-1RR-1 represents a three-level MIRT with treatment at Level 1 (MRT3-1), random slopes for the interaction between treatment and the moderator across both classrooms and schools (RR), and a Level 1 moderator such as pretest scores (the final -1). This model allows the moderated treatment effect to vary randomly across both classrooms and schools. By contrast, MRT3-1NR-1 allows random variation only across schools, MRT3-1RN-1 allows random variation only across classrooms, and MRT3-1NN-1 assumes no random variation, so the moderated treatment effect is constant across classrooms and schools.
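The labeling scheme in Table 1 can be decoded mechanically. The Python sketch below is a hypothetical helper (`parse_model` and `ModelSpec` are illustrative names, not part of the authors' software) that splits a label into the moderator level and the two slope flags. Following Table 1, the flags refer to the interaction slope for Level 1 moderators, to the treatment slope (Level 2 slot) and the interaction slope (Level 3 slot) for Level 2 moderators, and to the treatment slope for Level 3 moderators.

```python
import re
from dataclasses import dataclass

# Hypothetical parser for Table 1 labels: "MRT3-1<L2 slot><L3 slot>-<moderator level>".
_LABEL = re.compile(r"^MRT3-1([RN])([RN])-([123])$")


@dataclass(frozen=True)
class ModelSpec:
    label: str
    moderator_level: int    # 1 = student, 2 = teacher/classroom, 3 = school
    random_across_l2: bool  # "R" = random slope across Level 2; "N" = nonrandomly varying
    random_across_l3: bool  # "R" = random slope across Level 3; "N" = nonrandom


def parse_model(label: str) -> ModelSpec:
    """Split a Table 1 model label into its three design factors."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a three-level MIRT moderation label: {label!r}")
    l2_slot, l3_slot, level = match.groups()
    return ModelSpec(label, int(level), l2_slot == "R", l3_slot == "R")


# The 12 combinations listed in Table 1, in table order.
ALL_LABELS = [f"MRT3-1{slots}-{level}"
              for level in "123" for slots in ("RR", "NR", "RN", "NN")]
```

For instance, `parse_model("MRT3-1NR-2")` describes a classroom-level moderator whose interaction slope is random across schools, while the treatment slope varies only nonrandomly across classrooms.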

Table 1.
List of Design and Software Modules of Three-Level Multisite Individual Randomized Trials (MIRTs).

| Model number | Level of moderator | Slope of treatment or moderation across Level 2 | Slope of treatment or moderation across Level 3 | Description of model structure/assumptions |
| --- | --- | --- | --- | --- |
| MRT3-1RR-1 | 1 | Random | Random | For the interaction/moderation term: random slopes across Levels 2 and 3 |
| MRT3-1NR-1 | 1 | Nonrandomly varying | Random | For the interaction/moderation term: nonrandomly varying slope across Level 2; random slope across Level 3 |
| MRT3-1RN-1 | 1 | Random | Nonrandom/constant | For the interaction/moderation term: random slope across Level 2; nonrandom/constant slope across Level 3 |
| MRT3-1NN-1 | 1 | Nonrandomly varying | Nonrandom/constant | For the interaction/moderation term: nonrandomly varying slope across Level 2; nonrandom/constant slope across Level 3 |
| MRT3-1RR-2 | 2 | Random | Random | Random slope for treatment across Level 2; random slope for the interaction/moderation across Level 3 |
| MRT3-1NR-2 | 2 | Nonrandomly varying | Random | Nonrandomly varying slope for treatment across Level 2; random slope for the interaction/moderation across Level 3 |
| MRT3-1RN-2 | 2 | Random | Nonrandom/constant | Random slope for treatment across Level 2; nonrandom/constant slope for the interaction/moderation across Level 3 |
| MRT3-1NN-2 | 2 | Nonrandomly varying | Nonrandom/constant | Nonrandomly varying slope for treatment across Level 2; nonrandom/constant slope for the interaction/moderation across Level 3 |
| MRT3-1RR-3 | 3 | Random | Random | For treatment: random slopes across Levels 2 and 3 |
| MRT3-1NR-3 | 3 | Nonrandomly varying | Random | For treatment: nonrandomly varying slope across Level 2; random slope across Level 3 |
| MRT3-1RN-3 | 3 | Random | Nonrandomly varying | For treatment: random slope across Level 2; nonrandomly varying slope across Level 3 |
| MRT3-1NN-3 | 3 | Nonrandomly varying | Nonrandomly varying | For treatment: nonrandomly varying slope across Level 2; nonrandomly varying slope across Level 3 |

For the power analysis, researchers can calculate either the statistical power for a given effect size or the MDESD for given sample sizes, along with a confidence interval (CI; e.g., 95%) for the MDESD. It is also important to distinguish between binary and continuous moderators because their effect size metrics differ. For a binary moderator (e.g., sex), the focus is on the standardized difference in the treatment effect between the two subgroups (e.g., girls vs. boys). For a continuous moderator (e.g., pretest score), the focus is on the change in the treatment effect associated with a one-standard-deviation change in the moderator (i.e., the standardized regression coefficient).

For models with nonrandomly varying (or constant) slopes, the formulas for the statistical power and the MDESD can be derived from the random slope models by omitting the variance components that represent treatment or moderation effect variability across the corresponding level of nesting and adjusting the degrees of freedom accordingly (Table 2). We therefore focus on presenting and explaining the formulas for the statistical power and the MDESD (with CIs) for the three primary models with random slopes across both the classroom and school levels: MRT3-1RR-1, MRT3-1RR-2, and MRT3-1RR-3, which correspond to moderators at Level 1, Level 2, and Level 3, respectively. We begin with a continuous moderator and then extend the formulas to binary moderators and to models with nonrandom slopes. To conserve space, derivations of the standard error (SE) and some MDESDs are provided in the Supplement. All 12 statistical models and their corresponding formulas for statistical power are summarized in Table 2. Additional formulas for MDESD and CIs are provided in Table S1 in the Supplement.
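As a sketch of this omission rule for the four Level 1 moderator models in Table 2, the Python function below (a hypothetical helper, `level1_lambda`) builds the standardized noncentrality parameter by dropping the school-level term when the moderation slope is nonrandom across schools and the classroom-level term when it is nonrandom across classrooms. It also returns the degrees of freedom listed in Table 2. Passing a subgroup proportion `Q1` switches to the binary-moderator form.

```python
import math

# Degrees of freedom for the Level 1 moderator models, as listed in Table 2.
_DF = {
    "RR": lambda K, J, n: K - 1,
    "NR": lambda K, J, n: K - 1,
    "RN": lambda K, J, n: K * J - 1,
    "NN": lambda K, J, n: K * J * (n - 1) - 4,
}


def level1_lambda(slots, delta, K, J, n, rho3, rho2, r2_1, P,
                  omega3=0.0, omega2=0.0, Q1=None):
    """Standardized noncentrality parameter and df for Model MRT3-1<slots>-1.

    slots: "RR", "NR", "RN", or "NN" (Level 2 slot first, Level 3 slot second).
    omega3, omega2: standardized moderation-effect variances across schools and
        classrooms; each is used only when the matching slot is "R".
    Q1: None for a standardized continuous moderator, otherwise the subgroup
        proportion of a binary moderator.
    """
    if slots not in _DF:
        raise ValueError(f"unknown slope specification: {slots!r}")
    q = 1.0 if Q1 is None else Q1 * (1.0 - Q1)
    var = (1 - rho3 - rho2) * (1 - r2_1) / (P * (1 - P) * q * K * J * n)
    if slots[1] == "R":   # moderation slope random across schools
        var += omega3 / K
    if slots[0] == "R":   # moderation slope random across classrooms
        var += omega2 / (K * J)
    return delta / math.sqrt(var), _DF[slots](K, J, n)
```

With equal variance components, dropping a random slope always shrinks the SE, so the noncentrality parameter grows from RR to NN; the degrees of freedom also grow because the test no longer relies on the number of schools alone.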

Table 2.
Summary of Standardized Noncentrality Parameters for Three-Level MIRTs.

Model number HLM Standardized noncentrality parameter (λ) Degrees of freedom (v)

MRT3-1RR-1 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k} + r_{3 j k}$ $π_{4 j k} = β_{40 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \\ r_{3 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} & τ_{03} \\ τ_{11}^{2} & τ_{13} \\ τ_{33}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300} + u_{30 k}$ $β_{40 k} = γ_{400}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{30 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0030} \\ τ_{1010}^{2} & τ_{1030} \\ τ_{3030}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 1

MRT3-1NR-1 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k}$ $π_{4 j k} = β_{40 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300} + u_{30 k}$ $β_{40 k} = γ_{400}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{30 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0030} \\ τ_{1010}^{2} & τ_{1030} \\ τ_{3030}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 1

MRT3-1RN-1 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k} + r_{3 j k}$ $π_{4 j k} = β_{40 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \\ r_{3 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} & τ_{03} \\ τ_{11}^{2} & τ_{13} \\ τ_{33}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300}$ $β_{40 k} = γ_{400}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$ Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ − 1

MRT3-1NN-1 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k}$ $π_{4 j k} = β_{40 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300}$ $β_{40 k} = γ_{400}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$ Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ(n − 1) − 4

MRT3-1RR-2 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00 | M}^{2} & τ_{01 | M} \\ τ_{11 | M}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{11 k} = γ_{110} + u_{11 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{11 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0011} \\ τ_{1010}^{2} & τ_{1011} \\ τ_{1111}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{2} (1 - Q_{2}) K J} - \frac{ω_{3 T M^{(2)}}^{2} + {\hat{δ}}_{2 b}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2} - ω_{3 T M^{(2)}}^{2} - {\hat{δ}}_{2 c}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 1

MRT3-1NR-2 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00 | M}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{11 k} = γ_{110} + u_{11 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{11 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0011} \\ τ_{1010}^{2} & τ_{1011} \\ τ_{1111}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 1

MRT3-1RN-2 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00 | M}^{2} & τ_{01 | M} \\ τ_{11 | M}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100}$ $β_{11 k} = γ_{110}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$ Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{2 T}^{2}}{Q_{2} (1 - Q_{2}) K J} - \frac{{\hat{δ}}_{2 b}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{2 T}^{2} - {\hat{δ}}_{2 c}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ − 2

MRT3-1NN-2 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00 | M}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100}$ $β_{11 k} = γ_{110}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$ Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ(n − 1) − 3

MRT3-1RR-3 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} \\ τ_{10} & τ_{11}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000 | M}^{2} & τ_{0010 | M} \\ τ_{1000 | M} & τ_{1010 | M}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{3 T}^{2}}{K Q_{3} (1 - Q_{3})} - \frac{{\hat{δ}}_{3 b}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{3} (1 - Q_{3}) K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{3 T}^{2} - {\hat{δ}}_{3 c}^{2}}{K} + \frac{ω_{2 T}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 2

MRT3-1NR-3 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000 | M}^{2} & τ_{0010 | M} \\ τ_{1000 | M} & τ_{1010 | M}^{2} \end{matrix})]$ Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{3 T}^{2}}{K Q_{3} (1 - Q_{3})} - \frac{{\hat{δ}}_{3 b}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{3 T}^{2} - {\hat{δ}}_{3 c}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ K − 2

MRT3-1RN-3 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} \\ τ_{10} & τ_{11}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000 | M}^{2})$ Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{2 T}^{2}}{Q_{3} (1 - Q_{3}) K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{2 T}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ − 2

MRT3-1NN-3 L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 | T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000 | M}^{2})$ Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ KJ(n − 1) − 3

Note. The model number is the same as in Table 1. MIRTs = multisite individual randomized trials; HLM = hierarchical linear modeling; L1 = Level 1; L2 = Level 2; L3 = Level 3.

Model MRT3-1RR-1 for a Level 1 Moderator

To test for the Level 1 moderation in Model MRT3-1RR-1, we use three-level random slope hierarchical linear modeling (HLM) (Raudenbush & Bryk, 2002):
$\begin{aligned} \text{Level 1: } & Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k} \\ & e_{i j k} \sim N (0, σ_{1 | T, M, X}^{2}) . \end{aligned}$
(1)
$\begin{aligned} \text{Level 2: } & π_{0 j k} = β_{00 k} + r_{0 j k} \\ & π_{1 j k} = β_{10 k} + r_{1 j k} \\ & π_{2 j k} = β_{20 k} \\ & π_{3 j k} = β_{30 k} + r_{3 j k} \\ & π_{4 j k} = β_{40 k} \\ & (\begin{matrix} r_{0 j k} \\ r_{1 j k} \\ r_{3 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} & τ_{03} \\ τ_{10} & τ_{11}^{2} & τ_{13} \\ τ_{30} & τ_{31} & τ_{33}^{2} \end{matrix})] . \end{aligned}$
(2)
$\begin{aligned} \text{Level 3: } & β_{00 k} = γ_{000} + u_{00 k} \\ & β_{10 k} = γ_{100} + u_{10 k} \\ & β_{20 k} = γ_{200} \\ & β_{30 k} = γ_{300} + u_{30 k} \\ & β_{40 k} = γ_{400} \\ & (\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{30 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0030} \\ τ_{1000} & τ_{1010}^{2} & τ_{1030} \\ τ_{3000} & τ_{3010} & τ_{3030}^{2} \end{matrix})] . \end{aligned}$
(3)

The combined model is:
$Y_{i j k} = γ_{000} + u_{00 k} + r_{0 j k} + (γ_{100} + u_{10 k} + r_{1 j k}) T_{i j k} + γ_{200} M_{i j k}^{(1)} + (γ_{300} + u_{30 k} + r_{3 j k}) T_{i j k} M_{i j k}^{(1)} + γ_{400} X_{i j k} + e_{i j k} .$
(4)

$Y_{i j k}$ denotes the math achievement score (i.e., the outcome) for student i with teacher/classroom j within school k. The treatment variable, $T_{i j k}$, is a binary indicator equal to 1 if the student received the online tutoring intervention and 0 otherwise. $X_{i j k}$ is a Level-1 covariate (e.g., socioeconomic status), and $M_{i j k}^{(1)}$ is a Level-1 moderator (e.g., pretest score or sex). In this model, the intercept and the slopes of the treatment and moderation terms are allowed to vary randomly across classrooms and schools. To balance power and Type I error rates, the remaining slopes are fixed across classrooms and schools (e.g., Matuschek et al., 2017). We adopt this approach throughout our analyses because prior research has shown that it tends to preserve power without the inflated Type I error rates associated with underspecified variance-covariance matrices of the random effects. The primary parameter of interest in the Level-1 moderator analysis is $γ_{300}$, which captures the moderation effect, that is, how the effect of online tutoring varies with the Level-1 moderator (e.g., pretest score or sex).
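The combined model (4) can also be read as a data-generating process, which is the starting point for Monte Carlo checks of power formulas. The Python sketch below (a hypothetical function using only the standard library) draws one data set from Model MRT3-1RR-1. It simplifies by drawing all random effects independently (every covariance such as $τ_{01}$ or $τ_{0010}$ set to zero), by generating a standardized continuous moderator and a covariate from independent N(0, 1) distributions, and by assigning exactly round(P·n) students to treatment within each classroom; none of these choices is taken from the authors' simulation design.

```python
import random


def simulate_mrt3_1rr_1(K, J, n, P, gammas, sd, seed=None):
    """Draw one data set from the combined model (4) for Model MRT3-1RR-1.

    gammas: fixed effects (g000, g100, g200, g300, g400).
    sd: standard deviations keyed "u00", "u10", "u30" (schools), "r0", "r1", "r3"
        (classrooms), and "e" (students). Simplifying assumption: all random
        effects are independent, i.e., every covariance in (2) and (3) is zero.
    Returns rows (k, j, i, T, M, X, Y).
    """
    rng = random.Random(seed)
    g000, g100, g200, g300, g400 = gammas
    n_treated = round(P * n)  # students treated in every classroom
    rows = []
    for k in range(K):
        u00, u10, u30 = (rng.gauss(0.0, sd[key]) for key in ("u00", "u10", "u30"))
        for j in range(J):
            r0, r1, r3 = (rng.gauss(0.0, sd[key]) for key in ("r0", "r1", "r3"))
            # Randomize students to conditions within classroom j of school k.
            assignment = [1] * n_treated + [0] * (n - n_treated)
            rng.shuffle(assignment)
            for i, t in enumerate(assignment):
                m = rng.gauss(0.0, 1.0)  # standardized Level-1 moderator
                x = rng.gauss(0.0, 1.0)  # Level-1 covariate, independent of M
                y = (g000 + u00 + r0
                     + (g100 + u10 + r1) * t
                     + g200 * m
                     + (g300 + u30 + r3) * t * m
                     + g400 * x
                     + rng.gauss(0.0, sd["e"]))
                rows.append((k, j, i, t, m, x, y))
    return rows
```

Fitting model (4) to many such data sets and counting how often $γ_{300}$ is significant gives an empirical power estimate that can be compared with the analytic formulas below; a binary moderator can be generated instead by drawing `m` as 1 with probability $Q_{1}$.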

Power Formulas

We test the moderation effect of pretest or sex ($γ_{300}$) using a t-test based on the standard error formula provided in (S1) in the Supplement. When the alternative hypothesis is true, the test statistic follows a noncentral t-distribution, $T^{'}$, and the unstandardized noncentrality parameter for the moderation effect is:
$λ_{| M^{(1)}} = {\hat{γ}}_{300} / S E ({\hat{γ}}_{300}) = {\hat{γ}}_{300} / \sqrt{\frac{τ_{3030}^{2}}{K} + \frac{τ_{33}^{2}}{K J} + \frac{σ_{1 | T, M, X}^{2}}{σ_{M^{(1)}}^{2} P (1 - P) K J n}} .$
(5)

The unstandardized noncentrality parameter is defined as the ratio of the Level-1 moderator effect to its standard error. The standard error incorporates multiple sources of variability: the variance of the moderation effect across schools ( $τ_{3030}^{2}$ ), the variance of the moderation effect across classrooms ( $τ_{33}^{2}$ ), and the Level 1 residual variance conditional on treatment, covariate, and moderator ( $σ_{1 | T, M, X}^{2}$ ). It also depends on the variance of $M_{i j k}^{(1)}$ , the sample sizes (K, J, n), and the proportion of the students assigned to the treatment group (P).

Continuous Moderator. We standardize the continuous moderator (pretest) by setting $σ_{M^{(1)}}^{2} = 1$ and define the effect size of the moderation effect as $δ_{1 c} = γ_{300} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$. The standardized noncentrality parameter for the continuous Level 1 moderator is:
$λ_{| M^{(1)}} = {\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}},$
(6)

where $τ_{3}^{2}$, $τ_{2}^{2}$, and $σ_{1}^{2}$ are the unconditional school-, classroom-, and student-level variances of the outcome; $ρ_{3}$ is the unconditional intraclass correlation of math achievement at the school level, $ρ_{3} = τ_{3}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$; and $ρ_{2}$ is the unconditional intraclass correlation at the classroom level within schools, $ρ_{2} = τ_{2}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$. $ω_{3 T M^{(1)}}^{2} = τ_{3030}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$ is the standardized variability of the moderation effect ($M_{i j k}^{(1)} \times T_{i j k}$) across schools, and $ω_{2 T M^{(1)}}^{2} = τ_{33}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$ is the standardized variability of the moderation effect across classrooms. $R_{1}^{2}$ is the proportion of Level 1 variance explained by the Level-1 covariate ($X_{i j k}$), treatment ($T_{i j k}$), and moderator ($M_{i j k}^{(1)}$): $R_{1}^{2} = 1 - σ_{1 | T, M, X}^{2} / σ_{1}^{2}$.

Binary Moderator. Letting $δ_{1 b} = γ_{300} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$ denote the standardized moderation effect, the standardized noncentrality parameter for the binary Level 1 moderator (sex), based on (S3), is:
$λ_{| M^{(1)}} = {\hat{δ}}_{1 b} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}},$
(7)

where $Q_{1}$ is the proportion of students in one subgroup of the binary Level 1 moderator (e.g., the proportion of girls).
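To make the link between the unstandardized parameter (5) and the standardized forms (6) and (7) concrete, the Python sketch below converts raw variance components into the standardized inputs ($ρ_{3}$, $ρ_{2}$, the ω terms, $R_{1}^{2}$, δ) and checks that both routes give the same noncentrality parameter. The function names and numeric values are hypothetical. For the binary case it assumes the moderator is coded 0/1, so its variance is $Q_{1}(1 - Q_{1})$; substituting that variance for $σ_{M^{(1)}}^{2}$ in (5) reproduces the $Q_{1}(1 - Q_{1})$ factor in (7).

```python
import math


def standardize(tau3_sq, tau2_sq, sigma1_sq, sigma1_cond, tau3030_sq, tau33_sq, gamma300):
    """Map raw variance components to the standardized inputs of (6) and (7)."""
    total = tau3_sq + tau2_sq + sigma1_sq          # SD_Y^2: total unconditional variance
    return {
        "delta": gamma300 / math.sqrt(total),
        "omega3": tau3030_sq / total,              # omega^2_{3TM(1)}
        "omega2": tau33_sq / total,                # omega^2_{2TM(1)}
        "rho3": tau3_sq / total,
        "rho2": tau2_sq / total,
        "r2_1": 1.0 - sigma1_cond / sigma1_sq,     # R_1^2
    }


def lambda_raw(gamma300, tau3030_sq, tau33_sq, sigma1_cond, var_m, P, K, J, n):
    """Eq. (5): gamma_300 divided by its standard error, in the raw metric."""
    se = math.sqrt(tau3030_sq / K + tau33_sq / (K * J)
                   + sigma1_cond / (var_m * P * (1 - P) * K * J * n))
    return gamma300 / se


def lambda_std(delta, omega3, omega2, rho3, rho2, r2_1, P, K, J, n, Q1=None):
    """Eq. (6) when Q1 is None (standardized continuous moderator), else eq. (7)."""
    q = 1.0 if Q1 is None else Q1 * (1 - Q1)
    se = math.sqrt(omega3 / K + omega2 / (K * J)
                   + (1 - rho3 - rho2) * (1 - r2_1) / (P * (1 - P) * q * K * J * n))
    return delta / se
```

The identity holds because $1 - ρ_{3} - ρ_{2} = σ_{1}^{2} / SD_{Y}^{2}$, so the Level 1 term in (6) equals $σ_{1 | T, M, X}^{2} / SD_{Y}^{2}$. In practice this means power can be planned from published intraclass correlations and $R^{2}$ values rather than raw variances.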

The statistical power for a two-sided test with $v = K - 1$ degrees of freedom is:
$1 - β = 1 - P [T^{'} (K - 1, λ_{| M^{(1)}}) < t_{0}] + P [T^{'} (K - 1, λ_{| M^{(1)}}) \leq - t_{0}], \quad \text{where } t_{0} = t_{1 - \frac{α}{2}, K - 1} .$
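Evaluating this expression requires the noncentral t distribution. Libraries such as SciPy provide it (`scipy.stats.nct`); to keep the sketch dependency-free, the Python code below instead integrates over the chi-distributed denominator numerically. The function names are hypothetical, the integration range and step count are assumptions chosen for illustration, and `t_quantile` assumes 0.5 < p < 1.

```python
import math


def nct_cdf(t: float, v: int, lam: float, steps: int = 2000) -> float:
    """P(T' <= t) for a noncentral t variable with v df and noncentrality lam.

    T' = (Z + lam) / (W / sqrt(v)) with Z ~ N(0, 1) and W ~ chi(v), so
    P(T' <= t) = E_W[Phi(t * W / sqrt(v) - lam)], integrated with Simpson's rule
    over W in [0, sqrt(v) + 12], beyond which the chi density is negligible.
    """
    upper = math.sqrt(v) + 12.0
    h = upper / steps  # steps must be even for Simpson's rule
    log_norm = (0.5 * v - 1.0) * math.log(2.0) + math.lgamma(0.5 * v)
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        if w == 0.0:
            density = math.sqrt(2.0 / math.pi) if v == 1 else 0.0
        else:
            density = math.exp((v - 1) * math.log(w) - 0.5 * w * w - log_norm)
        phi = 0.5 * (1.0 + math.erf((t * w / math.sqrt(v) - lam) / math.sqrt(2.0)))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * density * phi
    return total * h / 3.0


def t_quantile(p: float, v: int) -> float:
    """Central t quantile for 0.5 < p < 1, by bisection on nct_cdf with lam = 0."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, v, 0.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_two_sided(lam: float, v: int, alpha: float = 0.05) -> float:
    """1 - P[T' < t0] + P[T' <= -t0], with t0 = t_{1 - alpha/2, v}."""
    t0 = t_quantile(1.0 - alpha / 2.0, v)
    return 1.0 - nct_cdf(t0, v, lam) + nct_cdf(-t0, v, lam)
```

For example, `power_two_sided(3.0, 39)` evaluates the power of a two-sided test at α = .05 with K = 40 schools when the standardized noncentrality parameter from (6) or (7) equals 3.0.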

MDESD with CI

In addition to knowing a study's statistical power to detect a given effect size, it is useful to know the minimum effect size difference that a moderation study can detect with sufficient power (e.g., 80%) given its sample sizes. This section derives formulas for calculating the MDESD and its CI.

The MDESD can be expressed as (Bloom, 1995; Dong et al., 2018; Murray, 1998):
$M D E S D (| \hat{δ} |) = M_{v} \times S E (\hat{γ}) / S D_{Y},$
(8)

where $M_{v} = t_{α} + t_{1 - β}$ for one-tailed tests with v degrees of freedom and $M_{v} = t_{α / 2} + t_{1 - β}$ for two-tailed tests. $S E (\hat{γ})$ is the standard error of the moderation effect estimate, as in (S1) and (S3). $S D_{Y}$ is the standard deviation of the outcome measure (Y) in the control group and is defined as the square root of the total unconditional variance, $S D_{Y} = \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$.

Hence, based on the standard error formula in (S1), the MDESD for the standardized coefficient of a continuous Level 1 moderator (with $σ_{M^{(1)}}^{2} = 1$) is:
$M D E S D (| {\hat{δ}}_{1 c} |) = M_{v} \times \sqrt{\frac{τ_{3030}^{2}}{K} + \frac{τ_{33}^{2}}{K J} + \frac{σ_{1 | T, M, X}^{2}}{σ_{M^{(1)}}^{2} P (1 - P) K J n}} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}} = M_{v} \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}},$
(9)
where the degrees of freedom are $v = K - 1$. The parameters in (9) are interpreted as in (5) and (6) for the noncentrality parameters.

The 100(1-α)% CI for $M D E S D (| {\hat{δ}}_{1 c} |)$ is given by:
$(M_{v} \pm t_{α / 2}) \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}} .$
(10)

Similarly, the MDESD regarding the standardized mean difference for a binary moderator is:
$M D E S D (| {\hat{δ}}_{1 b} |) = M_{v} \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}} .$
(11)

The 100(1-α)% CI for $M D E S D (| {\hat{δ}}_{1 b} |)$ is given by:
$(M_{v} \pm t_{α / 2}) \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}} .$
(12)
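Equations (8) through (12) can be evaluated in a few lines. The Python sketch below assumes a two-tailed test with $M_{v} = t_{α/2} + t_{1 - β}$, reading both as positive critical values from a t distribution with $v = K - 1$ degrees of freedom (about 2.093 and 0.861 when K = 20, α = .05, and power is 80%). The t quantiles are computed numerically with the standard library only, and the helper names are hypothetical.

```python
import math


def t_cdf(x, df, steps=400):
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to |x|."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    area = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile of Student's t with df degrees of freedom, by bisection on t_cdf."""
    if p < 0.5:
        return -t_ppf(1 - p, df)
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mdesd_level1(K, J, n, rho3, rho2, R1sq, omega3_tm, omega2_tm,
                 P=0.5, Q1=None, alpha=0.05, power=0.80, two_tailed=True):
    """MDESD and its 100(1 - alpha)% CI for a Level-1 moderator, following (9)-(12).

    Q1=None gives the continuous (unit-variance) case; otherwise the binary case.
    """
    df = K - 1
    var_m = 1.0 if Q1 is None else Q1 * (1 - Q1)
    se = math.sqrt(omega3_tm / K
                   + omega2_tm / (K * J)
                   + (1 - rho3 - rho2) * (1 - R1sq) / (P * (1 - P) * var_m * K * J * n))
    t_alpha = t_ppf(1 - alpha / 2, df) if two_tailed else t_ppf(1 - alpha, df)
    m_v = t_alpha + t_ppf(power, df)           # multiplier M_v in (8)
    t_half = t_ppf(1 - alpha / 2, df)          # t_{alpha/2} used in the CI
    return m_v * se, ((m_v - t_half) * se, (m_v + t_half) * se)


mdesd, (ci_low, ci_high) = mdesd_level1(K=20, J=4, n=20, rho3=0.15, rho2=0.10, R1sq=0.5,
                                        omega3_tm=0.05, omega2_tm=0.05)
print(f"MDESD = {mdesd:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

For the main-simulation design this gives an MDESD of about 0.19 for a continuous Level-1 moderator. In the two-tailed case the lower CI bound equals $t_{1-β}$ times the standardized SE, because the two $t_{α/2}$ terms cancel.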

Model MRT3-1RR-2 for a Level 2 Moderator

To test for Level 2 moderation, we use a three-level random-slope HLM (Raudenbush & Bryk, 2002):
$\text{Level 1:}\; Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}, \quad e_{i j k} \sim N (0, σ_{1 | T, X}^{2}) .$
(13)
$\begin{aligned} \text{Level 2:}\; π_{0 j k} &= β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k} \\ π_{1 j k} &= β_{10 k} + β_{11 k} M_{j k}^{(2)} + r_{1 j k} \\ π_{2 j k} &= β_{20 k} \\ \begin{pmatrix} r_{0 j k} \\ r_{1 j k} \end{pmatrix} &\sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} τ_{00 | M}^{2} & τ_{01 | M} \\ τ_{10 | M} & τ_{11 | M}^{2} \end{pmatrix} \right] . \end{aligned}$
(14)
$\begin{aligned} \text{Level 3:}\; β_{00 k} &= γ_{000} + u_{00 k} \\ β_{01 k} &= γ_{010} \\ β_{10 k} &= γ_{100} + u_{10 k} \\ β_{11 k} &= γ_{110} + u_{11 k} \\ β_{20 k} &= γ_{200} \\ \begin{pmatrix} u_{00 k} \\ u_{10 k} \\ u_{11 k} \end{pmatrix} &\sim N \left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} τ_{0000}^{2} & τ_{0010} & τ_{0011} \\ τ_{1000} & τ_{1010}^{2} & τ_{1011} \\ τ_{1100} & τ_{1110} & τ_{1111}^{2} \end{pmatrix} \right] . \end{aligned}$
(15)

The combined model is:
$Y_{i j k} = γ_{000} + u_{00 k} + γ_{010} M_{j k}^{(2)} + r_{0 j k} + ((γ_{100} + u_{10 k}) + (γ_{110} + u_{11 k}) M_{j k}^{(2)} + r_{1 j k}) T_{i j k} + γ_{200} X_{i j k} + e_{i j k} .$
(16)

$M_{j k}^{(2)}$ is a Level 2 moderator (e.g., a teacher's years of teaching experience or whether the teacher holds a master's degree). The primary parameter of interest in the Level-2 moderator analysis is $γ_{110}$, which captures the moderation effect, that is, how the effect of online tutoring varies with the Level-2 moderator.

Power Formulas

We test the moderation effect of teacher's years of teaching ( $γ_{110}$ ) using a t-test based on the standard error formula provided in (S5). Assuming the alternative hypothesis is true, the test statistic follows a non-central t-distribution, T’, and the unstandardized noncentrality parameter for the moderation effect is:
$λ_{| M^{(2)}} = {\hat{γ}}_{110} / \sqrt{\frac{τ_{1111}^{2}}{K} + \frac{τ_{11}^{2} - τ_{1111}^{2} σ_{M^{(2)}}^{2} - {\hat{γ}}_{110}^{2} σ_{M^{(2)}}^{2}}{σ_{M^{(2)}}^{2} K J} + \frac{σ_{1 | T, X}^{2}}{σ_{M^{(2)}}^{2} P (1 - P) K J n}} .$
(17)

The unstandardized noncentrality parameter is the ratio of the Level 2 moderation effect to its standard error. The standard error incorporates three sources of variability: the variance of the moderation effect across schools ($τ_{1111}^{2}$), the variance of the treatment effect across classrooms not conditional on the Level 2 moderator ($τ_{11}^{2}$), and the Level 1 residual variance conditional on treatment and covariate ($σ_{1 | T, X}^{2}$). It also depends on the variance of $M_{j k}^{(2)}$, the sample sizes (K, J, n), and the proportion of students assigned to the treatment group (P).

Continuous Moderator. We standardize the continuous moderator (teacher's years of teaching) so that $σ_{M^{(2)}}^{2} = 1$ and define the moderation effect size as $δ_{2 c} = γ_{110} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$. The standardized noncentrality parameter for the continuous Level 2 moderator is:
$λ_{| M^{(2)}} = {\hat{δ}}_{2 c} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2} - ω_{3 T M^{(2)}}^{2} - {\hat{δ}}_{2 c}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}},$
(18)
where $ω_{3 T M^{(2)}}^{2} = τ_{1111}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$ is the standardized variability of the moderation effect ($T_{i j k} M_{j k}^{(2)}$) across schools, and $ω_{2 T}^{2} = τ_{11}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$ is the standardized variability of the treatment effect across classrooms.

Binary Moderator. Let $δ_{2 b} = γ_{110} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$. Based on (S7), the standardized noncentrality parameter for the binary Level 2 moderator is:
$λ_{| M^{(2)}} = {\hat{δ}}_{2 b} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{2} (1 - Q_{2}) K J} - \frac{ω_{3 T M^{(2)}}^{2} + {\hat{δ}}_{2 b}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}},$
(19)
where $Q_{2}$ is the proportion of Level 2 units in one category of the binary moderator (e.g., the proportion of teachers with a master's degree).

The statistical power for a two-sided test with $v = K - 1$ degrees of freedom is:
$1 - β = 1 - P [T^{'} (K - 1, λ_{| M^{(2)}}) < t_{0}] + P [T^{'} (K - 1, λ_{| M^{(2)}}) \leq - t_{0}],$ where $t_{0} = t_{1 - \frac{α}{2}, K - 1}$.
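Because (18) and (19) are written in terms of the unconditional classroom-level variability $ω_{2 T}^{2}$, while a design may instead specify the variability conditional on the moderator (as the simulations below do), a conversion step can be needed in practice. Reading the middle term of (17) as the conditional classroom-level variance suggests $ω_{2 T}^{2} = ω_{2 T | M}^{2} + (ω_{3 T M^{(2)}}^{2} + δ^{2}) σ_{M^{(2)}}^{2}$, where δ is the standardized moderation effect; this identity is our interpretation rather than a formula stated in the text. The Python sketch below, with hypothetical names, applies it and then evaluates (18) or (19). Power then follows from the two-sided noncentral t expression above with $K - 1$ degrees of freedom.

```python
import math


def omega2_t_unconditional(omega2_t_given_m, omega3_tm, delta, var_m=1.0):
    """Unconditional classroom-level treatment variability implied by a conditional one.

    Assumed identity (our reading of (17)): omega2_T = omega2_T|M + (omega3_TM + delta^2) * var(M).
    """
    return omega2_t_given_m + (omega3_tm + delta ** 2) * var_m


def level2_lambda(delta, K, J, n, rho3, rho2, R1sq, omega3_tm, omega2_t, P=0.5, Q2=None):
    """Standardized noncentrality parameter for a Level-2 moderator, (18) or (19).

    Q2=None: continuous moderator with unit variance; otherwise binary with proportion Q2.
    Returns (lambda, standardized SE).
    """
    var_m = 1.0 if Q2 is None else Q2 * (1 - Q2)
    variance = (omega3_tm / K
                + omega2_t / (var_m * K * J)
                - (omega3_tm + delta ** 2) / (K * J)
                + (1 - rho3 - rho2) * (1 - R1sq) / (P * (1 - P) * var_m * K * J * n))
    if variance <= 0:
        raise ValueError("inputs imply a non-positive sampling variance")
    se = math.sqrt(variance)
    return delta / se, se


omega2_t = omega2_t_unconditional(0.05, 0.05, 0.20)    # 0.05 + (0.05 + 0.04) = 0.14
lam, se = level2_lambda(0.20, K=20, J=4, n=20, rho3=0.15, rho2=0.10, R1sq=0.5,
                        omega3_tm=0.05, omega2_t=omega2_t)
print(f"omega2_T = {omega2_t:.2f}, SE = {se:.3f}, lambda = {lam:.2f}")
```

With the main-simulation values ($ω_{2 T | M}^{2}$ = 0.05, $ω_{3 T M^{(2)}}^{2}$ = 0.05, effect size 0.20), the implied $ω_{2 T}^{2}$ is 0.14 and the standardized SE is about 0.064, equal to the formula-based continuous RR value in Table 4.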

MDESD with CI

Based on the definition of MDESD in (8) and the standard error formulas in (S5) and (S7), we derive the MDESD for the standardized coefficient associated with a continuous Level 2 moderator, as well as the MDESD for the standardized mean difference associated with a binary moderator. The derivations and formulas for MDESD and their corresponding CIs are provided in the Supplement.

Model MRT3-1RR-3 for a Level 3 Moderator

To test for Level 3 moderation, we use a three-level random-slope HLM (Raudenbush & Bryk, 2002):
$\text{Level 1:}\; Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}, \quad e_{i j k} \sim N (0, σ_{1 | T, X}^{2}) .$
(20)
$\begin{aligned} \text{Level 2:}\; π_{0 j k} &= β_{00 k} + r_{0 j k} \\ π_{1 j k} &= β_{10 k} + r_{1 j k} \\ π_{2 j k} &= β_{20 k} \\ \begin{pmatrix} r_{0 j k} \\ r_{1 j k} \end{pmatrix} &\sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} τ_{00}^{2} & τ_{01} \\ τ_{10} & τ_{11}^{2} \end{pmatrix} \right] . \end{aligned}$
(21)
$\begin{aligned} \text{Level 3:}\; β_{00 k} &= γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k} \\ β_{10 k} &= γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k} \\ β_{20 k} &= γ_{200} \\ \begin{pmatrix} u_{00 k} \\ u_{10 k} \end{pmatrix} &\sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} τ_{0000 | M}^{2} & τ_{0010 | M} \\ τ_{1000 | M} & τ_{1010 | M}^{2} \end{pmatrix} \right] \end{aligned}$

The combined model is:
$Y_{i j k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k} + r_{0 j k} + ((γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k}) + r_{1 j k}) T_{i j k} + γ_{200} X_{i j k} + e_{i j k}$
(22)

$M_{k}^{(3)}$ is a Level 3 moderator (e.g., school size or urbanicity). The primary parameter of interest in the Level-3 moderator analysis is $γ_{101}$, which captures the moderation effect, that is, how the effect of online tutoring varies with the Level-3 moderator.

Power Formulas

We test the moderation effect of school size or urbanicity ($γ_{101}$) using a t-test based on the standard error formula provided in (S13). Assuming the alternative hypothesis is true, the test statistic follows a non-central t-distribution, T’, and the unstandardized noncentrality parameter for the moderation effect is given by:
$λ_{| M^{(3)}} = {\hat{γ}}_{101} / \sqrt{\frac{τ_{1010}^{2} - {\hat{γ}}_{101}^{2} σ_{M^{(3)}}^{2}}{σ_{M^{(3)}}^{2} K} + \frac{τ_{11}^{2}}{σ_{M^{(3)}}^{2} K J} + \frac{σ_{1 | T, X}^{2}}{σ_{M^{(3)}}^{2} P (1 - P) K J n}} .$
(23)

The unstandardized noncentrality parameter is the ratio of the Level 3 moderation effect to its standard error. The standard error incorporates three sources of variability: the variance of the treatment effect across schools not conditional on the moderator ($τ_{1010}^{2}$), the variance of the treatment effect across classrooms ($τ_{11}^{2}$), and the Level 1 residual variance conditional on treatment and covariate ($σ_{1 | T, X}^{2}$). It also depends on the variance of $M_{k}^{(3)}$, the sample sizes (K, J, n), and the proportion of students assigned to the treatment group (P).

Continuous Moderator. We standardize the continuous moderator (school size) so that $σ_{M^{(3)}}^{2} = 1$ and define the moderation effect size as $δ_{3 c} = γ_{101} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$. The standardized noncentrality parameter for the continuous Level 3 moderator is:
$λ_{| M^{(3)}} = {\hat{δ}}_{3 c} / \sqrt{\frac{ω_{3 T}^{2} - {\hat{δ}}_{3 c}^{2}}{K} + \frac{ω_{2 T}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$
(24)
where $ω_{3 T}^{2} = τ_{1010}^{2} / (τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2})$ is the standardized variability of the treatment effect across schools.

Binary Moderator. Let $δ_{3 b} = γ_{101} / \sqrt{τ_{3}^{2} + τ_{2}^{2} + σ_{1}^{2}}$. The standardized noncentrality parameter for the binary Level-3 moderator is:
$λ_{| M^{(3)}} = {\hat{δ}}_{3 b} / \sqrt{\frac{ω_{3 T}^{2}}{Q_{3} (1 - Q_{3}) K} - \frac{{\hat{δ}}_{3 b}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{3} (1 - Q_{3}) K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$
(25)
where $Q_{3}$ is the proportion of schools in one category of the binary moderator (e.g., the proportion of urban schools).

The statistical power for a two-sided test with $v = K - 2$ degrees of freedom is:
$1 - β = 1 - P [T^{'} (K - 2, λ_{| M^{(3)}}) < t_{0}] + P [T^{'} (K - 2, λ_{| M^{(3)}}) \leq - t_{0}],$ where $t_{0} = t_{1 - \frac{α}{2}, K - 2}$.
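A parallel Python sketch for (24) and (25) follows. When a design specifies the school-level treatment variability conditional on the moderator, reading the first term of (23) as that conditional variance suggests $ω_{3 T}^{2} = ω_{3 T | M}^{2} + δ^{2} σ_{M^{(3)}}^{2}$; this conversion and all names are assumptions for illustration. Power for a Level 3 moderator then uses $K - 2$ degrees of freedom.

```python
import math


def omega3_t_unconditional(omega3_t_given_m, delta, var_m=1.0):
    """Unconditional school-level treatment variability implied by a conditional one.

    Assumed identity (our reading of (23)): omega3_T = omega3_T|M + delta^2 * var(M).
    """
    return omega3_t_given_m + delta ** 2 * var_m


def level3_lambda(delta, K, J, n, rho3, rho2, R1sq, omega3_t, omega2_t, P=0.5, Q3=None):
    """Standardized noncentrality parameter for a Level-3 moderator, (24) or (25).

    Q3=None: continuous moderator with unit variance; otherwise binary with proportion Q3.
    Returns (lambda, standardized SE); the power calculation then uses df = K - 2.
    """
    var_m = 1.0 if Q3 is None else Q3 * (1 - Q3)
    variance = (omega3_t / (var_m * K)
                - delta ** 2 / K
                + omega2_t / (var_m * K * J)
                + (1 - rho3 - rho2) * (1 - R1sq) / (P * (1 - P) * var_m * K * J * n))
    if variance <= 0:
        raise ValueError("inputs imply a non-positive sampling variance")
    se = math.sqrt(variance)
    return delta / se, se


omega3_t = omega3_t_unconditional(0.01, 0.20)           # 0.01 + 0.04 = 0.05
lam, se = level3_lambda(0.20, K=20, J=4, n=20, rho3=0.15, rho2=0.10, R1sq=0.5,
                        omega3_t=omega3_t, omega2_t=0.05)
print(f"omega3_T = {omega3_t:.2f}, SE = {se:.3f}, lambda = {lam:.2f}, df = {20 - 2}")
```

With $ω_{3 T | M}^{2}$ = 0.01 and an effect size of 0.20, the implied $ω_{3 T}^{2}$ is 0.05 and the standardized SE is about 0.045, equal to the formula-based continuous RR value in Table 5.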

MDESD with CI

Based on the definition of MDESD in (8) and the standard error formulas in (S13) and (S15), we derive the MDESD for the standardized coefficient associated with a continuous Level 3 moderator and the MDESD for the standardized mean difference associated with a binary moderator. The derivations and formulas for MDESD and their corresponding CIs are provided in the Supplement.

The Other Models

We presented and explained the formulas for statistical power and the MDESD for three primary models with random slopes across both the classroom and school levels. In summary, the statistical power and the MDESD are primarily influenced by three sources of variability: the standardized effect variabilities of the moderation or treatment across schools and classrooms, and the residual variance at the student level.

For models with nonrandomly varying (constant) slopes, the power and MDESD formulas are obtained from the random-slope formulas by dropping the terms that represent the variability of the moderation (or treatment) effect across the level at which the slope is constant. For example, consider Model MRT3-1NR-1, in which the moderation effect does not vary randomly across classrooms but does vary randomly across schools (see Tables 1 and 2). In this case, the standardized variability of the moderation effect across classrooms is $ω_{2 T M^{(1)}}^{2}$ = 0, and the corresponding term can be dropped from (S6) for Model MRT3-1RR-1. The resulting standardized noncentrality parameter for the continuous Level-1 moderator is:
$λ_{| M^{(1)}} = {\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}} .$
(26)

Similarly, for Model MRT3-1RN-1, the moderation effect varies randomly across classrooms but not across schools. In this case, the standardized variability of the moderation effect across schools is $ω_{3 T M^{(1)}}^{2}$ = 0, and the corresponding term can be dropped from (S6) for Model MRT3-1RR-1. For Model MRT3-1NN-1, the moderation effect does not vary randomly across either classrooms or schools. In this case, both standardized variabilities are zero, that is, $ω_{3 T M^{(1)}}^{2}$ = 0 and $ω_{2 T M^{(1)}}^{2}$ = 0, and both terms can be dropped from (S6).

This approach also applies to models with Level-2 and Level-3 moderators. As a result, the formulas for statistical power and MDESD are identical across Models MRT3-1NN-1, MRT3-1NN-2, and MRT3-1NN-3, in which all standardized effect variabilities of the moderation (or treatment) across schools and classrooms are zero and thus dropped from the corresponding formulas in the random slopes models. For instance, the standardized noncentrality parameters for the continuous Level-1, Level-2, and Level-3 moderators are all equal to the moderator effect size divided by the same factor, $\sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$ (see Table 2). This implies that the statistical power is the same across Models MRT3-1NN-1, MRT3-1NN-2, and MRT3-1NN-3, that is, the level at which the moderator is defined does not affect power under these conditions.
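The term-dropping rule is easy to mechanize. The Python sketch below (hypothetical names) computes the standardized SE for a continuous Level-1 moderator under the four slope specifications, using the main-simulation design. The NN value is the square root of the Level 1 term alone, which is the common factor shared by all three moderator levels noted above.

```python
import math


def level1_se(K, J, n, rho3, rho2, R1sq, omega3_tm, omega2_tm, P=0.5,
              random_school=True, random_class=True):
    """Standardized SE for a continuous Level-1 moderator.

    A nonrandom (constant) slope at a level simply drops that level's variability term.
    """
    variance = (1 - rho3 - rho2) * (1 - R1sq) / (P * (1 - P) * K * J * n)
    if random_school:
        variance += omega3_tm / K
    if random_class:
        variance += omega2_tm / (K * J)
    return math.sqrt(variance)


design = dict(K=20, J=4, n=20, rho3=0.15, rho2=0.10, R1sq=0.5, omega3_tm=0.05, omega2_tm=0.05)
# Labels follow the tables: first letter = classrooms (Level 2), second letter = schools (Level 3).
variants = {"RR": (True, True), "NR": (False, True), "RN": (True, False), "NN": (False, False)}
ses = {label: level1_se(**design, random_class=rc, random_school=rs)
       for label, (rc, rs) in variants.items()}
print({label: round(se, 3) for label, se in ses.items()})
```

The printed values, 0.064, 0.059, 0.04, and 0.031, reproduce the formula-based SEs for the continuous moderator in Table 3.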

Monte Carlo Simulation

We conducted Monte Carlo simulations to validate the formulas for the standard error and power and to investigate how power and the Type I error rate change when one level of nesting (Level 2 or 3) is ignored in the analysis of a Level 1 moderator.

The Procedures for the Monte Carlo Simulation

In the first step, we generated data using the 12 HLMs in Table 2. For each model, we generated data separately for a continuous and a binary moderator, and separately with a zero and a nonzero (0.20) moderator effect. Hence, there were 48 scenarios: 3 (levels of moderators) × 4 (random and nonrandom slope combinations) × 2 (scales of the moderator: continuous and binary) × 2 (nonzero and zero moderator effect). The parameters ($ρ_{3}$, $ρ_{2}$, and $R_{1}^{2}$) can be estimated from observational studies without interventions, or from intervention studies after controlling for the treatment effect. For the main simulations, we set the intraclass correlation coefficients (ICCs) of the outcome at the classroom and school levels to $ρ_{2}$ = .10 and $ρ_{3}$ = .15, consistent with prior estimates (e.g., Dong et al., 2016; Hedges & Hedberg, 2007, 2013; Shen et al., 2023). We likewise set the proportion of Level 1 variance explained by covariates (e.g., pretest) to $R_{1}^{2}$ = .5 (Bloom et al., 2007; Dong et al., 2016; Hedges & Hedberg, 2007, 2013; Kelcey & Phelps, 2013; Kelcey et al., 2016; Westine et al., 2013). The number of students per classroom (n) was 20, the number of classrooms per school (J) was 4, and the total number of schools (K) was 20. The proportion of students assigned to the online tutoring treatment group was fixed at P = .5. For binary moderators, we set $Q_{1}$ = $Q_{2}$ = $Q_{3}$ = .5. The parameters ($ω_{3 T M^{(1)}}^{2}$, $ω_{2 T M^{(1)}}^{2}$, $ω_{3 T M^{(2)}}^{2}$, $ω_{3 T}^{2}$, and $ω_{2 T}^{2}$) must be estimated from intervention studies. For the Level 1 moderator, the standardized variability of the moderation effect is $ω_{3 T M^{(1)}}^{2}$ = 0.05 across schools and $ω_{2 T M^{(1)}}^{2}$ = 0.05 across classrooms (Dong et al., 2023). That is, the variances of the moderation effect of pretest scores across classrooms and across schools each equal 0.05 times the total variance of the math achievement score.
For the Level 2 moderator, the standardized variability of the moderation effect across schools is $ω_{3 T M^{(2)}}^{2}$ = 0.05, meaning that the variance of the Level-2 moderation effect across schools is 0.05 times the total variance of the math achievement score. The standardized treatment effect variability across classrooms conditional on the moderator is $ω_{2 T | M}^{2}$ = 0.05, indicating that the residual variance of the online tutoring treatment effect across classrooms (after controlling for the moderator) is 0.05 times the total variance of the math achievement score. For the Level 3 moderator, the standardized treatment effect variability across schools conditional on the moderator is $ω_{3 T | M}^{2}$ = 0.01, and the standardized treatment effect variability across classrooms is $ω_{2 T}^{2}$ = 0.05 (Olsen et al., 2017; Weiss et al., 2017). That is, the residual variance of the online tutoring treatment effect across schools (after controlling for the moderator) is 0.01 times the total variance of the math achievement score, while the variance of the treatment effect across classrooms is 0.05 times the total variance.
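The 48 scenarios and the main-simulation settings can be laid out as a simple parameter grid. The Python sketch below is only a bookkeeping illustration with hypothetical key names; it does not reproduce the study's data generation or its SAS PROC MIXED analysis.

```python
from itertools import product

# Main-simulation settings described above (standardized scale, total variance = 1).
BASE = dict(K=20, J=4, n=20, rho3=0.15, rho2=0.10, R1sq=0.5, P=0.5, Q=0.5)
VARIABILITY = {
    1: dict(omega3_tm=0.05, omega2_tm=0.05),          # Level-1 moderator
    2: dict(omega3_tm=0.05, omega2_t_given_m=0.05),   # Level-2 moderator
    3: dict(omega3_t_given_m=0.01, omega2_t=0.05),    # Level-3 moderator
}

# 3 moderator levels x 4 slope specifications x 2 moderator scales x 2 effect sizes.
scenarios = [
    dict(BASE, level=level, slopes=slopes, scale=scale, effect=effect, **VARIABILITY[level])
    for level, slopes, scale, effect in product(
        (1, 2, 3), ("RR", "NR", "RN", "NN"), ("continuous", "binary"), (0.0, 0.20))
]
print(len(scenarios))
```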

Furthermore, we conducted additional simulations to validate our formulas for the standard error and statistical power for a continuous moderator under more varied assumptions: (1) a smaller sample size (K = 10, J = 4, n = 20); (2) non-negligible classroom-level variance, consistent with prior research (Jacob et al., 2010; Mulolli et al., 2025; Nye et al., 2004; Xu & Nichols, 2010), with the classroom-level ICC ($ρ_{2}$ = .20) exceeding the school-level ICC ($ρ_{3}$ = .10); and (3) greater effect heterogeneity. Specifically, for the Level-1 moderator, we set $ω_{3 T M^{(1)}}^{2}$ = 0.10 and $ω_{2 T M^{(1)}}^{2}$ = 0.10; for the Level-2 moderator, $ω_{3 T M^{(2)}}^{2}$ = 0.10 and $ω_{2 T | M}^{2}$ = 0.06; and for the Level-3 moderator, $ω_{3 T | M}^{2}$ = 0.06 and $ω_{2 T}^{2}$ = 0.10. The proportion of Level-1 variance explained by covariates was held constant at $R_{1}^{2}$ = .50.

In the second step, we analyzed the data sets using SAS PROC MIXED, first fitting three-level HLMs. We estimated unconditional ICCs at the school and classroom levels using unconditional HLMs. We estimated the moderator effect, the standardized variability of the moderation effect across schools and classrooms ($ω_{3 T M^{(1)}}^{2}$ and $ω_{2 T M^{(1)}}^{2}$) for the Level 1 moderator, the standardized variability of the moderation effect across schools ($ω_{3 T M^{(2)}}^{2}$) for the Level 2 moderator, and the proportion of Level 1 variance explained by covariates ($R_{1}^{2}$) using the same models that generated the data. The standardized treatment effect variability across classrooms ($ω_{2 T}^{2}$) and across schools ($ω_{3 T}^{2}$) was estimated using models that included only the treatment variable.

Then, for the Level 1 moderator analysis, we fit two-level HLMs that ignored either classrooms or schools. When classrooms were ignored, the two-level HLM comprised students nested within schools; when schools were ignored, it comprised students nested within classrooms, with K × J classrooms in total. We estimated the moderator effect, its SE, and the other parameters for these two-level MIRTs.

In the third step, the moderator effect was converted to a standardized mean difference for binary moderators or a standardized coefficient for continuous moderators. A p-value below .05 for the moderator effect was coded as a rejection of the null hypothesis of no moderation.

Lastly, we replicated the first three steps 2,000 times and calculated the means of the moderator effect size and the other parameters. The standard deviation of the 2,000 moderator effect sizes served as the empirical standard error of the moderator effect. We also calculated the standard error from our formulas and constructed a 95% CI for each point estimate. We then calculated the absolute and relative differences between the formula-based and empirical standard errors. The coverage rate was the percentage of replications in which the formula-based 95% CI covered the true moderator effect. The proportion of the 2,000 replications in which the null hypothesis was rejected estimated the Type I error rate when the moderation effect was 0 and the empirical power when it was nonzero. We compared the power and Type I error rates from our derived formulas with those estimated from the simulations.
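The summary statistics described in this step can be sketched in Python as follows. The function name, its inputs, the use of the mean per-replication formula SE, and the critical value for the 95% CI are all assumptions. The relative SE difference is computed as (formula − empirical)/empirical, which is positive when the formula SE exceeds the empirical SE, matching the sign convention in Table 3. The toy input is made up purely to show the call.

```python
import statistics


def summarize_replications(estimates, formula_ses, p_values, true_effect, crit, alpha=0.05):
    """Summary statistics for one simulation scenario.

    estimates   -- standardized moderator-effect estimates, one per replication
    formula_ses -- formula-based standard errors, one per replication
    p_values    -- p-values for the moderator effect, one per replication
    crit        -- critical value for the 95% CI (e.g., t_{.975, df}); an assumption here
    """
    empirical_se = statistics.stdev(estimates)        # SD of estimates across replications
    formula_se = statistics.mean(formula_ses)
    covered = [est - crit * se <= true_effect <= est + crit * se
               for est, se in zip(estimates, formula_ses)]
    return {
        "mean_estimate": statistics.mean(estimates),
        "empirical_se": empirical_se,
        "formula_se": formula_se,
        "abs_diff_se": formula_se - empirical_se,
        "rel_diff_se_pct": 100 * (formula_se - empirical_se) / empirical_se,
        "coverage": sum(covered) / len(covered),
        # Type I error rate when the true effect is 0, empirical power otherwise.
        "rejection_rate": sum(p < alpha for p in p_values) / len(p_values),
    }


# Toy input with three replications (values made up for illustration only).
summary = summarize_replications([0.15, 0.20, 0.25], [0.06, 0.06, 0.06],
                                 [0.02, 0.01, 0.30], true_effect=0.20, crit=2.093)
print(summary)
```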

Simulation Results

Accuracy of Formulas for Standard Error and Statistical Power/Type I Error Rate

Tables 3–5 present the coverage rates of the 95% CI, statistical power, and Type I error rates from the main Monte Carlo simulations, alongside the corresponding results from the analytical formulas for Level 1, Level 2, and Level 3 moderators, respectively. The results show strong agreement between the formula-based calculations and the empirical distributions obtained from the simulations. Across all scenarios, the absolute and relative differences between the standard errors derived from simulations and those calculated using our formulas range from −0.009 to 0.004 and from −12.0% to 4.6%, respectively. The coverage rates of the 95% CIs range from 0.92 to 0.97. The absolute differences between the Type I error rates from the formulas and those estimated from simulations range from −0.008 to 0.011, and the absolute differences in power estimates range from −0.014 to 0.076.

Table 3.
Coverage of 95% CI and Power (Type I Error Rate) from Monte Carlo Simulation and the Formulas for Level-1 Moderator.

Moderator Power Type I error

RR NR RN NN RR NR RN NN

Continuous moderator

SE from simulation 0.063 0.059 0.041 0.031 0.066 0.060 0.042 0.032

SE from formula 0.064 0.059 0.040 0.031 0.064 0.059 0.040 0.031

Absolute difference in SE 0.001 −0.001 −0.001 0.000 −0.002 −0.001 −0.003 −0.001

Relative difference in SE (%) 2.3 −1.1 −2.7 −1.2 −3.3 −1.7 −6.2 −3.5

Coverage rate of 95% CI 0.966 0.960 0.950 0.950 0.964 0.961 0.940 0.949

Power/Type I error rate from simulation 0.854 0.905 0.999 1.000 0.053 0.052 0.059 0.046

Power/Type I error rate from formulas 0.845 0.900 0.999 1.000 0.050 0.050 0.050 0.050

Absolute difference −0.009 −0.005 0.000 0.000 0.002 −0.002 −0.008 0.004

Binary moderator

SE from simulation 0.084 0.080 0.067 0.062 0.085 0.081 0.066 0.063

SE from formula 0.084 0.080 0.067 0.061 0.084 0.080 0.067 0.061

Absolute difference in SE 0.000 0.000 0.000 0.000 −0.001 −0.001 0.001 −0.002

Relative difference in SE (%) 0.0 −0.1 0.1 −0.6 −1.0 −1.0 0.8 −2.7

Coverage rate of 95% CI 0.963 0.962 0.957 0.942 0.963 0.964 0.957 0.941

Power/Type I error rate from simulation 0.622 0.646 0.831 0.889 0.051 0.052 0.044 0.051

Power/Type I error rate from formulas 0.631 0.665 0.846 0.897 0.050 0.050 0.050 0.050

Absolute difference 0.009 0.020 0.016 0.009 0.000 −0.001 0.007 −0.001

Note. RR, NR, RN, and NN refer to the slopes of the interaction/moderation term across Levels 2 and 3 as in Table 1. R is for the random slope and N is for the nonrandom/constant slope. Results were based on 2,000 replications. The moderator effect size is 0.20. The intraclass correlation coefficients at Levels 2 and 3 are $ρ_{2}$ = .10 and $ρ_{3}$ = .15. The standardized effect variability of the moderation across Level-3 sites is $ω_{3 T M^{(1)}}^{2}$ = 0.05. The standardized effect variability of the moderation across Level-2 sites is $ω_{2 T M^{(1)}}^{2}$ = 0.05. The proportion of variance at Level 1 explained by covariates is $R_{1}^{2}$ = .5. The sample size per Level-2 unit (n) is 20, the number of Level-2 units per Level-3 site (J) is 4, and the total number of Level-3 sites (K) is 20. The proportion of individuals assigned to the treatment group is P = .5. For binary moderators, $Q_{1}$ = .5. The 95% CIs were constructed from the SE calculated from the formulas at $α$ = 0.05. Coverage rates were calculated as the percentage of times that the 95% CIs included the true moderator effect. CI = confidence interval; SE = standard error.

Table 4.
Coverage of 95% CI and Power (Type I Error Rate) from Monte Carlo Simulation and the Formulas for Level-2 Moderator.

Moderator Power Type I error

RR NR RN NN RR NR RN NN

Continuous moderator

SE from simulation 0.071 0.065 0.042 0.031 0.073 0.065 0.040 0.031

SE from formula 0.064 0.059 0.040 0.031 0.064 0.059 0.040 0.031

Absolute difference in SE −0.007 −0.007 −0.002 0.000 −0.009 −0.006 0.000 0.000

Relative difference in SE (%) −9.9 −10.1 −4.8 −0.7 −12.0 −9.8 −0.5 0.0

Coverage rate of 95% CI 0.943 0.934 0.943 0.944 0.937 0.939 0.950 0.947

Power/Type I error rate from simulation 0.769 0.834 0.997 1.000 0.056 0.056 0.050 0.048

Power/Type I error rate from formulas 0.844 0.894 0.999 1.000 0.050 0.050 0.050 0.050

Absolute difference 0.076 0.061 0.002 0.000 −0.005 −0.005 0.000 0.003

Binary moderator

SE from simulation 0.092 0.080 0.078 0.061 0.095 0.077 0.079 0.060

SE from formula 0.097 0.079 0.079 0.061 0.096 0.079 0.079 0.061

Absolute difference in SE 0.004 0.000 0.001 0.000 0.001 0.002 0.001 0.002

Relative difference in SE (%) 4.6 −0.5 1.4 0.4 1.4 2.8 0.7 3.0

Coverage rate of 95% CI 0.971 0.963 0.950 0.952 0.970 0.971 0.956 0.955

Power/Type I error rate from simulation 0.513 0.668 0.708 0.896 0.045 0.040 0.051 0.046

Power/Type I error rate from formulas 0.500 0.661 0.701 0.895 0.050 0.050 0.050 0.050

Absolute difference −0.013 −0.007 −0.007 −0.001 0.005 0.011 0.000 0.005

Note. RR, NR, RN, and NN refer to the slopes of the treatment variable across Level 2 and the interaction/moderation term across Level 3 as in Table 1. R is for the random slope and N is for the nonrandom/constant slope. Results were based on 2,000 replications. The moderator effect size is 0.20. The intraclass correlation coefficients at Levels 2 and 3 are $ρ_{2}$ = .10 and $ρ_{3}$ = .15. The standardized effect variability of the moderation across schools is $ω_{3 T M^{(2)}}^{2}$ = 0.05, and the standardized treatment effect variability across classrooms conditional on the moderator is $ω_{2 T | M}^{2}$ = 0.05. The proportion of variance at Level 1 explained by covariates is $R_{1}^{2}$ = .5. The sample size per Level-2 unit (n) is 20, the number of Level-2 units per Level-3 site (J) is 4, and the total number of Level-3 sites (K) is 20. The proportion of individuals assigned to the treatment group is P = .5. For binary moderators, $Q_{2}$ = .5. The 95% CIs were constructed from the SE calculated from the formulas at $α$ = 0.05. Coverage rates were calculated as the percentage of times that the 95% CIs included the true moderator effect. CI = confidence interval; SE = standard error.

Table 5.
Coverage of 95% CI and Power (Type I Error Rate) from Monte Carlo Simulation and the Formulas for Level-3 Moderator.

Moderator Power Type I error

RR NR RN NN RR NR RN NN

Continuous moderator

SE from simulation 0.049 0.040 0.043 0.033 0.050 0.042 0.045 0.033

SE from formula 0.045 0.038 0.039 0.031 0.047 0.039 0.040 0.031

Absolute difference in SE −0.004 −0.003 −0.004 −0.002 −0.003 −0.002 −0.005 −0.003

Relative difference in SE (%) −8.1 −6.6 −8.9 −7.3 −5.8 −5.5 −11.9 −8.3

Coverage rate of 95% CI 0.949 0.950 0.929 0.932 0.961 0.953 0.919 0.931

Power/Type I error rate from simulation 0.954 0.989 0.987 0.998 0.043 0.051 0.055 0.047

Power/Type I error rate from formulas 0.986 0.999 0.999 1.000 0.050 0.050 0.050 0.050

Absolute difference 0.032 0.010 0.012 0.002 0.007 0.000 −0.005 0.003

Binary moderator

SE from simulation 0.091 0.077 0.082 0.060 0.092 0.077 0.078 0.061

SE from formula 0.093 0.077 0.079 0.061 0.094 0.079 0.079 0.061

Absolute difference in SE 0.002 0.000 −0.002 0.001 0.002 0.002 0.001 0.000

Relative difference in SE (%) 2.1 −0.5 −2.9 1.8 2.3 2.5 1.2 0.5

Coverage rate of 95% CI 0.968 0.962 0.946 0.954 0.966 0.967 0.960 0.954

Power/Type I error rate from simulation 0.538 0.696 0.694 0.905 0.046 0.054 0.046 0.047

Power/Type I error rate from formulas 0.533 0.682 0.700 0.903 0.050 0.050 0.050 0.050

Absolute difference −0.005 −0.014 0.006 −0.002 0.005 −0.004 0.005 0.004

Note. RR, NR, RN, and NN refer to the slopes of the treatment variable across Levels 2 and 3 as in Table 1. R is for the random slope and N is for the nonrandom slope. Results were based on 2,000 replications. The moderator effect size is 0.20. The intraclass correlation coefficients at Levels 2 and 3 are $ρ_{2}$ = .10 and $ρ_{3}$ = .15. The standardized treatment effect variability across schools conditional on the moderator is $ω_{3 T | M}^{2}$ = 0.01, and the standardized treatment effect variability across classrooms is $ω_{2 T}^{2}$ = 0.05. The proportion of variance at Level 1 explained by covariates is $R_{1}^{2}$ = .5. The sample size per Level-2 unit (n) is 20, the number of Level-2 units per Level-3 site (J) is 4, and the total number of Level-3 sites (K) is 20. The proportion of individuals assigned to the treatment group is P = .5. For binary moderators, $Q_{3}$ = .5. The 95% CIs were constructed from the standard error (SE) calculated from the formulas at $α$ = 0.05. Coverage rates were calculated as the percentage of times that the 95% CIs included the true moderator effect. CI = confidence interval; SE = standard error.

In addition, Table S2 reports results from supplementary simulation conditions designed to further evaluate the accuracy of our formulas. Specifically, we examined scenarios with classroom-level ICCs larger than school-level ICCs, greater effect variability, and smaller sample sizes. The findings under these alternative conditions are consistent with those from the main simulations.

Consequence for One Level of Nesting (Level 2 or 3) Ignored

When either classrooms or schools were ignored, two-level HLMs for a Level 1 moderator analysis still produced unbiased point estimates of the moderator effect. However, ignoring schools (using a two-level HLM with students nested in classrooms) led to underestimated standard errors (0.047), with an absolute bias of −0.016 and a relative bias of −25.4% compared to the standard error (0.063) from the simulation. In contrast, ignoring classrooms (using a two-level HLM with students nested in schools) produced standard errors nearly identical to those from the simulation and the three-level HLM.

As a result, the Type I error rate was inflated when schools were ignored (0.152 vs. 0.05), while ignoring classrooms had a negligible impact on Type I error rates (0.049 vs. 0.05). Furthermore, the statistical power calculated from formulas based on the two-level model ignoring schools was overestimated, with an absolute bias of 0.136 and relative bias of 15.9%. By comparison, power estimates from the two-level model ignoring classrooms closely matched both the simulation results and the three-level HLM model (Table 6).

Table 6.
SE and Power in Mis-Specified Models for the Analysis of the Moderated Treatment Effect of a Level-1 Continuous Moderator.

Analysis SE Bias Relative bias (%) Power Bias Relative bias (%)

Based on simulation 0.063 NA NA 0.854 NA NA

Based on formulas

True model (students nested in classes nested in schools) 0.064 0.001 2.3 0.845 −0.009 −1.0

Ignore Level 2 (students nested in schools) 0.066 0.003 5.5 0.823 −0.031 −3.6

Ignore Level 3 (students nested in classes) 0.047 −0.016 −25.4 0.989 0.136 15.9

Note. Results were based on 2,000 replications. The moderator effect size is 0.20. The intraclass correlation coefficients at Levels 2 and 3 are $ρ_{2}$ = .10 and $ρ_{3}$ = .15. The standardized effect variability of the moderation across Level-3 sites is $ω_{3 T M^{(1)}}^{2}$ = 0.05. The standardized effect variability of the moderation across Level-2 sites is $ω_{2 T M^{(1)}}^{2}$ = 0.05. The proportion of variance at Level 1 explained by covariates is $R_{1}^{2}$ = .5. The sample size per Level-2 unit (n) is 20, the number of Level-2 units per Level-3 site (J) is 4, and the total number of Level-3 sites (K) is 20. The proportion of individuals assigned to the treatment group is P = .5. SE = standard error; NA = not available.

The simulation results also revealed that ignoring one level of nesting shifted the variance components. When schools were ignored, the school-level variance was shifted to the new top level, classrooms (Moerbeek, 2004; Van den Noortgate et al., 2005). As a result, the classroom-level ICC rose to .24, close to the sum (.25) of the Level-2 and Level-3 ICCs in the true model ($ρ_{2}$ = .10 and $ρ_{3}$ = .15). $R_{1}^{2}$ remained close to .50, and the standardized effect variability of moderation across classrooms was approximately 0.10, matching the combined variability at Levels 2 and 3 (0.05 each).
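The SEs in Table 6 can be approximated by plugging these shifted values into the SE expressions. The sketch below assumes that the two-level formula is the three-level MRT3-1RR-1 expression (Table 2, continuous moderator) with the middle level removed and the 80 classrooms treated as the top-level units; the function names are ours, and the shifted inputs are the rounded values reported above.

```python
from math import sqrt

def se_three_level(omega3, omega2, rho3, rho2, r2, p, K, J, n):
    """SE of the standardized moderation effect for a continuous Level-1
    moderator with random slopes at Levels 2 and 3 (MRT3-1RR-1, Table 2)."""
    resid = (1 - rho3 - rho2) * (1 - r2) / (p * (1 - p) * K * J * n)
    return sqrt(omega3 / K + omega2 / (K * J) + resid)

def se_two_level(omega, rho, r2, p, n_sites, n_per_site):
    """Assumed two-level analogue: the same expression with the middle level removed."""
    resid = (1 - rho) * (1 - r2) / (p * (1 - p) * n_sites * n_per_site)
    return sqrt(omega / n_sites + resid)

# True model: Table 6 design values (K = 20 schools, J = 4, n = 20).
true_se = se_three_level(0.05, 0.05, 0.15, 0.10, 0.5, 0.5, K=20, J=4, n=20)
# Schools ignored: 80 classrooms become the top level, with the shifted
# classroom ICC (.24) and moderation variability (0.10) reported above.
naive_se = se_two_level(0.10, 0.24, 0.5, 0.5, n_sites=20 * 4, n_per_site=20)
print(f"true-model SE = {true_se:.3f}, SE ignoring schools = {naive_se:.3f}")
```

Under these assumptions the two values round to 0.064 and 0.047, in line with the true-model and ignore-Level-3 rows of Table 6. The smaller SE arises because the school-level slope variance, which enters the correct formula divided by only K = 20, is now divided by the 80 classrooms.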

Conversely, when classrooms were ignored, the unconditional ICC ($ρ$) increased to .17 (compared with $ρ_{3}$ = .15 in the correctly specified model), $R_{1}^{2}$ decreased to .43 (vs. .50), and the standardized effect variability of moderation across schools increased to 0.06 (vs. 0.05). This shift occurs because classroom-level variance is redistributed to the student and school levels (Moerbeek, 2004; Opdenakker & Van Damme, 2000; Van den Noortgate et al., 2005), yielding an inflated school-level ICC that nonetheless remains smaller than the combined school- and classroom-level ICCs.
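A back-of-the-envelope approximation captures the direction of both shifts. It is our own heuristic, assuming a balanced design, and is not taken from the simulation: dropping schools moves their variance wholesale to classrooms, whereas dropping classrooms lets each school mean absorb roughly 1/J of the classroom-level component. The function names are hypothetical.

```python
def ignore_schools(rho2, rho3, omega2, omega3):
    """Schools dropped: school-level components fold into the classroom level."""
    return rho2 + rho3, omega2 + omega3

def ignore_classrooms(rho2, rho3, omega2, omega3, J):
    """Classrooms dropped (rough balanced-design approximation): a school mean
    averages J classrooms, so about 1/J of each classroom-level component
    shows up at the school level; the rest moves to the student level."""
    return rho3 + rho2 / J, omega3 + omega2 / J

rho_c, omega_c = ignore_schools(0.10, 0.15, 0.05, 0.05)
rho_s, omega_s = ignore_classrooms(0.10, 0.15, 0.05, 0.05, J=4)
print(f"ignore schools:    ICC {rho_c:.3f}, moderation variability {omega_c:.4f}")
print(f"ignore classrooms: ICC {rho_s:.3f}, moderation variability {omega_s:.4f}")
```

This gives .25 and 0.10 when schools are ignored and about .175 and 0.0625 when classrooms are ignored, in the neighborhood of the .24, 0.10, .17, and 0.06 observed. The heuristic does not address the change in $R_{1}^{2}$.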

The findings under the supplementary simulation conditions are consistent with those from the main simulations (Tables S3 and S4).

Software Demonstration

We implemented the formulas in Microsoft Excel to develop PowerUp!-Moderator-MRTs, a tool for calculating statistical power and MDESD for moderation analysis in MRTs. Similar to power analysis procedures for moderation in two-level MRTs and three-level CRTs and MCRTs (Dong et al., 2018, 2021a, 2024a; Spybrook et al., 2016), researchers first identify the moderator of interest (binary or continuous) and its level (1, 2, or 3). They can then select the appropriate module for either MDESD or statistical power calculations based on their study design.

Within each module, users input key design parameters, such as ICCs. Guidance for selecting these parameters is available in the literature (e.g., Bloom et al., 2007; Dong et al., 2016, 2024b; Hedges & Hedberg, 2007, 2013; Jacob et al., 2010; Kelcey & Phelps, 2013; Kelcey et al., 2016; Mulolli et al., 2025; Shen et al., 2023; Spybrook et al., 2016; Westine et al., 2013). Each module accommodates both random and non-random slope models. For non-random slope models, the effect size variability is set to zero. Once the parameters are entered, the MDESD and its 95% CI, or the statistical power, is automatically calculated. Users can adjust sample sizes to achieve the desired MDESD or power.

For example, Figure 1 shows the module for calculating the MDESD for a Level-1 binary moderator (e.g., sex) with random slopes across both schools and classrooms. We used typical power analysis assumptions: a two-tailed test with a Type I error rate of 0.05 and power of 0.80. We set the ICCs at the classroom and school levels to $ρ_{2}$ = .10 and $ρ_{3}$ = .15 (e.g., Dong et al., 2016; Hedges & Hedberg, 2007, 2013; Shen et al., 2023), and the proportion of Level-1 variance explained by covariates (e.g., a pretest) to $R_{1}^{2}$ = .5 (Bloom et al., 2007; Dong et al., 2016; Hedges & Hedberg, 2007, 2013; Kelcey & Phelps, 2013; Kelcey et al., 2016; Westine et al., 2013). The standardized effect variability of the moderation effect is $ω_{3 T M^{(1)}}^{2}$ = 0.05 across schools and $ω_{2 T M^{(1)}}^{2}$ = 0.05 across classrooms (Dong et al., 2023). We chose a balanced design in which the proportion of students assigned to treatment is P = .5 and the proportion of female students (the moderator) is $Q_{1}$ = .5. With 20 schools, 4 classrooms per school, and 20 students per classroom, the MDESD is 0.245, with a 95% CI of [0.071, 0.418].

Figure 1.
MDESD Calculator for Three-Level MIRTs—Treatment at Level 1 and Binary Moderator at Level 1.
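For readers who want to check the Figure 1 numbers outside Excel, here is a minimal standard-library Python sketch of the MRT3-1RR-1 binary-moderator calculation in Table 2. It is not the PowerUp!-Moderator-MRTs implementation. It assumes the usual MDE multiplier $t_{1-α/2} + t_{1-β}$ with ν = K − 1 degrees of freedom, evaluates the t distribution by numerical integration, and forms the 95% CI as MDESD ± $t_{1-α/2}$ × SE. All function names are ours.

```python
import math

def _chi2_pdf(v: float, df: int) -> float:
    """Chi-square density (set to 0 at v = 0, which is exact for df > 2)."""
    if v <= 0:
        return 0.0
    log_pdf = ((df / 2 - 1) * math.log(v) - v / 2
               - (df / 2) * math.log(2) - math.lgamma(df / 2))
    return math.exp(log_pdf)

def t_sf(c: float, df: int, steps: int = 2000) -> float:
    """P(T > c) for Student's t: integrate P(Z > c * sqrt(V / df)) over
    V ~ chi-square(df) with Simpson's rule. Assumes df > 2."""
    upper = df + 12 * math.sqrt(2 * df) + 10      # chi-square mass beyond this is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        z = c * math.sqrt(v / df)
        total += weight * _chi2_pdf(v, df) * 0.5 * math.erfc(z / math.sqrt(2))
    return total * h / 3

def t_quantile(p: float, df: int) -> float:
    """Value c with P(T <= c) = p, found by bisection on t_sf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_sf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mdesd_rr1_binary(K, J, n, rho2, rho3, omega2, omega3, r2, p, q,
                     alpha=0.05, power=0.80):
    """MDESD and CI for a binary Level-1 moderator, random slopes at Levels 2
    and 3 (MRT3-1RR-1 in Table 2), two-tailed test, df = K - 1."""
    se = math.sqrt(omega3 / K + omega2 / (K * J)
                   + (1 - rho3 - rho2) * (1 - r2)
                   / (p * (1 - p) * q * (1 - q) * K * J * n))
    df = K - 1
    t_crit = t_quantile(1 - alpha / 2, df)
    multiplier = t_crit + t_quantile(power, df)
    mdesd = multiplier * se
    return mdesd, (mdesd - t_crit * se, mdesd + t_crit * se)

mdesd, (lo, hi) = mdesd_rr1_binary(K=20, J=4, n=20, rho2=0.10, rho3=0.15,
                                   omega2=0.05, omega3=0.05, r2=0.5, p=0.5, q=0.5)
print(f"MDESD = {mdesd:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With the Figure 1 inputs this returns approximately 0.245 and [0.071, 0.418]. For a nonrandom slope model, one would set the corresponding ω terms to zero and also switch to that model's degrees of freedom in Table 2.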

Furthermore, Figure S1 shows the module for calculating statistical power for a Level 1 continuous moderator (e.g., pretest), Figure S2 shows the module for calculating MDESD for a Level 2 continuous moderator (e.g., years of teaching), and Figure S3 shows the module for calculating statistical power for a Level 3 binary moderator (e.g., urbanicity).

Conclusion

This study investigates formulas for calculating the MDESD and statistical power for moderation effects in three-level MIRTs. Monte Carlo simulations confirm the accuracy of these formulas and examine the consequences of ignoring one level of nesting for Level 1 moderation. A software tool was developed to implement these formulas in Microsoft Excel.

Based on our formulas (Table 2 and Table S1), we draw the following conclusions:

First, statistical power increases (and MDESD decreases) with larger sample sizes at each level (K, J, and n), as is typically expected in power analyses, because larger sample sizes enlarge the standardized noncentrality parameters. Consistent with findings from two-level MIRTs (Dong et al., 2021a, 2021b) and three-level MCRTs (Dong et al., 2024a, 2024b), sample sizes at higher levels are generally more influential in random slope models, while in nonrandom slope models, lower-level sample sizes can be equally important.
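The sample-size pattern can be seen directly in the noncentrality parameter. The sketch below codes λ for a continuous Level-1 moderator under MRT3-1RR-1 (Table 2), with defaults set to the Table 6 design values; the function name and defaults are ours. It compares λ rather than power, because power is monotone in λ for fixed degrees of freedom (doubling K also raises ν = K − 1, which helps further).

```python
from math import sqrt

def ncp_rr1(delta, K, J, n, omega2=0.05, omega3=0.05,
            rho2=0.10, rho3=0.15, r2=0.5, p=0.5):
    """Standardized noncentrality parameter for a continuous Level-1 moderator
    with random slopes at Levels 2 and 3 (MRT3-1RR-1 in Table 2)."""
    var = (omega3 / K + omega2 / (K * J)
           + (1 - rho3 - rho2) * (1 - r2) / (p * (1 - p) * K * J * n))
    return delta / sqrt(var)

# Random slopes: doubling the number of schools (K) helps most.
designs = {"baseline": (20, 4, 20), "2K": (40, 4, 20),
           "2J": (20, 8, 20), "2n": (20, 4, 40)}
for label, (K, J, n) in designs.items():
    print(label, round(ncp_rr1(0.20, K, J, n), 2))

# Nonrandom slopes (omega = 0): doubling n or doubling K adds the same precision.
print(round(ncp_rr1(0.20, 40, 4, 20, omega2=0, omega3=0), 2),
      round(ncp_rr1(0.20, 20, 4, 40, omega2=0, omega3=0), 2))
```

With these inputs, λ is about 3.14 at baseline, 4.44 when K is doubled, 3.49 when J is doubled, and 3.34 when n is doubled. With the ω terms set to zero, doubling n and doubling K give the same λ.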

Second, statistical power decreases (and MDESD increases) as the standardized variability of treatment and moderation effects increases, because greater variability reduces the standardized noncentrality parameters. Power and MDESD reach their optimal values when these variabilities are zero (i.e., in nonrandom slope models), regardless of whether the moderator is at Level 1, 2, or 3. Hence, when conducting power analyses for moderation in three-level MIRTs, it is advisable to assume random effects in the absence of clear theoretical justification or prior evidence indicating fixed (non-random) variation in treatment or moderation effects (Dong et al., 2021a, 2021b).
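To see the effect of variability, hold the Table 6 design values fixed and vary the standardized moderation variability. For illustration only, the sketch sets it equal at Levels 2 and 3 (the function name and defaults are ours).

```python
from math import sqrt

def ncp_rr1(delta, omega, K=20, J=4, n=20, rho2=0.10, rho3=0.15, r2=0.5, p=0.5):
    """lambda for MRT3-1RR-1 (continuous Level-1 moderator), using the same
    standardized moderation variability omega at Levels 2 and 3."""
    var = (omega / K + omega / (K * J)
           + (1 - rho3 - rho2) * (1 - r2) / (p * (1 - p) * K * J * n))
    return delta / sqrt(var)

for omega in (0.0, 0.025, 0.05, 0.10):
    print(f"omega^2 = {omega:<5} lambda = {ncp_rr1(0.20, omega):.2f}")
```

λ falls from about 6.53 at ω² = 0 to 4.00, 3.14, and 2.36 as ω² grows. When ω² = 0, the nonrandom slope model also uses larger degrees of freedom (Table 2), so assuming random effects is the more conservative planning choice.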

Third, as in two-level MIRTs (Dong et al., 2021a, 2021b), statistical power increases with higher ICCs at Levels 2 and 3, because, holding the standardized effect variabilities fixed, greater higher-level variance leaves less Level-1 variance (the $(1 - ρ_{3} - ρ_{2})$ term) and thereby enlarges the standardized noncentrality parameters.

Fourth, power increases with higher proportions of Level-1 variance explained by covariates ( $R_{1}^{2}$ ), because a larger $R_{1}^{2}$ reduces the residual Level-1 variance and thereby increases the standardized noncentrality parameters. In contrast, power is unaffected by the variance explained by covariates at the Level-2 or Level-3 intercepts.

Fifth, statistical power increases (and MDESD decreases) as the proportions allocated to the treatment group (P) and to the moderator subgroups ($Q_{1}$, $Q_{2}$, $Q_{3}$) approach .5. Power is maximized under a balanced design (P = $Q_{1}$ = $Q_{2}$ = $Q_{3}$ = .5), where the variances of the treatment and moderator variables are greatest and the standardized noncentrality parameter reaches its maximum value.
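The ICC, $R_{1}^{2}$, and allocation patterns all act through the Level-1 term of λ. The following minimal sketch uses the binary-moderator expression for MRT3-1RR-1, with defaults taken from the Figure 1 example; the function name and the scenario values are ours.

```python
from math import sqrt

def ncp_rr1_binary(delta=0.20, K=20, J=4, n=20, rho2=0.10, rho3=0.15,
                   omega2=0.05, omega3=0.05, r2=0.5, p=0.5, q=0.5):
    """lambda for a binary Level-1 moderator, random slopes at Levels 2 and 3
    (MRT3-1RR-1 in Table 2); defaults follow the Figure 1 example."""
    var = (omega3 / K + omega2 / (K * J)
           + (1 - rho3 - rho2) * (1 - r2)
           / (p * (1 - p) * q * (1 - q) * K * J * n))
    return delta / sqrt(var)

scenarios = {
    "baseline": {},
    "higher ICCs": {"rho2": 0.15, "rho3": 0.20},
    "higher R1^2": {"r2": 0.7},
    "P = .3": {"p": 0.3},
    "Q1 = .3": {"q": 0.3},
}
for label, changes in scenarios.items():
    print(f"{label:<12} lambda = {ncp_rr1_binary(**changes):.2f}")
```

With these inputs, λ is about 2.41 at baseline. It rises to about 2.50 with higher ICCs and to 2.73 with $R_{1}^{2}$ = .7, and it falls to about 2.30 when either P or $Q_{1}$ moves from .5 to .3. Level-2 and Level-3 $R^{2}$ values do not appear in the expression at all.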

Sixth, because treatment is assigned at Level 1 in three-level MIRTs, statistical power is generally higher than in three-level MCRTs and CRTs under comparable assumptions (e.g., sample sizes and design parameters). For example, detecting a meaningful moderation effect for a school-level moderator in three-level CRTs often requires very large sample sizes (Dong et al., 2018; Spybrook et al., 2016), and similar limitations apply to three-level MCRTs (Dong et al., 2024a). In contrast, three-level MIRTs can achieve adequate statistical power for detecting school-level moderation effects with more modest sample sizes. This is because a Level-3 moderator in three-level MIRTs functions analogously to a Level-1 moderator in three-level CRTs, a condition under which power is typically higher than for detecting the main effect in three-level CRTs (Dong et al., 2018; Spybrook et al., 2016).

Finally, based on our simulation results, we can conclude that ignoring Level 2 nesting has minimal impact on the standard error or power for Level 1 moderator effects. However, ignoring Level 3 nesting leads to underestimated standard errors, inflated Type I error rates, and overestimated power for Level 1 moderator effects. This result aligns with findings in the literature on main effect estimates when nesting structures are overlooked (Bulus & Dong, 2022; Moerbeek, 2004; Opdenakker & Van Damme, 2000; Van den Noortgate et al., 2005; Zhu et al., 2011).

This study focuses on Level 1 moderator effects when either Level 2 or Level 3 nesting is ignored in three-level MIRTs. Future research could extend this work by examining the impact of omitting a level of nesting on other types of moderation effects—for example, assessing Level 2 moderator effects after ignoring Level 3 nesting, or Level 3 moderator effects after ignoring Level 2 nesting.

Supplemental Material

sj-docx-1-aje-10.1177_10982140251394304 - Supplemental material for Statistical Power for Moderation in Three-Level Multisite Individual Randomized Trials and Consequences of Ignoring a Level of Nesting

Supplemental material, sj-docx-1-aje-10.1177_10982140251394304 for Statistical Power for Moderation in Three-Level Multisite Individual Randomized Trials and Consequences of Ignoring a Level of Nesting by Nianbo Dong, Ben Kelcey, Jessaca Spybrook, Kyle Nickodem and Ning Sui in American Journal of Evaluation

Model number	Level of moderator	Slope of treatment or moderation across Level 2	Slope of treatment or moderation across Level 3	Description of model structure/assumptions
MRT3-1RR-1	1	Random	Random	For the interaction/moderation term: random slopes across Levels 2 and 3
MRT3-1NR-1	1	Nonrandomly varying	Random	For the interaction/moderation term: nonrandomly varying slope across Level 2; random slope across Level 3
MRT3-1RN-1	1	Random	Nonrandom/constant	For the interaction/moderation term: random slope across Level 2; nonrandom/constant slope across Level 3
MRT3-1NN-1	1	Nonrandomly varying	Nonrandom/constant	For the interaction/moderation term: nonrandomly varying slope across Level 2; nonrandom/constant slope across Level 3
MRT3-1RR-2	2	Random	Random	Random slope for treatment across Level 2; random slope for the interaction/moderation across Level 3
MRT3-1NR-2	2	Nonrandomly varying	Random	Nonrandomly varying slope for treatment across Level 2; random slope for the interaction/moderation across Level 3
MRT3-1RN-2	2	Random	Nonrandom/constant	Random slope for treatment across Level 2; nonrandom/constant slope for the interaction/moderation across Level 3
MRT3-1NN-2	2	Nonrandomly varying	Nonrandom/constant	Nonrandomly varying slope for treatment across Level 2; nonrandom/constant slope for the interaction/moderation across Level 3
MRT3-1RR-3	3	Random	Random	For treatment: random slopes across Levels 2 and 3
MRT3-1NR-3	3	Nonrandomly varying	Random	For treatment: nonrandomly varying slope across Level 2; random slope across Level 3
MRT3-1RN-3	3	Random	Nonrandomly varying	For treatment: random slope across Level 2; nonrandomly varying slope across Level 3
MRT3-1NN-3	3	Nonrandomly varying	Nonrandomly varying	For treatment: nonrandomly varying slope across Level 2; nonrandomly varying slope across Level 3

Model number	HLM	Standardized noncentrality parameter (λ)	Degrees of freedom (ν)
MRT3-1RR-1	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k} + r_{3 j k}$ $π_{4 j k} = β_{40 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \\ r_{3 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} & τ_{03} \\ τ_{11}^{2} & τ_{13} \\ τ_{33}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300} + u_{30 k}$ $β_{40 k} = γ_{400}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{30 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0030} \\ τ_{1010}^{2} & τ_{1030} \\ τ_{3030}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 1
MRT3-1NR-1	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k}$ $π_{4 j k} = β_{40 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300} + u_{30 k}$ $β_{40 k} = γ_{400}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{30 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0030} \\ τ_{1010}^{2} & τ_{1030} \\ τ_{3030}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{3 T M^{(1)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 1
MRT3-1RN-1	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k} + r_{3 j k}$ $π_{4 j k} = β_{40 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \\ r_{3 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} & τ_{03} \\ τ_{11}^{2} & τ_{13} \\ τ_{33}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300}$ $β_{40 k} = γ_{400}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$	Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{ω_{2 T M^{(1)}}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ − 1
MRT3-1NN-1	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} M_{i j k}^{(1)} + π_{3 j k} T_{i j k} M_{i j k}^{(1)} + π_{4 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, M, X}^{2})$ .L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $π_{3 j k} = β_{30 k}$ $π_{4 j k} = β_{40 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{10 k} = γ_{100}$ $β_{20 k} = γ_{200}$ $β_{30 k} = γ_{300}$ $β_{40 k} = γ_{400}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$	Binary moderator: ${\hat{δ}}_{1 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{1} (1 - Q_{1}) K J n}}$ Continuous moderator: ${\hat{δ}}_{1 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ(n − 1) − 4
MRT3-1RR-2	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00 \| M}^{2} & τ_{01 \| M} \\ τ_{11 \| M}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{11 k} = γ_{110} + u_{11 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{11 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0011} \\ τ_{1010}^{2} & τ_{1011} \\ τ_{1111}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{2} (1 - Q_{2}) K J} - \frac{ω_{3 T M^{(2)}}^{2} + {\hat{δ}}_{2 b}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{ω_{2 T}^{2} - ω_{3 T M^{(2)}}^{2} - {\hat{δ}}_{2 c}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 1
MRT3-1NR-2	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00 \| M}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100} + u_{10 k}$ $β_{11 k} = γ_{110} + u_{11 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \\ u_{11 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000}^{2} & τ_{0010} & τ_{0011} \\ τ_{1010}^{2} & τ_{1011} \\ τ_{1111}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{3 T M^{(2)}}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 1
MRT3-1RN-2	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00 \| M}^{2} & τ_{01 \| M} \\ τ_{11 \| M}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100}$ $β_{11 k} = γ_{110}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$	Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{ω_{2 T}^{2}}{Q_{2} (1 - Q_{2}) K J} - \frac{{\hat{δ}}_{2 b}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{ω_{2 T}^{2} - {\hat{δ}}_{2 c}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ − 2
MRT3-1NN-2	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + β_{01 k} M_{j k}^{(2)} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + β_{11 k} M_{j k}^{(2)}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00 \| M}^{2})$ L3: $β_{00 k} = γ_{000} + u_{00 k}$ $β_{01 k} = γ_{010}$ $β_{10 k} = γ_{100}$ $β_{11 k} = γ_{110}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000}^{2})$	Binary moderator: ${\hat{δ}}_{2 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{2} (1 - Q_{2}) K J n}}$ Continuous moderator: ${\hat{δ}}_{2 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ(n − 1) − 3
MRT3-1RR-3	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} \\ τ_{10} & τ_{11}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000 \| M}^{2} & τ_{0010 \| M} \\ τ_{1000 \| M} & τ_{1010 \| M}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{3 T}^{2}}{K Q_{3} (1 - Q_{3})} - \frac{{\hat{δ}}_{3 b}^{2}}{K} + \frac{ω_{2 T}^{2}}{Q_{3} (1 - Q_{3}) K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{3 T}^{2} - {\hat{δ}}_{3 c}^{2}}{K} + \frac{ω_{2 T}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 2
MRT3-1NR-3	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)} + u_{10 k}$ $β_{20 k} = γ_{200}$ $(\begin{matrix} u_{00 k} \\ u_{10 k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{0000 \| M}^{2} & τ_{0010 \| M} \\ τ_{1000 \| M} & τ_{1010 \| M}^{2} \end{matrix})]$	Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{3 T}^{2}}{K Q_{3} (1 - Q_{3})} - \frac{{\hat{δ}}_{3 b}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{3 T}^{2} - {\hat{δ}}_{3 c}^{2}}{K} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	K − 2
MRT3-1RN-3	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k} + r_{1 j k}$ $π_{2 j k} = β_{20 k}$ $(\begin{matrix} r_{0 j k} \\ r_{1 j k} \end{matrix}) \sim N [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} τ_{00}^{2} & τ_{01} \\ τ_{10} & τ_{11}^{2} \end{matrix})]$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000 \| M}^{2})$	Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{ω_{2 T}^{2}}{Q_{3} (1 - Q_{3}) K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{ω_{2 T}^{2}}{K J} + \frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ − 2
MRT3-1NN-3	L1: $Y_{i j k} = π_{0 j k} + π_{1 j k} T_{i j k} + π_{2 j k} X_{i j k} + e_{i j k}$ , $e_{i j k} \sim N (0, σ_{1 \| T, X}^{2})$ L2: $π_{0 j k} = β_{00 k} + r_{0 j k}$ $π_{1 j k} = β_{10 k}$ $π_{2 j k} = β_{20 k}$ $r_{0 j k} \sim N (0, τ_{00}^{2})$ L3: $β_{00 k} = γ_{000} + γ_{001} M_{k}^{(3)} + u_{00 k}$ $β_{10 k} = γ_{100} + γ_{101} M_{k}^{(3)}$ $β_{20 k} = γ_{200}$ $u_{00 k} \sim N (0, τ_{0000 \| M}^{2})$	Binary moderator: ${\hat{δ}}_{3 b} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) Q_{3} (1 - Q_{3}) K J n}}$ Continuous moderator: ${\hat{δ}}_{3 c} / \sqrt{\frac{(1 - ρ_{3} - ρ_{2}) (1 - R_{1}^{2})}{P (1 - P) K J n}}$	KJ(n − 1) − 3

Moderator	Power	Type I error
Continuous moderator
SE from simulation	0.063	0.059	0.041	0.031	0.066	0.060	0.042	0.032
SE from formula	0.064	0.059	0.040	0.031	0.064	0.059	0.040	0.031
Absolute difference in SE	0.001	−0.001	−0.001	0.000	−0.002	−0.001	−0.003	−0.001
Relative difference in SE (%)	2.3	−1.1	−2.7	−1.2	−3.3	−1.7	−6.2	−3.5
Coverage rate of 95% CI	0.966	0.960	0.950	0.950	0.964	0.961	0.940	0.949
Power/Type I error rate from simulation	0.854	0.905	0.999	1.000	0.053	0.052	0.059	0.046
Power/Type I error rate from formulas	0.845	0.900	0.999	1.000	0.050	0.050	0.050	0.050
Absolute difference	−0.009	−0.005	0.000	0.000	0.002	−0.002	−0.008	0.004
Binary moderator
SE from simulation	0.084	0.080	0.067	0.062	0.085	0.081	0.066	0.063
SE from formula	0.084	0.080	0.067	0.061	0.084	0.080	0.067	0.061
Absolute difference in SE	0.000	0.000	0.000	0.000	−0.001	−0.001	0.001	−0.002
Relative difference in SE (%)	0.0	−0.1	0.1	−0.6	−1.0	−1.0	0.8	−2.7
Coverage rate of 95% CI	0.963	0.962	0.957	0.942	0.963	0.964	0.957	0.941
Power/Type I error rate from simulation	0.622	0.646	0.831	0.889	0.051	0.052	0.044	0.051
Power/Type I error rate from formulas	0.631	0.665	0.846	0.897	0.050	0.050	0.050	0.050
Absolute difference	0.009	0.020	0.016	0.009	0.000	−0.001	0.007	−0.001

Moderator	Power	Type I error
Continuous moderator
SE from simulation	0.071	0.065	0.042	0.031	0.073	0.065	0.040	0.031
SE from formula	0.064	0.059	0.040	0.031	0.064	0.059	0.040	0.031
Absolute difference in SE	−0.007	−0.007	−0.002	0.000	−0.009	−0.006	0.000	0.000
Relative difference in SE (%)	−9.9	−10.1	−4.8	−0.7	−12.0	−9.8	−0.5	0.0
Coverage rate of 95% CI	0.943	0.934	0.943	0.944	0.937	0.939	0.950	0.947
Power/Type I error rate from simulation	0.769	0.834	0.997	1.000	0.056	0.056	0.050	0.048
Power/Type I error rate from formulas	0.844	0.894	0.999	1.000	0.050	0.050	0.050	0.050
Absolute difference	0.076	0.061	0.002	0.000	−0.005	−0.005	0.000	0.003
Binary moderator
SE from simulation	0.092	0.080	0.078	0.061	0.095	0.077	0.079	0.060
SE from formula	0.097	0.079	0.079	0.061	0.096	0.079	0.079	0.061
Absolute difference in SE	0.004	0.000	0.001	0.000	0.001	0.002	0.001	0.002
Relative difference in SE (%)	4.6	−0.5	1.4	0.4	1.4	2.8	0.7	3.0
Coverage rate of 95% CI	0.971	0.963	0.950	0.952	0.970	0.971	0.956	0.955
Power/Type I error rate from simulation	0.513	0.668	0.708	0.896	0.045	0.040	0.051	0.046
Power/Type I error rate from formulas	0.500	0.661	0.701	0.895	0.050	0.050	0.050	0.050
Absolute difference	−0.013	−0.007	−0.007	−0.001	0.005	0.011	0.000	0.005

Moderator	Power	Type I error
Continuous moderator
SE from simulation	0.049	0.040	0.043	0.033	0.050	0.042	0.045	0.033
SE from formula	0.045	0.038	0.039	0.031	0.047	0.039	0.040	0.031
Absolute difference in SE	−0.004	−0.003	−0.004	−0.002	−0.003	−0.002	−0.005	−0.003
Relative difference in SE (%)	−8.1	−6.6	−8.9	−7.3	−5.8	−5.5	−11.9	−8.3
Coverage rate of 95% CI	0.949	0.950	0.929	0.932	0.961	0.953	0.919	0.931
Power/Type I error rate from simulation	0.954	0.989	0.987	0.998	0.043	0.051	0.055	0.047
Power/Type I error rate from formulas	0.986	0.999	0.999	1.000	0.050	0.050	0.050	0.050
Absolute difference	0.032	0.010	0.012	0.002	0.007	0.000	−0.005	0.003
Binary moderator
SE from simulation	0.091	0.077	0.082	0.060	0.092	0.077	0.078	0.061
SE from formula	0.093	0.077	0.079	0.061	0.094	0.079	0.079	0.061
Absolute difference in SE	0.002	0.000	−0.002	0.001	0.002	0.002	0.001	0.000
Relative difference in SE (%)	2.1	−0.5	−2.9	1.8	2.3	2.5	1.2	0.5
Coverage rate of 95% CI	0.968	0.962	0.946	0.954	0.966	0.967	0.960	0.954
Power/Type I error rate from simulation	0.538	0.696	0.694	0.905	0.046	0.054	0.046	0.047
Power/Type I error rate from formulas	0.533	0.682	0.700	0.903	0.050	0.050	0.050	0.050
Absolute difference	−0.005	−0.014	0.006	−0.002	0.005	−0.004	0.005	0.004


Footnotes

ORCID iDs

Nianbo Dong

Kyle Nickodem

Ning Sui

Funding

This project is supported by the National Science Foundation [1913563]. The opinions expressed herein are those of the authors and not the funding agency.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

Bai

Kelcey

Ataneka

Xie

Cox

Dong

(2025a). Statistical power for (conditional) mediation effects in multisite randomized trials. American Journal of Evaluation, Advance online publication.

Bai

Kelcey

Xie

Ataneka

Cox

Dong

(2025b). Design and analysis of multisite cluster-randomized trials targeting (conditional) mediation effects. Journal of Experimental Education. Advance online publication. https://doi.org/10.1080/00220973.2025.2521755

Baron

R. M.

Kenny

D. A.

(1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037//0022-3514.51.6.1173

Bloom

H. S.

(1995). Minimum detectable effects: A simple way to report the statistical power of experimental designs. Evaluation Review, 19(5), 547–556. https://doi.org/10.1177/0193841X9501900504

Bloom

H. S.

Richburg-Hayes

Black

A. R.

(2007). Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educational Evaluation and Policy Analysis, 29(1), 30–59. https://doi.org/10.3102/0162373707299550

Bloom

H. S.

Zhu

Jacob

Raudenbush

Martinze

Fen

(2008). Empirical issues in the design of group-randomized studies to measure the effects of interventions for children. Working Paper. https://www.mdrc.org/sites/default/files/full_85.pdf.

Bulus

Dong

(2022). Consequences of ignoring a level of nesting on design and analysis of blocked three-level regression discontinuity designs: Power and type I error rates. Adıyaman University Journal of Educational Sciences, 12(1), 42–55. https://doi.org/10.17984/adyuebd.1068923

Cox

Kelcey

Luce

(2024). Power to detect moderated effects in three-level partially nested designs. Journal of Experimental Education, 92(1), 130–153. https://doi.org/10.1080/00220973.2022.2130130

Cox

Kelcey

Luce

(2025). Power to detect moderated effects with random slopes in partially nested designs. General Linear Model Journal, 49(1), 1–17. https://doi.org/10.31523/glmj.049001.001

10.

Dong

Herman

K. C.

Reinke

W. M.

Wilson

S. J.

Bradshaw

C. P.

(2023). Gender, racial, and socioeconomic disparities on social and behavioral skills for K-8 students with and without interventions: An integrative data analysis of eight cluster randomized trials. Prevention Science, 24(8), 1483–1498. https://doi.org/10.1007/s11121-022-01425-w

11.

Dong

Kelcey

Spybrook

(2018). Power analyses of moderator effects in three-level cluster randomized trials. Journal of Experimental Education, 86(3), 489–514. https://doi.org/10.1080/00220973.2017.1315714

12.

Dong

Kelcey

Spybrook

(2021a). Design considerations in multisite randomized trials to probe moderated treatment effects. Journal of Educational and Behavioral Statistics, 46(5), 527–559. https://doi.org/10.3102/1076998620961492

13.

Dong

Kelcey

Spybrook

(2024a). Experimental design and power for moderation in multisite cluster randomized trials. The Journal of Experimental Education, 92(4), 741–757. https://doi.org/10.1080/00220973.2023.2226934

14.

Dong

Kelcey

Spybrook

Xie

Pham

Qiu

Sui

(2024b). A practical guide to power analyses of moderation effects in multisite individual and cluster randomized trials. The Journal of Experimental Education. Advance online publication. https://doi.org/10.1080/00220973.2024.2338521

15.

Dong

Reinke

W. M.

Herman

K. C.

Bradshaw

C. P.

Murray

D. W.

(2016). Meaningful effect sizes, intraclass correlations, and proportions of variance explained by covariates for panning two- and three-level cluster randomized trials of social and behavioral outcomes. Evaluation Review, 40(4), 334–377. https://doi.org/10.1177/0193841X16671283

16.

Dong

Spybrook

Kelcey

Bulus

(2021b). Power analyses for moderator effects with (non)random slopes in cluster randomized trials. Methodology, 17(2), 92–110. https://doi.org/10.5964/meth.4003

17.

Gortazar

Hupkau

Roldán-Monés

(2024). Online tutoring works: Experimental evidence from a program with vulnerable children. Journal of Public Economics, Publication advance online. https://doi.org/10.1016/j.jpubeco.2024.105082

18.

Hedges

L. V.

Hedberg

(2007). Intraclass correlation values for planning group randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87. https://doi.org/10.3102/0162373707299706

19. Hedges, L. V., & Hedberg (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445–489. https://doi.org/10.1177/0193841X14529126

20. Jacob, Zhu, & Bloom (2010). New empirical evidence for the design of group randomized trials in education. Journal of Research on Educational Effectiveness, 3(2), 157–198. https://doi.org/10.1080/19345741003592428

21. Kelcey & Phelps (2013). Considerations for designing group randomized trials of professional development with teacher knowledge outcomes. Educational Evaluation and Policy Analysis, 35, 370–390. https://doi.org/10.3102/0162373713482766

22. Kelcey, Shen, & Spybrook (2016). Intraclass correlation coefficients for designing school randomized trials in education in Sub-Saharan Africa. Evaluation Review, 40, 500–525. https://doi.org/10.1177/0193841X16660246

23. Matuschek, Kliegl, Vasishth, Baayen, & Bates (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001

24. Moerbeek (2004). The consequence of ignoring a level of nesting in multilevel analysis. Multivariate Behavioral Research, 39(1), 129–149. https://doi.org/10.1207/s15327906mbr3901_5

25. Mulolli, Hedberg, E. C., Bogia, Spybrook, Berglund, Unlu, & Opper, I. M. (2025). Improving the design of evaluations that include students, teachers, and schools: An empirical investigation of key design parameters. AERA Open, 11. https://doi.org/10.1177/23328584251320380

26. Murray (1998). Design and analysis of group-randomized trials. Oxford University Press.

27. Nye, Konstantopoulos, & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257. https://doi.org/10.3102/01623737026003237

28. Olsen, Bein, & Judkins (2017). Sample size requirements for education multi-site RCTs that select sites randomly. Social Science Research Network. https://doi.org/10.2139/ssrn.2956576

29. Opdenakker, M. C., & Van Damme (2000). The importance of identifying levels in multilevel analysis: An illustration of the effects of ignoring the top or intermediate levels in school effectiveness research. School Effectiveness and School Improvement, 11(1), 103–130. https://doi.org/10.1076/0924-3453(200003)11:1;1-A;FT103

30. Raudenbush, S. W. (1988). Educational applications of hierarchical linear models: A review. Journal of Educational and Behavioral Statistics, 13(2), 85–116. https://doi.org/10.3102/10769986013002085

31. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (p. 485). Sage.

32. Shen, Curran, F. C., You, Splett, J. W., & Zhang (2023). Intraclass correlations for evaluating the effects of teacher empowerment programs on student educational outcomes. Educational Evaluation and Policy Analysis, 45(1), 134–156. https://doi.org/10.3102/01623737221111400

33. Shen & Kelcey (2022). Optimal sample allocation for three-level multisite cluster randomized trials. Journal of Research on Educational Effectiveness, 15(1), 130–150. https://doi.org/10.1080/19345747.2021.1953200

34. Snijders (2001). Sampling. In A. H. Leyland & Goldstein (Eds.), Multilevel modeling of health statistics (pp. 159–173). John Wiley.

35. Snijders (2005). Power and sample size in multilevel linear models. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 3, pp. 1570–1573). Wiley.

36. Spybrook, Kelcey, & Dong (2016). Power for detecting treatment by moderator effects in two and three-level cluster randomized trials. Journal of Educational and Behavioral Statistics, 41(6), 605–627. https://doi.org/10.3102/1076998616655442

37. Spybrook, Zhang, Kelcey, & Dong (2020). Learning from cluster randomized trials in education: An assessment of the capacity of studies to determine what works, for whom, and under what conditions. Educational Evaluation and Policy Analysis, 42(3), 354–374. https://doi.org/10.3102/0162373720929018

38. Van den Noortgate, Opdenakker, M. C., & Onghena (2005). The effects of ignoring a level in multilevel analysis. School Effectiveness and School Improvement, 16(3), 281–303. https://doi.org/10.1080/09243450500114850

39. Van Landeghem, De Fraine, & Van Damme (2005). The consequence of ignoring a level of nesting in multilevel analysis: A comment. Multivariate Behavioral Research, 40(4), 423–434. https://doi.org/10.1207/s15327906mbr4004_2

40. Weiss, M. J., Bloom, H. S., Verbitsky-Savitz, Gupta, Vigil, A. E., & Cullinan, D. N. (2017). How much do the effects of education and training programs vary across sites? Evidence from past multisite randomized trials. Journal of Research on Educational Effectiveness, 10(4), 843–876. https://doi.org/10.1080/19345747.2017.1300719

41. Westine, C. D., Spybrook, & Taylor, J. A. (2013). An empirical investigation of variance design parameters for planning cluster-randomized trials of science achievement. Evaluation Review, 37(6), 490–519. https://doi.org/10.1177/0193841X14531584

42. Nichols (2010). New estimates of design parameters for clustered randomization studies: Findings from North Carolina and Florida (Working Paper 43). National Center for Analysis of Longitudinal Data in Education Research.

43. Zhu, Jacob, & Bloom (2011). Designing and analyzing studies that randomize schools to estimate intervention effects on student academic outcomes without classroom-level information. Educational Evaluation and Policy Analysis, 34(1), 45–68. https://doi.org/10.3102/0162373711423786

Supplementary Material

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.