Abstract
This systematic review and meta-analysis synthesized the body of research on behavioral interventions for improving group engagement. Fifty studies were included (n = 10 group designs; n = 40 single case designs [SCDs]). Most research on interventions to improve group engagement was conducted in elementary and middle school general education settings. Class-Wide Function-Related Intervention Teams (n = 6), the Good Behavior Game (n = 6), Tootling (n = 8), and multicomponent approaches (n = 9) were the most commonly reported intervention types. Antecedent strategies that were most commonly embedded in behavioral interventions to increase group engagement were goal setting (n = 29), precorrections (n = 24), and teaming (n = 15). Consequence strategies included group contingencies (n = 34), points (n = 21), tangible rewards (n = 35), and teacher-delivered praise (n = 24). Meta-analyses of the effect size estimates from SCD studies (SMD = 2.28) and group design studies (g = 1.69) indicated an overall large and positive effect of the included interventions on group engagement. This effect was not moderated by grade level, inclusion of students with disabilities, number of antecedent or consequence components, or inclusion of a group contingency. Implications for researchers and practitioners are discussed.
Supporting student behavior continues to present as one of the primary challenges for both general and special education teachers (Lin et al., 2024; Moore et al., 2017; National Council on Teacher Quality, 2013). This challenge likely reflects a lack of preparation within preservice programs (Flower et al., 2017), limited access to in-service professional development (Hirsch et al., 2019), and the continued need for researchers to identify effective interventions to increase prosocial behaviors (Chow et al., 2023). Following the COVID-19 pandemic, these underlying issues have been amplified by schools reporting increased rates of disruptive behavior and an increased use of exclusionary discipline (Flores & Losen, 2023; Rotherham et al., 2023). Ineffective and exclusionary practices can result in a cascade of negative impacts, including poor academic and social outcomes for the students most in need of support (e.g., students with emotional or behavioral disorders; Kauffman & Landrum, 2013) and negative perceptions of school climate (La Salle-Finley, 2024). Furthermore, as a teacher shortage persists across the United States, teachers frequently report discipline issues as one of the most prominent reasons for leaving their job (Gilmour & Wehby, 2020). Embedding training and coaching within multitiered systems of support (MTSS) for student behavior can enhance the delivery of behavioral supports through a proactive and efficient framework.
Universal Practices for Supporting Student Behavior
Implementation of universal practices (i.e., Tier 1) with all students is a core feature of MTSS frameworks for supporting student behavior (e.g., Positive Behavioral Interventions and Supports [PBIS], Center on PBIS [2023]; Comprehensive, Integrated, Three-Tiered Models of Prevention [Ci3T], Lane et al., 2014). A robust body of research on PBIS indicates that universal practices typically meet the behavioral needs of 80% of students in a school, often based on school-wide measures such as office discipline referrals and attendance (Santiago-Rosario et al., 2023). In contrast, fewer studies have evaluated the effects of classroom-level universal practices on the types of student behaviors that teachers continuously support across the school day (Reinke et al., 2012). Teachers may particularly benefit from training on practices that increase academic engagement, given the influence of engagement on academic outcomes (Chow et al., 2021; Macdonald et al., 2021) and evidence that teachers perceive maintaining student engagement as their greatest job demand (Gao et al., 2025). To support teachers in creating tiered classroom support plans aligned with school-wide MTSS frameworks, a distinction between practices with evidence for increasing engagement for all students during large- or small-group instruction (i.e., group engagement) versus individual students is needed.
Group engagement describes the extent to which multiple students are concurrently demonstrating observable behaviors that comprise academic engagement (e.g., attending, participating, and completing tasks; Greenwood et al., 2002). Behavioral supports applied at the class-wide level to increase group engagement can generally be categorized as: (a) antecedent strategies that increase the likelihood of prosocial behaviors (e.g., clear expectations and frequent opportunities to respond [OTRs]), or (b) consequence strategies that increase the future likelihood of desirable behavior (e.g., behavior-specific praise [BSP] and tangible rewards) and decrease the future likelihood of undesirable behavior (e.g., corrective feedback). Group engagement data can serve as a proximal indicator of Tier 1 implementation fidelity, offering practitioners a practical means to assess the effectiveness of universal practices and accurately identify students in need of more intensive, Tier 2 interventions (Van Camp et al., 2020). It is critical to elevate students to advanced tiers of behavioral support only when warranted (i.e., not as a result of low-quality implementation of Tier 1 practices) to avoid over-identification and resource strains (McIntosh et al., 2018; Sugai & Horner, 2009). Furthermore, teachers should ensure a strong Tier 1 foundation is in place before layering on Tier 2 interventions for a student (Van Camp et al., 2020).
The extant literature has provided evidence on the use of individual antecedent- and consequence-based strategies to increase engagement. For example, MacSuga-Gage and Simonsen (2015) synthesized the effects of OTRs on student outcomes and found that OTRs led to better class-wide results across a variety of academic and behavioral outcomes. In another review, Royer et al. (2019) found that BSP could potentially be an evidence-based practice according to Council for Exceptional Children quality indicators. Multiple studies included in Royer et al.’s review evaluated the effect of BSP across multiple contexts (e.g., impact on individual student behavior and class-wide behavior) and included engagement as an outcome measure. Taken together, OTRs and BSP are two discrete strategies that can be used to improve student outcomes, such as group and individual engagement.
Research has also indicated that combining antecedent and consequence strategies can positively impact engagement, most commonly within the context of group contingencies. In a systematic review and meta-analysis of group contingency research, Maggin et al. (2017) identified 40 single case design (SCD) studies that investigated group contingencies in K–12 classrooms of students with or without disabilities and met minimum design standards (Kratochwill et al., 2013). Eighteen of those studies reported a measure of engagement, including both individual engagement (n = 5) and group engagement (n = 13). Meta-analysis of between-case standardized mean difference (BC-SMD; Hedges et al., 2012) effect size estimates indicated d = 1.80 and d = 1.88 for immediate and gradual effects models, respectively. Unit of analysis (i.e., individual vs. group) did not moderate the results. Maggin et al. noted that most studies were conducted in general education classrooms with students in late elementary or middle school grades.
Additionally, researchers have conducted broader syntheses of practices implemented within the context of universal behavior supports. For example, Chaffee et al. (2017) conducted a systematic review and meta-analysis of class-wide interventions for supporting student behavior in general education settings. Researchers identified 83 SCD studies that met inclusion criteria and included 29 studies that met design standards (Kratochwill et al., 2013) in their meta-analysis. Group contingencies were implemented in most studies, and 11 studies included a measure of class-wide appropriate behavior (e.g., group engagement). Analyses combined study effects for appropriate and inappropriate behavior, and results indicated large, positive effects (Tau-U = 0.93; Hedges’ g = 2.04). In alignment with Maggin et al. (2017), the unit of analysis did not moderate either effect size estimate and most studies were conducted at the elementary level.
Need for Additional Research
Results from previous reviews have provided substantive contributions toward the field’s understanding of applying antecedent and consequence-based strategies in school settings to improve academic engagement. Although many of the aforementioned strategies are applied at the class-wide level, authors of previous syntheses have identified relatively few studies that included measures of group engagement (Chaffee et al., 2017; Maggin et al., 2017). Furthermore, many of those studies were conducted decades ago and are likely no longer in alignment with current instructional practices and recommendations (e.g., integration of technology into instructional routines; Sindelar et al., 1986). Given that the original design of universally applied strategies is to improve class-wide behavior, it is important to understand findings from studies that distinctly analyzed effects at the group level. This is especially critical in settings that include students with disabilities. For example, Chow et al. (2023) found that average effect sizes for individual-level engagement were lower for children with disabilities versus children with no identified disability in early childhood settings. Applying strategies that can improve behavioral outcomes for all students in inclusive settings is an uncontested goal in education. To that point, ensuring that foundational strategies are firmly in place so that supplemental and more intensive strategies have the leverage to work as designed is paramount for promoting positive outcomes for students with or at risk of disabilities.
In sum, there is a need for an updated synthesis of research investigating the impact of universal practices on group engagement. Recent advances in the development of design-comparable effect sizes have increased the value and feasibility of synthesizing SCD and group design studies within the same review (e.g., Ferron et al., 2023). Thus, an updated review should include both design types. There has also been increased focus on the inclusion of students with disabilities within MTSS frameworks (Meyer et al., 2021); thus, there is a need for a review that includes the full spectrum of settings in which specialized instruction and related services may occur (Individuals with Disabilities Education Act [IDEA], 2004). A review of studies conducted across the continuum of educational environments (e.g., general education, resource, and self-contained) could inform the application of universal practices for students with disabilities in their least restrictive environments.
Finally, an updated review should include identification of the specific antecedent and consequence practices included in multicomponent interventions (Sutherland et al., 2018). Previous reviews have categorized outcomes broadly (e.g., inappropriate or appropriate behavior; Chaffee et al., 2017) or focused on a specific intervention (e.g., group contingencies; Maggin et al., 2017). The present systematic review and meta-analysis extend previous work by synthesizing evidence across a wide range of behavioral interventions that included a measure of group engagement and examining the potential of differential effects based on participant characteristics and intervention components.
Research Questions
RQ1: What are the characteristics of behavioral intervention studies that include a measure of group engagement (i.e., participants and settings, intervention components, study quality)?
RQ2: What is the effect of behavioral interventions on group engagement?
RQ3: How does the effect of behavioral interventions on group engagement vary based on participant and intervention characteristics?
Method
Inclusion Criteria
Studies included in this review met five criteria. First, articles were published in a peer-reviewed journal and were available in English. Second, data were collected in K–12 school settings during academic instruction led by in-service teachers. We excluded studies that occurred during nonacademic periods (e.g., transition times and related arts classes), in nontraditional schools (e.g., juvenile justice facilities), or included nonteacher implementers (e.g., researchers and interns). Third, research designs allowed for causal determination. For group designs, we included randomized controlled trials (RCTs) or quasi-experiments with control groups. For SCDs, we required studies to allow for at least three potential demonstrations of effect (e.g., ABAB designs, multiple baseline designs with three tiers) with at least three data points in each condition. We excluded nonconcurrent multiple baseline designs and SCD studies that did not present group engagement data in a line graph. Fourth, studies included a direct observation measure of engagement or a synonym (e.g., on-task behavior) for groups of at least two students. Direct behavior ratings were eligible if they were collected during an observation. Fifth, a behavioral intervention was compared with a baseline or business-as-usual condition. We defined behavioral interventions as changing antecedents or consequences within the classroom environment. We excluded pharmacological interventions and studies that compared two interventions (e.g., independent vs. interdependent contingency) without a baseline condition.
Study Identification Procedures
We identified eligible studies through a three-stage search process that was aligned with PRISMA 2020 guidelines (Page et al., 2021) and is depicted in Figure 1. First, we contacted the authors of a large-scale search for SCD studies that included measures of engagement or problem behavior and were granted access to their search and screening files. That search was conducted in 2019 across PsycINFO, ProQuest Dissertations and Theses, and PubMed databases; it was not limited by year; it included terms related to problem behavior, engagement, and SCDs; and it returned 1,248 unique records. There are currently two published articles that report results of that search for early childhood (Chow et al., 2023; n = 53 studies) and elementary settings (Ledford et al., 2024; n = 131 studies); readers may refer to those articles for additional details about their search and screening processes. We conducted an initial review of their screening spreadsheets and identified 28 studies that: (a) were published articles, (b) reported a measure of group engagement, and (c) occurred in K–12 settings. We retained these articles for inclusion in our full-text review, and subsequent search procedures were designed to update and extend Chow et al.’s (2023) and Ledford et al.’s (2024) search to identify: (a) SCD studies that were published after their search (i.e., January 2020 and beyond); and (b) group design studies published across any year.

Figure 1. PRISMA Flow Diagram of Search Procedures.
Database Search and Screening
Second, we conducted concurrent searches of PsycINFO and Medline with Full Text using the EBSCO database and replicating the search strings reported in Chow et al. (2023) to specify: (a) outcome (engag* or “on-task” OR on task); (b) setting (class* OR school*); and (c) intervention (treatment* OR intervention* OR prevent* OR training*). To increase the relevance of results, all searches included the Boolean operator “AND” between search strings and required the use of search terms within abstracts. Our first search also replicated terms for SCDs used in Chow et al. (“singlesubject” OR “single subject” OR “single-case” OR “single case” OR “multiple baseline” OR “multiple-baseline” OR “multiple probe” OR “multiple-probe” OR “changing criterion” OR “withdrawal” OR “ABAB” OR “A-B-A-B” OR “reversal” OR “alternating treatment*”) and was limited to studies published from 2020 to 2023. This search returned 289 records. We conducted a second search that included terms for group designs ([random OR randomized OR RCT] OR [quasi-experiment* OR QED]) and was not limited by year, which returned 2,665 records. Duplicate records were removed by EBSCO and Covidence screening software, and 2,410 records remained for screening.
Title and Abstract Screening
We reviewed the titles and abstracts of all records obtained from our database searches using Covidence software. Screeners included the first author and a master’s-level research assistant (RA) who was trained until reaching 90% agreement on a set of 10 studies. We double-screened titles and abstracts of all remaining records, and interrater agreement (IRA) was 95.24%. Disagreements were resolved through consensus discussions between the two screeners, leading to the exclusion of 2,204 records.
Full Text Screening
We next screened the full text of 206 records identified through database searches using a hierarchical coding scheme based on our inclusion criteria. If a study failed to meet one criterion, it was excluded, and subsequent criteria were not evaluated. The first author trained the same master’s level RA using sets of 10 records until reaching 90% agreement on the final inclusion decision. The RA screened all remaining records, and the first author double screened 30%. IRA was 89.29%, and disagreements were resolved through consensus discussions. Through this process, we identified 20 studies for inclusion in the review.
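The hierarchical scheme described above amounts to a short-circuit check: criteria are evaluated in a fixed order, and evaluation stops at the first failure. The following is a minimal sketch of that logic, not the authors' coding tool. The criterion labels, the dictionary keys, and the `screen` function are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional

# Hypothetical criterion checks, ordered like the five inclusion criteria.
# Each check takes a study record (a dict) and returns True when the criterion is met.
CRITERIA: list[tuple[str, Callable[[dict], bool]]] = [
    ("peer-reviewed and in English", lambda s: s["peer_reviewed"] and s["english"]),
    ("K-12 academic instruction led by in-service teacher", lambda s: s["k12_teacher_led"]),
    ("design permits causal determination", lambda s: s["causal_design"]),
    ("direct observation of group engagement", lambda s: s["group_engagement_measure"]),
    ("behavioral intervention vs. baseline or business as usual", lambda s: s["baseline_comparison"]),
]


def screen(study: dict) -> Optional[str]:
    """Return the first failed criterion, or None if the study is included."""
    for label, check in CRITERIA:
        if not check(study):
            return label  # stop here; later criteria are never evaluated
    return None


# Hypothetical record: fails the third criterion, so the fourth and fifth
# keys are never read and do not need to be coded.
example = {"peer_reviewed": True, "english": True,
           "k12_teacher_led": True, "causal_design": False}
print(screen(example))
```

Because evaluation stops at the first failure, each excluded record carries exactly one exclusion reason, which is the form in which PRISMA flow diagrams typically report full-text exclusions.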
Additional Search and Screening Methods
At the third stage, we reviewed the reference list of a previous review of class-wide interventions (Chaffee et al., 2017). We also hand-searched issues published in the past 10 years (2013–2023) by the four most commonly represented journals from studies identified in the database searches. This included Journal of Behavioral Education, Journal of Positive Behavior Interventions, Psychology in the Schools, and School Psychology Review. The master’s-level RA conducted initial screening of titles and abstracts from these sources and identified 60 potential articles for inclusion. The first author then applied our full-text screening process to these articles and the 28 studies retained from Chow et al.’s (2023) search and excluded 58 articles (see reasons in Figure 1). Thus, searches of previous reviews and journals resulted in the inclusion of 30 studies in this review.
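The record counts reported across the three search stages can be checked with simple bookkeeping. The sketch below uses only counts stated in the text. The variable names are illustrative, and the number of duplicates is implied by the reported totals rather than reported directly.

```python
# Stage 2: database searches (counts as reported in the text)
scd_search_records = 289
group_search_records = 2665
records_after_dedup = 2410
excluded_title_abstract = 2204
included_from_databases = 20

# Implied by the reported totals; not stated directly in the text
duplicates_removed = scd_search_records + group_search_records - records_after_dedup

full_text_screened = records_after_dedup - excluded_title_abstract

# Stages 1 and 3: prior search files, reference list, and hand search
other_candidates = 60 + 28
excluded_other = 58
included_from_other = other_candidates - excluded_other

total_included = included_from_databases + included_from_other

print(full_text_screened, included_from_other, total_included)  # 206 30 50
```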
Data Extraction
Descriptive Data
The first and second authors developed a codebook to capture descriptive information about: (a) participants and settings, and (b) intervention components. We coded total students, classroom types (e.g., general vs. special education), academic areas (e.g., reading, math), student grade level, and teacher certification. We also coded whether students with disabilities were included, and if so, the number of students and specific disability categories described. We coded the classroom type consistent with how the authors reported it.
We coded the name of the behavioral intervention and then coded for the presence of specific antecedent and consequence components within the authors’ description of the intervention. The first two authors developed an initial list of components based on recommendations for universal classroom supports within MTSS frameworks (Simonsen et al., 2021) and then refined the list across an initial set of four studies. In the final codebook, antecedent components included: (a) choice, (b) goal setting, (c) opportunities to respond, (d) precorrections, (e) seating, (f) social-emotional learning curriculum, (g) self-monitoring, and (h) teaming. Consequence components included: (a) corrective feedback, (b) group contingency, (c) peer praise, (d) points, (e) tangible rewards, and (f) teacher praise.
The first two authors double-coded studies until reaching 90% point-by-point IRA on two consecutive studies. We then split primary coding for the remaining studies and double-coded a random selection of 32%. We calculated point-by-point IRA by comparing the codes in each cell of our spreadsheets. Mean IRA was 91.10% (range = 79.31%–100%), and all disagreements were resolved through consensus discussions between the two coders.
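Point-by-point agreement, as described above, is the percentage of spreadsheet cells in which two coders entered the same code. A minimal sketch follows. The function name and the example codes are hypothetical, and the handling of empty or unequal inputs is an assumption.

```python
def point_by_point_agreement(coder_a, coder_b):
    """Percentage of cells in which two coders entered the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must have coded the same cells.")
    if not coder_a:
        raise ValueError("At least one coded cell is required.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


# Hypothetical codes for one study (1 = component present, 0 = absent)
primary = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
secondary = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(point_by_point_agreement(primary, secondary))  # 90.0
```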
Quality Evaluation
We evaluated the quality of included studies using a codebook based on the What Works Clearinghouse (WWC, 2022) Version 5.0 Standards for SCD and group designs. We coded SCD studies at the design level and evaluated whether: (a) the measure of group engagement met criteria for interobserver agreement (1 = yes); (b) the researcher systematically manipulated the independent variable (1 = yes); (c) there were sufficient data points in the baseline phase (1 = 6 or more; 2 = 5 or more); and (d) there were sufficient data points in all other phases (1 = 5 or more; 2 = 3 or more). For multiple baseline and multiple probe designs, we evaluated whether: (a) the design was concurrent (1 = yes), and (b) baseline phases had sufficient overlap (1 = first 3 sessions overlapped; 2 = at least 1 of the first 3 sessions overlapped). We also coded multiple probe designs for whether: (a) there were sufficient probes before introducing the independent variable (1 = 3 probes; 2 = 1 probe), and (b) tiers in the baseline phase had a probe when other tiers were in the intervention phase (1 = yes). Studies were categorized as “Meets with Reservations” if any criteria were coded 2, and “Does Not Meet” if any criteria were coded 0.
For studies that initially received all codes of 1, we followed WWC procedures to evaluate risk of bias by calculating nonoverlap of all pairs (NAP) for baseline trend and, if the study used an ABAB design, reversibility. If NAP was below 0.85 for all comparisons, we coded the study as “Meets” standards. If NAP was higher than 0.85 for any comparisons, we coded the study as “Meets with Reservations.” Additional details on these coding criteria can be found in the WWC (2022) handbook.
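For readers unfamiliar with NAP, the statistic is the proportion of all pairwise comparisons between two sets of data points in which the second set exceeds the first, with ties counted as half. The sketch below shows only that generic calculation and assumes that higher values indicate improvement. Which data points are paired for the baseline-trend and reversibility checks is defined in the WWC (2022) handbook and is not reproduced here. The function name and the example values are hypothetical.

```python
def nap(first_set, second_set):
    """Nonoverlap of all pairs: share of (a, b) pairs with b > a, ties counted as 0.5."""
    if not first_set or not second_set:
        raise ValueError("Both sets need at least one data point.")
    score = 0.0
    for a in first_set:
        for b in second_set:
            if b > a:
                score += 1.0
            elif b == a:
                score += 0.5
    return score / (len(first_set) * len(second_set))


# Hypothetical engagement percentages for two sets of sessions
earlier = [20, 30, 40]
later = [30, 50, 60]
print(round(nap(earlier, later), 3))
```

A value of 0.5 indicates chance-level overlap, and a value of 1.0 indicates that every point in the second set exceeds every point in the first.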
For group design studies, we evaluated whether: (a) the measure of group engagement met criteria for interobserver agreement (1 = yes); (b) participants were randomly assigned to conditions (1 = yes; 2 = no); (c) attrition was within accepted levels (1 = low attrition; 2 = high attrition, but not differential); and (d) for studies with nonrandom assignment, baseline differences in groups were minimal or covariates were included in analyses (1 = yes). Studies were categorized as “Meets” if all criteria were coded 1, “Meets with Reservations” if any criteria were coded 2, and “Does Not Meet” if any criteria were coded 0.
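The mapping from criterion codes to a final rating for group design studies can be expressed as a small decision rule. This sketch assumes that a code of 0 takes precedence over a code of 2 when both occur, which the text does not state explicitly. The function name and the criterion keys are hypothetical.

```python
def group_design_rating(codes):
    """Map criterion codes (0, 1, or 2) to a WWC-style rating.

    Assumption: any 0 overrides any 2 ("Does Not Meet" takes precedence).
    """
    values = list(codes.values())
    if any(v == 0 for v in values):
        return "Does Not Meet"
    if any(v == 2 for v in values):
        return "Meets with Reservations"
    return "Meets"


# Hypothetical quasi-experiment: nonrandom assignment (coded 2), all else coded 1
example_codes = {
    "interobserver_agreement": 1,
    "random_assignment": 2,
    "attrition": 1,
    "baseline_equivalence": 1,
}
print(group_design_rating(example_codes))  # Meets with Reservations
```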
The first author was the primary quality coder for all studies. A doctoral-level RA was trained on the SCD coding criteria until reaching 90% agreement on two consecutive designs. We then double-coded 32% and calculated point-by-point IRA for each criterion and the final design. Mean IRA was 95.09% (range = 66.67%–100%), and disagreements were resolved through consensus. The second author double-coded 30% of group design studies, and agreement was 100%.
Effect Size Extraction
Single-Case Design Studies
Raw data for six of the studies included in Ledford et al. (2024) were available via an open science repository. For the remaining studies, we extracted raw data from graphs in the published PDFs using PlotDigitizer software (Huwaldt & Steinhorst, 2020). The first author trained a master’s-level RA until meeting 90% agreement on graphs from two consecutive studies. Subsequently, the RA extracted data from all studies, and we checked agreement on 27%. We calculated point-by-point IRA by comparing the value extracted for each session and defined agreements as data points differing by no more than 2 percentage points. Mean agreement per design was 95.91%, and discrepancies were resolved by reviewing the graphs to determine potential reasons for disagreement and then re-extracting data.
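Agreement on digitized graph data differs from agreement on categorical codes because two extractions count as matching when they fall within a tolerance. The sketch below applies the 2-percentage-point tolerance stated above. The function name and the session values are hypothetical.

```python
def extraction_agreement(primary, secondary, tolerance=2.0):
    """Percentage of sessions whose two extracted values differ by no more than `tolerance`."""
    if len(primary) != len(secondary):
        raise ValueError("Both extractions must cover the same sessions.")
    if not primary:
        raise ValueError("At least one session is required.")
    agreements = sum(abs(p - s) <= tolerance for p, s in zip(primary, secondary))
    return 100.0 * agreements / len(primary)


# Hypothetical percentages of intervals with engagement, extracted twice from one graph
first_pass = [50.0, 62.5, 71.0, 80.0]
second_pass = [51.0, 62.0, 74.5, 80.0]
print(extraction_agreement(first_pass, second_pass))  # 75.0
```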
We then calculated effect sizes using the SingleCaseES package (Pustejovsky & Swan, 2018) within the R statistical environment. We calculated within-case SMD between each primary baseline and intervention comparison, excluding additional phases such as maintenance or follow-up. Within-case SMD represents the magnitude of change based on mean differences between conditions relative to the standard deviation (Chow et al., 2023). We chose within-case SMD given its use in previous reviews of interventions designed to improve engagement (Chow et al., 2023; Ledford et al., 2024), its suitability for multiple design types, its appropriateness for studies with fewer than three participants (in comparison to across-case SMD), and its comparability to Hedges’ g. Calculation of SMD requires variability within baseline conditions, which was present for all included comparisons. R code is available within our Supplemental Materials.
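To make the metric concrete, a within-case SMD divides the difference between the intervention and baseline phase means by a standard deviation. The sketch below is not the SingleCaseES implementation. It assumes the baseline-phase standard deviation as the denominator and applies no small-sample correction, whereas the options used in the analysis are documented in the Supplemental Materials rather than in the text. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def within_case_smd(baseline, intervention):
    """(intervention mean - baseline mean) / baseline SD, without bias correction."""
    sd_baseline = stdev(baseline)  # sample SD; requires at least two baseline points
    if sd_baseline == 0:
        raise ValueError("SMD is undefined when the baseline has no variability.")
    return (mean(intervention) - mean(baseline)) / sd_baseline


# Hypothetical percentages of intervals with group engagement
baseline_phase = [40, 45, 50, 45, 40]
intervention_phase = [80, 85, 90, 85, 80]
print(round(within_case_smd(baseline_phase, intervention_phase), 2))
```

The check for zero baseline variability mirrors the requirement noted above that SMD can only be calculated when baseline data vary.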
Group Design Studies
As part of our quality codebook for group designs, we extracted the following data for each treatment and control group: (a) number of classrooms, (b) post-intervention mean of group engagement, and (c) post-intervention standard deviation or standard error. For 8 of the 10 group design studies, authors provided these values within narratives or tables, and we contacted the authors of the 2 remaining studies for missing values. We were unable to obtain missing values for Gregory et al. (2014) and thus excluded this study from the analysis. We then used a web-based effect size calculator to generate Hedges’ g for each study (Wilson, 2023).
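The values extracted for each group are the inputs to the standard formula for Hedges’ g: the mean difference divided by the pooled standard deviation, multiplied by a small-sample correction factor. The sketch below implements that textbook formula and is not the web-based calculator cited above. The function name and the example values are hypothetical, with n taken to be the number of classrooms per condition.

```python
from math import sqrt


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # approximate correction factor J
    return correction * d


# Hypothetical post-intervention engagement (percentage of intervals), 12 classrooms per arm
g = hedges_g(mean_t=80.0, sd_t=10.0, n_t=12, mean_c=65.0, sd_c=10.0, n_c=12)
print(round(g, 3))
```

When a study reports a standard error instead of a standard deviation, the standard deviation can be recovered as the standard error multiplied by the square root of n before applying the formula.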
Meta-Analytic Procedures
We used quantitative meta-analytic procedures to synthesize data across studies using RStudio Version 4.2.1 (R Core Team, 2018). Given the stringency of WWC (2022) standards, which have the potential to exclude a substantial portion of SCD studies (Kratochwill et al., 2023), we did not apply quality ratings as inclusion criteria. Instead, we included all identified studies in the meta-analysis and reported study quality descriptively. We aggregated effect sizes using random-effects meta-analysis with robust variance estimation (RVE), which allows the inclusion of multiple effect sizes per study without knowing the precise correlations between the dependent effect sizes. We set the assumed correlation among effect sizes at rho = 0.8 and conducted sensitivity analyses at varying values of rho to ensure that our findings were robust across different levels of dependency.
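As background for readers new to random-effects models, the sketch below pools study-level effect sizes with the DerSimonian-Laird estimator of between-study variance. It is deliberately simpler than the analysis described above: it assumes one independent effect size per study and does not implement RVE or the assumed correlation rho. The function name and the data are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("Need at least two effects with matching variances.")
    w = [1.0 / v for v in variances]            # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)          # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, se, tau2


# Hypothetical study-level effect sizes and sampling variances
pooled, se, tau2 = dersimonian_laird([0.5, 1.5, 3.0], [0.04, 0.04, 0.04])
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```

RVE addresses a limitation of this sketch: when a study contributes several effect sizes, those estimates are correlated, and treating them as independent would overstate precision.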
We first examined the overall aggregated difference between the baseline and intervention conditions in the reported group engagement in SCD studies. Next, we conducted a moderator analysis using a meta-regression model by regressing the aggregated effect size on five hypothesized moderators: (a) grade level (elementary, middle, high, mixed grades); (b) inclusion of students with disabilities (yes/no); (c) number of total antecedent components; (d) number of total consequence components (excluding group contingency); and (e) group contingency (yes/no). Grade level was included as a moderator to further examine if there were differential outcomes based on elementary, middle, and high school contexts. Given the frequent use of multicomponent interventions across studies included in this review, we included the number of antecedent and consequence components as moderators to explore if the complexity or intensity of behavioral interventions moderated group engagement outcomes. We ran group contingency as an individual moderator (categorical dummy variable for interventions coded as group contingency interventions) rather than including it in the total consequence components because several of the individual components we coded for were mutually inclusive with group contingencies (e.g., teaming, points).
Because published studies are more likely to have larger sample sizes and larger effects than unpublished studies with small samples and/or null effects (Chow, 2018), we then analyzed potential publication bias by fitting another meta-regression model to see if the standard error was associated with effect size magnitude. While examining the data, we found 7 effect sizes larger than 10 (4.3%), which were higher than the majority of the effect sizes. We then conducted a sensitivity analysis by excluding these larger effect sizes and repeated the aforementioned analyses to test if the results were robust.
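The idea behind the publication bias check is that, in the absence of small-study effects, effect size should not depend on its standard error. The sketch below fits an inverse-variance weighted regression of effect size on standard error and returns the intercept and slope. It is a plain weighted least squares illustration without RVE or significance testing, so it is not the model fitted in the analysis. The function name and the data are hypothetical.

```python
def weighted_regression(x, y, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    sum_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sum_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("Standard errors must vary to estimate a slope.")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical effect sizes that grow with their standard errors
standard_errors = [0.2, 0.4, 0.6, 0.8]
effect_sizes = [1.4, 1.8, 2.2, 2.6]
inverse_variance = [1.0 / se ** 2 for se in standard_errors]
intercept, slope = weighted_regression(standard_errors, effect_sizes, inverse_variance)
print(round(intercept, 2), round(slope, 2))
```

A clearly positive slope would indicate that less precise estimates tend to report larger effects, which is the pattern expected under publication bias.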
For the group design studies, we also first examined the overall, aggregated group difference in engagement between the control and intervention conditions using a random effects meta-analysis with RVE. We then examined whether this aggregated group difference differed by grade levels by examining the synthesized effect size for elementary (n = 5) and middle school grades (n = 3), separately. We omitted high school grades because there was only one ES for this sample. Last, we used the same meta-regression model to examine potential publication bias in the group-design studies.
Results
Descriptive Results
Participant and Setting Characteristics
Table 1 summarizes data on study participants and settings, and online supplemental Tables 1 and 2 depict codes for each group design and SCD study, respectively. Across the 50 studies in this review, there were 6,810 total student participants (group n = 5,058; SCD n = 1,752) and 731 teacher participants (group n = 609; SCD n = 122). We coded data on classroom type, teacher certification, grade level, and students with disabilities at the study level. A large majority (80%) of studies occurred in general education classrooms (group k = 10; SCD k = 30). A small number of SCD studies occurred in resource rooms (k = 2), separate schools (k = 6), or special education classrooms (k = 2). In alignment with classroom type, most studies included general educators as implementers (group k = 10; SCD k = 30). Special educators were implementers in two group design studies and nine SCD studies. Most studies occurred in elementary grades (group k = 5; SCD k = 22) or middle school grades (group k = 4; SCD k = 12). High school grades were included in one group design study and nine SCD studies. Authors of more than half of the studies (58%) reported that students with disabilities were included in their sample (group k = 3; SCD k = 26).
Table 1. Participants and Settings Characteristics by Design Type.
Note. Some studies included multiple certifications or grade levels; thus, the total counts of these data are larger than the number of studies. SCD = single-case design. SWD = students with disability.
Intervention Components
Table 2 summarizes information on intervention types, antecedent components, and consequence components. Online supplemental Table 3 depicts codes for each group design study, and Supplemental Table 4 depicts codes for each SCD study. Among group design studies, the most common interventions were Class-Wide Function-Related Intervention Teams (CW-FIT; k = 3) and Establish–Maintain–Restore (k = 2). Commonly implemented interventions across SCD studies included CW-FIT (k = 6), the Good Behavior Game (k = 6), general group contingencies (k = 4), and tootling (k = 8). The most frequently reported antecedent strategies across SCD studies included goal setting (k = 26), precorrections (k = 19), and teaming (k = 12). Across the 10 group design studies, the most commonly reported antecedent strategies were precorrections (k = 5) and social-emotional curricula (k = 4). The most common consequence strategies reported across SCD studies included the use of a group contingency (k = 31) and tangible rewards (k = 32). Teacher praise was the most frequently reported consequence strategy in group design studies (k = 6).
Table 2. Intervention Types and Components by Design Type.
Note. CW-FIT = Class-Wide Function-Related Intervention Teams, SCD = single case design.
ᵃThis category includes all interventions included in fewer than two studies. See online supplemental Tables 3 and 4 for complete lists of intervention types. ᵇMultiple intervention components were coded per study; thus, the total is greater than the number of studies included.
Study Quality
Table 3 depicts quality coding results. Across the 10 group design studies included in this review, eight studies (80%) met WWC design standards (Version 5.0) without reservations, and one study (10%) met design standards with reservations. Overall, this indicates strong methodological rigor across the majority of group design studies. Across 67 SCD studies, the majority met design standards with reservations (n = 45; 67.16%). Ten SCD studies (14.93%) met design standards without reservations, and 12 (17.91%) did not meet design standards.
Table 3. What Works Clearinghouse Quality Rating by Design Type.
Note. WWC = What Works Clearinghouse. Ratings are based on WWC Reviewer Standards Version 5.0 (2022). SCD = single case design.
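The SCD quality percentages can be reproduced from the three rating counts, which sum to the denominator of 67 used in the text. The sketch below is a simple arithmetic check. The dictionary keys are illustrative labels, not terms from the WWC standards.

```python
# SCD rating counts as reported in the Study Quality paragraph.
scd_ratings = {
    "met_without_reservations": 10,
    "met_with_reservations": 45,
    "did_not_meet": 12,
}
scd_total = sum(scd_ratings.values())

# Percentage of SCD ratings in each category, rounded to two decimals.
scd_pct = {label: round(100 * n / scd_total, 2) for label, n in scd_ratings.items()}

# Share meeting standards with or without reservations.
meeting = scd_ratings["met_without_reservations"] + scd_ratings["met_with_reservations"]
scd_meeting_pct = round(100 * meeting / scd_total, 2)

print(scd_total)        # 67
print(scd_pct)          # 14.93, 67.16, and 17.91 percent
print(scd_meeting_pct)  # 82.09
```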
Meta-Analytic Results
Single-Case Design Studies
We estimated the overall weighted mean difference in group engagement between the baseline and intervention conditions based on the 164 effect sizes from 40 studies. The average weighted effect size was 2.28 (SE = 0.16, p < .001), which suggested the interventions had a significant, positive effect on group engagement across all the studies included. Overall heterogeneity in this primary model was high (τ² = 1.24), and a moderate proportion of the variability represented true between-study heterogeneity (I² = 67.1%). These indicators warranted additional analyses into the characteristics of heterogeneity within the current analytic sample.
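For readers less familiar with the heterogeneity indices reported above, the following minimal sketch shows how a pooled random-effects estimate, τ², and I² relate to one another. It assumes the DerSimonian-Laird estimator and one independent effect size per study. The source does not state which estimator was used, and its model also had to handle multiple effect sizes nested within studies, which this sketch ignores. The function name and the input values are hypothetical and are not data from the review.

```python
def random_effects_summary(effects, variances):
    """DerSimonian-Laird random-effects pooling with tau^2 and I^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    # I^2: percentage of total variability not attributable to sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2, "q": q}

# Hypothetical effect sizes and sampling variances (not data from the review).
effects = [1.2, 2.9, 0.8, 3.5, 2.1]
variances = [0.20, 0.35, 0.15, 0.50, 0.25]
result = random_effects_summary(effects, variances)
print({name: round(value, 2) for name, value in result.items()})
```

When all effect sizes are identical, Q is zero and both τ² and I² are truncated at zero, which is the no-heterogeneity case.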
Moderator analysis using meta-regression suggested that after controlling for potential predictors that could explain the variability in the aggregated condition difference, the intervention condition still had a significant, positive effect in improving group engagement (SMD = 2.36, p < .001). That is, after accounting for grade level, inclusion of students with disabilities, number of total antecedents, number of total consequences, and use of a group contingency, the main effect on group engagement remained. None of the predictors in the meta-regression model significantly predicted the aggregated effect for group engagement (-0.33 ≤ b ≤ 0.43, .172 ≤ p ≤ .838), suggesting the hypothesized moderators might not be sufficient to explain the heterogeneity in the sample.
Publication Bias and Sensitivity Analyses
To assess the potential presence of publication bias, we used meta-regression to determine whether there was a significant association between study precision and effect size magnitude. This analysis suggested that publication bias was present in the current analytic sample, given that the individual effect size standard error was found to be significantly associated with the magnitude of the effect size (p < .001). We also conducted a series of sensitivity analyses to assess the robustness of the results. After removing the seven effect sizes (ESs) that were above 10, we repeated the same analyses previously described. We found the intervention condition still had an overall significant, positive effect (SMD = 2.20, p < .001) across 157 ESs in 39 studies. Moderator analysis also suggested that after controlling for the potential predictors, the effect of intervention condition was still significant and positive (SMD = 2.45, p < .001), and still none of the hypothesized moderators significantly predicted the aggregated difference in group engagement between the baseline and intervention conditions (-0.45 ≤ b ≤ 0.21, .268 ≤ p ≤ .987). Publication bias was still estimated to exist (p < .001).
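The logic of regressing effect size on its standard error can be illustrated with a short sketch. This is an Egger-type weighted regression under simplifying assumptions: inverse-variance weights and independent effect sizes. The source does not name the exact test, and its model would also have had to account for effect sizes nested within studies. The function name and data are hypothetical.

```python
def egger_type_regression(effects, std_errors):
    """Weighted least squares of effect size on its standard error.

    Weights are inverse sampling variances (1 / SE^2). A slope far from
    zero means less precise effect sizes tend to differ systematically
    in magnitude from more precise ones.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, std_errors)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, std_errors))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, std_errors, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance and the standard error of the slope.
    n = len(effects)
    rss = sum(wi * (y - intercept - slope * x) ** 2
              for wi, x, y in zip(w, std_errors, effects))
    slope_se = (rss / (n - 2) / sxx) ** 0.5
    return slope, intercept, slope / slope_se

# Hypothetical data in which less precise effect sizes are larger.
std_errors = [0.2, 0.3, 0.4, 0.6, 0.8, 1.0]
effects = [1.0, 1.4, 1.9, 2.6, 3.8, 4.5]
slope, intercept, t_stat = egger_type_regression(effects, std_errors)
print(round(slope, 2), round(intercept, 2), round(t_stat, 2))
```

In this made-up example the slope is clearly positive, which is the pattern that would be read as small-study effects or possible publication bias.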
Group Design Studies
We estimated an aggregated weighted difference in group engagement between the control condition and experimental condition using nine ESs in nine group design studies. The average weighted effect size was 1.69 (SE = 0.34, p = .001), suggesting that the intervention condition had an overall positive effect on group engagement compared to the control condition. We then separated the dataset by grade levels and examined the weighted average effect size between control and intervention conditions in group engagement among students from elementary school and middle school grades separately. For elementary school grades, we found that the intervention condition had a significant, positive effect on group engagement (g = 1.62, SE = 0.51, p = .035) based on five ESs from five studies, suggesting elementary school students had an overall higher engagement in the intervention condition compared to their peers in the control condition. However, the aggregated effect size was found to be positive yet not statistically significant for the middle school grades based on three ESs from three studies (g = 2.04, SE = 0.57, p = .070), suggesting that evidence for the intervention effect is less conclusive for older students than for younger students.
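The subgroup p-values above depend on the number of contributing studies as well as on the size of the estimate. The sketch below assumes, which the source does not state, that each p-value comes from a two-sided t test of g / SE with k − 1 degrees of freedom. Under that assumption the reported values are roughly reproduced (about .034 for elementary grades and .070 for middle school grades), and the nonsignificant middle school result is consistent with having only three studies rather than with a smaller estimate. The function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(estimate, se, df, steps=20000):
    """Two-sided p-value for estimate / se, using Simpson's rule on the t density."""
    t = abs(estimate / se)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-t < T < t)
    return 1 - central_area

# Estimates and standard errors as reported; df = k - 1 is an assumption.
p_elementary = two_sided_p(1.62, 0.51, df=4)  # five studies
p_middle = two_sided_p(2.04, 0.57, df=2)      # three studies
print(round(p_elementary, 3), round(p_middle, 3))
```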
Publication Bias and Sensitivity Analyses
Publication bias assessment suggested a low chance that publication bias existed in the sample of group design studies, given that the standard error of each ES was not associated with the magnitude of the ES (p = .829). We limited sensitivity analyses for the group design data set to the potential impact of effect size dependency only, given the small number of studies available. To assess the potential impact of effect size dependency, we systematically varied the assumed within-study effect size correlation (rho) across values of 0, .2, .4, .6, .8, and 1, and observed no indication that the estimated effect size changed as a function of these different rho values. The consistency supported confidence in the overall effect size estimates and suggested that the nested nature of the dataset likely did not bias the results or their interpretation.
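The role of the assumed within-study correlation can be sketched as follows. The example computes the sampling variance of the average of several effect sizes from one study, assuming every pair shares the same correlation rho. This is a standard composite-variance formula, not necessarily the estimator used in the source. The input variances are hypothetical. When a study contributes a single effect size, as appears to be the case here (nine ESs from nine studies), the result does not depend on rho at all, which is consistent with the stability reported above.

```python
def composite_variance(variances, rho):
    """Variance of the simple mean of correlated effect sizes within one study.

    Assumes every pair of effect sizes in the study has correlation rho.
    """
    k = len(variances)
    total = sum(variances)
    for i in range(k):
        for j in range(k):
            if i != j:
                # Covariance between effect sizes i and j.
                total += rho * (variances[i] * variances[j]) ** 0.5
    return total / k ** 2

rhos = [0, 0.2, 0.4, 0.6, 0.8, 1]
# Hypothetical sampling variances, not values from the review.
single_effect_study = [0.30]
multi_effect_study = [0.30, 0.25, 0.40]

single_results = [composite_variance(single_effect_study, r) for r in rhos]
multi_results = [composite_variance(multi_effect_study, r) for r in rhos]
for r, s, m in zip(rhos, single_results, multi_results):
    print(r, round(s, 3), round(m, 3))
```

For the hypothetical three-effect study, the composite variance grows as rho increases, so the weight such a study receives would shrink. That is the kind of shift a rho sensitivity analysis is designed to detect.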
Discussion
The purpose of this systematic review and meta-analysis was to synthesize studies in which researchers investigated the effects of behavioral interventions on group engagement for students in K-12 classrooms. Group engagement can serve as a key measure of teachers’ efficacy in implementing universal behavioral supports (MacSuga-Gage & Simonsen, 2015), yet studies of class-wide behavioral support interventions often report outcomes for individual target students (Chaffee et al., 2017). We aimed to extend the findings of previous systematic reviews (Chaffee et al., 2017; MacSuga-Gage & Simonsen, 2015; Maggin et al., 2017; Royer et al., 2019) by searching broadly across behavioral interventions that included measures of group engagement, both general and special education settings, and both SCD and group designs.
We first examined the characteristics of the study set, including participants and settings, intervention components, and study quality. In alignment with previous reviews of group contingencies and class-wide interventions (e.g., Chaffee et al., 2017; Maggin et al., 2017), most studies occurred in general education classrooms with general education teachers as implementers and in elementary or middle school grades. About half of the total studies included students with disabilities in their sample. Researchers most commonly studied group contingencies or other multicomponent interventions, which also replicated findings from previous reviews (Chaffee et al., 2017; Maggin et al., 2017). Overall, the studies included in this review demonstrated strong methodological quality, with 90% of group designs and approximately 82% of SCD studies meeting WWC (2022) standards with or without reservations. This finding is particularly notable given the stringency of these standards.
We extended the work of previous researchers by coding for specific intervention components. The most common antecedent components were goal setting, precorrections, and teaming, and the most common consequence components were the use of points, tangible rewards, and teacher praise. These individual components were typically combined within the context of group contingencies. We highlight the finding that very few studies have investigated the effect of discrete antecedent or consequence practices (e.g., OTRs, BSP) on group engagement. Although previous reviews have indicated these practices are evidence-based (MacSuga-Gage & Simonsen, 2015; Royer et al., 2019), it is important to note that studies included in those reviews typically reported outcomes for individual students.
Our second research question pertained to the overall effect of behavioral interventions on group engagement. We conducted separate meta-analyses of the effect size estimates for SCD and group design studies, and both analyses indicated positive, statistically significant effects of the included interventions on group engagement. The overall effect size for single case studies was larger than that for group design studies (SMD = 2.28 vs. g = 1.69), a pattern noted in previous research on SCD effect size estimates (Shadish et al., 2014). Comparison with benchmarks for SCD effect sizes of engagement in early childhood (Chow et al., 2023) and elementary settings (Ledford et al., 2024) indicates that the overall effect we identified is larger than median SMD estimates for individual and group outcomes included in those reviews. Our identified effect sizes also align with the large positive effects identified by previous meta-analyses of group contingencies and class-wide behavioral interventions (Chaffee et al., 2017; Maggin et al., 2017). Nonetheless, given the association between publication bias and inflation of effect sizes (Chow, 2018), our results should be interpreted with that context in mind.
To answer our third research question, we investigated whether the overall effect was moderated by participant or intervention characteristics. Moderators in the analysis of SCD studies were participant grade level, whether students with disabilities were included in the sample, the total number of antecedent components, the total number of consequence components, and the use of a group contingency. None of these characteristics moderated outcomes of the SCD studies. This could be due to characteristics of the studies themselves (e.g., not enough heterogeneity in intervention characteristics) or could be an artifact of our coding system. Given the small overall number of group design studies, we limited our moderator analysis to a comparison of outcomes for studies conducted in elementary versus middle school classrooms and found that studies conducted in middle school classrooms (k = 3) were not associated with an overall significant effect.
Limitations
The results of this systematic review and meta-analysis should be interpreted with the following limitations in mind. First, we detected publication bias in SCD studies. Notably, it is uncertain whether traditional bias detection methods are appropriate for data from SCD studies. Additionally, our ability to compare the effects of published versus unpublished findings was limited. Publication bias could have inflated the overall effect estimates. Second, we included all studies in our meta-analyses regardless of WWC quality rating. This approach allowed for broader representation of the existing literature in our meta-analyses; however, it diverged from the WWC protocol, which recommends limiting syntheses and analyses to studies that meet standards with or without reservations. Third, we used within-case SMD effect sizes. Although we identified within-case SMD as the best SCD effect size for our data set, and its use increased the comparability of our results with previous large-scale reviews that included measures of engagement (e.g., Chow et al., 2023; Ledford et al., 2024), we acknowledge that the interpretability of our findings is limited given the differences between within-case SMD effect sizes and traditional group design effect sizes. Thus, differences in overall effect size estimates may be a product of the design features rather than the intervention itself.
Implications for Research and Practice
The results of this systematic review and meta-analysis have implications for both researchers and practitioners interested in behavioral interventions for increasing group engagement. Specific to researchers, there is a need for additional investigations of the effect of discrete practices such as OTRs and BSP on group engagement. There is also a need for additional research on increasing group engagement in resource or special education settings. Although we know that most students with disabilities spend the majority of their school day in general education classrooms (National Center for Education Statistics [NCES], 2024), those students also receive small group, targeted interventions in resource settings (Lemons et al., 2018), and 13% of all students with disabilities receive 60% or more of their instruction in special education settings across the United States (NCES, 2024). Although many students within special education settings require individualized behavioral supports, special educators who conduct large and small-group instruction would likely benefit from strategies that increase engagement for groups of students with disabilities.
Our findings indicate a disproportionately small number of group design studies compared to SCD studies. We encourage researchers to conduct more methodologically rigorous group design research on behavioral interventions that can increase group engagement and to include direct observation measures. This is critical to increasing the generalizability of findings and their influence on policy. Although outside of the scope of the current study, the variability in how group engagement is measured should be further explored with updated recommendations. For example, we anecdotally noted that researchers used different versions of interval sampling that involved rotation through either individual students or sub-groups of students. These differences in measurement procedure may have important implications for obtained effects in research and feasibility in practice. Researchers have previously identified time sampling with rotation through individual students as a valid measure (Briesch et al., 2015). Future research should closely examine: (a) the relation between measurement procedures and group engagement outcomes in research and (b) methods for training school-based practitioners to adopt and analyze measures of group engagement.
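To illustrate why the rotation scheme might matter, the following sketch scores the same hypothetical classroom record in two ways. The scoring rules are assumptions made for illustration only: one student observed per interval in roster order, versus one sub-group observed per interval with the proportion engaged averaged across intervals. The function names, the roster, and the data are all hypothetical and do not come from any study in the review.

```python
def rotate_individuals(engaged, n_intervals):
    """Observe one student per interval, cycling through the roster.

    engaged[t][s] is True if student s is engaged when interval t is scored.
    Returns the percentage of observed intervals scored as engaged.
    """
    n_students = len(engaged[0])
    hits = sum(engaged[t][t % n_students] for t in range(n_intervals))
    return 100 * hits / n_intervals

def rotate_subgroups(engaged, n_intervals, subgroups):
    """Observe one sub-group per interval and average the proportion engaged."""
    total = 0.0
    for t in range(n_intervals):
        members = subgroups[t % len(subgroups)]
        total += sum(engaged[t][s] for s in members) / len(members)
    return 100 * total / n_intervals

# Hypothetical record: 4 intervals (rows) by 4 students (columns).
engaged = [
    [True, True, False, True],
    [True, False, True, True],
    [False, True, True, True],
    [True, True, True, False],
]
individual_pct = rotate_individuals(engaged, 4)
subgroup_pct = rotate_subgroups(engaged, 4, subgroups=[[0, 1], [2, 3]])
print(individual_pct, subgroup_pct)  # 50.0 75.0
```

In this made-up record, 12 of the 16 student-by-interval cells are engaged (75%), yet the two rotation schemes yield different estimates, which is the kind of measurement sensitivity the paragraph above flags for future study.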
The clearest implication for teachers is that behavioral interventions that include elements of a group contingency currently have the strongest support for increasing group engagement. This includes standardized options; for example, teachers can replicate procedures from published research on interventions such as CW-FIT (e.g., Kamps et al., 2011) or Tootling (e.g., Kirkpatrick et al., 2019). However, teachers may also wish to design a group contingency that is unique to the needs of their classroom and students. Based on the practice components identified in this review, we recommend that teachers include goal setting, precorrections, teaming, points (or other forms of tokens), praise, and tangible rewards. The combination of antecedent and consequence strategies implemented within the context of a multicomponent approach can also provide teachers with a flexible and scalable way to improve group engagement. At the district-policy level, professional development providers may want to consider embedding training on strategies proven to increase group engagement (e.g., group contingencies) in professional development frameworks.
Conclusion
This systematic review and meta-analysis synthesized research on strategies that can improve group engagement. Results indicate that a combination of antecedent and consequence strategies, specifically those provided within the context of a group contingency, is associated with meaningful improvement in group engagement. Overall, effect sizes were strong, but the comparability of findings across design types is limited. Nonetheless, further research on strategies that improve group engagement across educational settings is warranted. Practically speaking, group engagement can be an outcome measure that indicates the status of Tier 1 implementation within the context of MTSS. Therefore, investment from researchers, school-based personnel, and policy makers is critical to inform data-based decision-making processes that guide the provision of training, coaching, and advanced student-level behavioral supports.
Supplemental Material
Supplemental material (files sj-docx-1-pbi-10.1177_10983007261435207 through sj-docx-6-pbi-10.1177_10983007261435207) for Behavioral Interventions for Increasing Group Engagement in K-12 Classrooms: A Systematic Review and Meta-Analysis by Lauren M. LeJeune, Mark D. Samudre, Hongyang Zhao, Meredith Jeffords and Jason C. Chow in Journal of Positive Behavior Interventions
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
