Abstract
This is an update of the Korpershoek et al. meta-analysis of the effects of classroom management interventions on various student outcomes in primary education. This updated analysis includes 22 new studies, resulting in an overall sample of 76 random and nonrandom controlled intervention studies published in the last 20 years (2003–2022). The summary effect was small and significant (average Hedges’ g = .23). Sensitivity analysis with robust variance estimation revealed a slightly smaller average effect (Hedges’ g = .22). The interventions were coded for the presence or absence of four categories of classroom management strategies, focusing on teacher behavior, teacher-student relationships, student behavior, and/or students’ social-emotional development. Recent studies placed more focus on teacher-student relationships than older ones. Moderator analyses yielded lower effects for interventions addressing all four categories of classroom management strategies, tentatively indicating that a too-broad focus seems less effective than more targeted interventions.
Effective teaching and learning cannot take place in poorly managed classrooms (Gage et al., 2018; V. F. Jones & Jones, 2012; Marzano et al., 2003; Patall et al., 2023; Van de Grift et al., 2011). For beginning teachers, particularly, managing the class can be challenging (Woodcock & Reupert, 2023). In a safe and orderly learning climate, instructional time and opportunity to learn are maximized (Creemers & Kyriakides, 2010; Edmonds, 1979). Indeed, the meta-analysis of Korpershoek et al. (2016) confirmed that various classroom management strategies and programs generally have small positive effects on primary school students’ behavioral, social-emotional, motivational, and academic outcomes, and the meta-analysis of Patall et al. (2023) reported average positive effects of a well-organized and predictable classroom structure on students’ behavioral engagement and academic achievement. As put forth by Chow et al. (2023), “A teacher’s approach to classroom management influences students’ engagement and academic achievement” (p. 60).
Evertson and Weinstein (2006) defined classroom management as “the actions teachers take to create an environment that supports and facilitates both academic and social-emotional learning” (pp. 4−5). Although teachers can choose from many different actions, roughly five types of teacher actions can be distinguished: (a) develop caring, supportive relationships with and among students; (b) organize and implement instruction in ways that optimize students’ access to learning; (c) encourage students’ engagement in academic tasks; (d) promote the development of students’ social skills and self-regulation; and (e) use appropriate interventions to assist students with behavior problems. These teacher actions can be considered teachers’ classroom management strategies to improve student outcomes.
Instead of specific classroom management theories, classroom management literature largely draws upon principles of various psychological and ecological theories that aim to better understand human behavior and explain how behavioral change may occur. For example, drawing from the field of psychology, classroom management literature has focused on principles of behaviorism (based on the work of Skinner and Pavlov, among others), suggesting that behavior can be shaped and regulated by reinforcement and punishment, and on principles of cognitivism (Piaget, among others), emphasizing the importance of reflection and thought processes to regulate behavior. Drawing from ecological theories, classroom management literature has focused on principles of humanistic theories (based on the work of Rogers and Maslow, etc.), addressing the importance of focusing on the student as a whole and addressing students’ psychological needs, and on principles of democratic theory (e.g., Kohn, Dewey) that stresses the importance of developing a sense of shared responsibility and mutual respect in a classroom. In the past decades, many interventions and strategies have been developed based on these and other theoretical principles, both to support teachers in effectively managing their classrooms and to support students in their behavioral and social-emotional development. These include, for example, the widespread School-Wide Positive Behavior Support (SWPBS) and Good Behavior Game (GBG) programs.
This study presents an update of Korpershoek et al.’s (2016) paper “A Meta-Analysis of the Effects of Classroom Management Strategies and Classroom Management Programs on Students’ Academic, Behavioral, Emotional, and Motivational Outcomes,” which summarized the effects of various classroom management interventions on student outcomes. With this updated meta-analysis, we hope to inform primary schools and teacher training institutes about the current evidence base regarding the effectiveness of a wide variety of different types of classroom management interventions, including commonly used interventions such as SWPBS and GBG, on various student outcomes. These insights are important to promote evidence-informed practice in schools when deciding on suitable interventions to improve, for example, student behavior. Teachers often feel underprepared to deal with challenging student behavior (Gilmour et al., 2022; Shank & Santiague, 2021; Woodcock & Reupert, 2023). Moreover, effective classroom management promotes students’ academic development (Korpershoek et al., 2016), presumably via improved learning conditions (Freiberg et al., 2009), such as improved instruction practices (Gage et al., 2018), a safe and orderly learning climate (Edmonds, 1979; Patall et al., 2023), and increased time-on-task (Brophy, 1986; Creemers & Kyriakides, 2010). Regrettably, beginning teachers generally seem to receive limited training in evidence-based strategies (Cooper et al., 2017; Freeman et al., 2014; Woodcock & Reupert, 2023). Research has shown that teachers’ knowledge about effective classroom management strategies correlates with teachers’ use of these strategies (Cooper et al., 2017; Moore et al., 2017), which emphasizes the importance of providing an up-to-date knowledge base.
We focus particularly on classroom management in primary education, not only because many primary school teachers struggle with managing their classrooms (Woodcock & Reupert, 2023), but also because “young children swiftly acquire and infer norms in a variety of social contexts” (Schmidt & Rakoczy, 2023, p. 193), emphasizing the important foundational stages for developing, for example, prosocial behavior during primary school (see also Carter & Doyle, 2006). Similarly, prior research on schoolwide interventions—for example, antibullying programs—revealed that these programs were more effective in changing student behavior in primary education than in secondary education (see Yeager et al., 2015).
Korpershoek et al. (2016) included 54 (non-)random controlled intervention studies conducted in primary education that were published between 2003 and 2013. Student outcomes were categorized as follows: academic outcomes (e.g., achievement tests), behavioral outcomes (e.g., on-task/off-task behavior, disruptive behavior, self-control, behavioral engagement), social-emotional outcomes (e.g., social skills, social competence, emotion recognition, coping, and empathy), motivational outcomes (e.g., learning motivation, achievement goal orientation), and other relevant student outcomes (e.g., time-on-task, self-efficacy, peer acceptance). The meta-analysis revealed that all students may benefit from classroom management interventions in primary education. Across all interventions, the overall effect on student outcomes was g = .22 (SE = .02, p < .01), which is a small effect size. The overall effect was .17 after taking publication bias into account. There were no significant differences between various outcome categories (academic, behavioral, social-emotional, and motivational outcomes), meaning the interventions generally had positive effects on a variety of student outcomes, not only their behavioral outcomes. Moderator analyses revealed that interventions focused on the social-emotional development of the students were somewhat more effective than interventions without this component, in particular for social-emotional outcomes (see Korpershoek et al., 2016, pp. 668−669).
In the current study, we present an update of Korpershoek et al.’s (2016) meta-analysis by adding a decade of newly published studies to the original sample of included studies. Since then, new interventions have been developed and evaluated, and new publications have appeared on previously evaluated interventions. In the Korpershoek et al. (2016) meta-analysis, the effects of SWPBS, GBG, Promoting Alternative THinking Strategies (PATHS), Second Step, and Zippy’s Friends on student outcomes could be compared, as multiple studies evaluated the effectiveness of these interventions. An important question is whether the average effect sizes that were reported for these interventions can be replicated in more recent studies, and whether newly developed interventions (also) show positive effects in multiple studies. Replication studies in educational sciences are rare (Perry et al., 2022) but are evidently important for the robustness of the reported findings. Moreover, many of the previously studied interventions have undergone some adjustments and adaptations, largely becoming more comprehensive (see Estrapala & Lewis, 2022, for current trends in PBIS programs), which may result in more positive student outcomes.
The current study builds on several recently published review studies and meta-analyses that focused on specific classroom management interventions or strategies and together form the current body of knowledge on effective classroom management. Smith et al. (2019) published a meta-analytic review solely focusing on seven randomized controlled trials of GBG implemented in primary education. They reported small treatment effects of GBG on proximal student outcomes (Hedges’ g = .09–.32). For a recent review of token economy practices as a classroom management strategy in primary education, we refer to Kim et al. (2021). These authors reported that token economy practices such as positive reinforcement of desired student behavior are generally effective in changing behavior. The reported average effect size (improvement rate difference) of .83 indicates a large effect on student behavior. For recent reviews on the effects of professional development on teachers’ classroom management behavior, we refer to Hirsch et al. (2021), Paramita et al. (2020), and Wilkinson et al. (2020). The systematic review of Hirsch et al. (2021) describes seven experimental studies on professional learning and development in classroom management for novice teachers. Generally, practice-based professional development improved novice teachers’ classroom management skills. Paramita et al. (2020) reviewed 18 empirical studies (group comparison studies and single-case studies) and found that most of the effective interventions trained teachers on a specific strategy (e.g., praise good behavior) or a combination of preventative behavior strategies. In Wilkinson et al.’s (2020) comprehensive review of 74 empirical studies, didactic (direct) instruction, coaching, performance feedback, and combinations thereof were the most frequently identified professional development components that resulted in the desired changes in teachers’ classroom management behavior in primary and secondary education.
With the aim of presenting an updated version of the earlier meta-analysis, our leading research questions are as follows:
1. What is the average effect of classroom management interventions on students’ academic, behavioral, social-emotional, and motivational outcomes in primary education, and which classroom management interventions effectively support and facilitate student learning?
2. To what extent do the classroom management strategies targeted in the interventions published between 2003 and 2013 differ from the classroom management strategies targeted in the interventions published between 2014 and 2022?
3. Which characteristics of the interventions, samples (incl. SES), and measurement instruments moderate the effects of the classroom management interventions?
For the first research question, it is important to clarify what we interpreted as a classroom management intervention, as this is not so easy to define. Many of the interventions that were included in the earlier meta-analysis did not refer to their own intervention as a classroom management intervention, even though they included many components that fit within the classroom management definition of Evertson and Weinstein (2006). For this updated meta-analysis, we replicate Korpershoek et al.’s (2016) demarcation of classroom management interventions—namely, coding interventions for the presence or absence of four categories of classroom management strategies in the interventions (p. 646). The four categories are based on the five teacher actions for high-quality classroom management distinguished by Evertson and Weinstein (2006), but separate teacher-focused and student-focused actions in a more structured manner (see also Wubbels, 2011). For example, with respect to the first teacher action (“develop caring, supportive relationships with and among students”), we distinguished between “developing caring, supportive relationships with students” and “developing caring, supportive relationships among students.” Both preventive and reactive classroom management strategies are included (Lane et al., 2011). For example, the establishment of rules and procedures and favorable teacher-student relationships are considered preventive strategies, whereas disciplinary interventions such as giving warnings or punishments are considered reactive strategies. The categories are as follows:
Teachers’ behavior-focused. The focus is on improving teachers’ classroom management (e.g., keeping order, introducing rules and procedures, disciplinary interventions) and thus on changing teachers’ behaviors, in line with Evertson and Weinstein’s (2006) teacher actions “organize and implement instruction in ways that optimize students’ access to learning” and “encourage students’ engagement in academic tasks.” Both preventive strategies and reactive strategies are included in this category (e.g., focusing on teacher behavior such as with-it-ness, overlapping, momentum, smoothness, and group alerting; see Kounin, 1970);
Teacher-student relationship–focused. The focus is on improving the interaction between teachers and students and thus on developing caring, supportive relationships, in line with the teacher action “develop caring, supportive relationships with students.” Only preventive strategies are included in this category, following the idea that explicit focus on the use of child-teacher relationships as preventive action promotes students’ development of social and academic competencies (Pianta, 1999);
Students’ behavior-focused. The focus is on improving student behavior, for example, via group contingencies (Kelshaw-Levering et al., 2000) or improving self-control among all students, in line with the teacher action “use appropriate interventions to assist students with behavior problems.” Both preventive and reactive strategies are included in this category, following the idea that behavior can be shaped and regulated by reinforcement and punishment (behaviorism) and the idea that reflection and thought processes promote self-control (cognitivism);
Students’ social-emotional development–focused. The focus is on improving students’ social-emotional development, such as enhancing empathic and prosocial skills, and improving peer relations, thus in line with the teacher actions “develop caring, supportive relationships among students” and “promote the development of students’ social skills and self-regulation.” Both preventive and reactive strategies are included in this category, among others, following the idea that prosocial skills and positive peer relations promote students’ school adjustment and achievement (Espelage et al., 2023; Freiberg et al., 2020; Gifford-Smith & Brownell, 2003).
These categories of classroom management strategies are not considered mutually exclusive, as most interventions have a broader focus and incorporate multiple categories of classroom management strategies. By coding all interventions on the presence or absence of these four categories of strategies, meta-regression analyses can be conducted to explore which combination(s) of strategies moderate the effects of the interventions. Moreover, it is important to note that, over time, multimodal social-emotional learning (SEL) interventions increased in popularity, and many of the interventions included in the former meta-analysis are nowadays considered SEL interventions (see Espelage et al., 2023; Freiberg et al., 2020, for a discussion on the connection between SEL and classroom management). Despite differences in semantics, we considered these interventions to be relevant when classroom management strategies are part of their focus.
With a twenty-year timespan of primary studies included in the present meta-analysis, we can also evaluate focus shifts in the classroom management interventions during this period (see also Korpershoek et al., 2022), as the educational and societal contexts have changed over time. Classroom management in educational practice is always moving, partly due to societal influences such as the recent COVID-19 pandemic that required a transition to emergency remote teaching but also because of changing views on education in general, including, for example, behavioral expectations of students (Sabornie & Espelage, 2023). In this respect, Freiberg et al. (2020) reported a growing trend from what they referred to as “compliance and obedience [. . .] to a model of self-discipline and self-direction” (p. 319) in classroom management and discipline interventions. As a consequence of these changes in educational practice, one could expect that the focus of the interventions may have shifted as well—for example, focusing more strongly on student self-regulation and promoting positive teacher-student relationships, which is why we added research question 2.
Research question 3 focuses on moderation effects. A reanalysis of the effects on student outcomes based on a much larger sample of primary studies (namely, those published within a twenty-year instead of a ten-year timespan) increases the power of the analyses. This yields a more precise estimate of the summary effect of the primary studies (i.e., the average intervention effect on student outcomes) and is especially beneficial for moderator analyses, which suffer the most from low power when the number of primary studies in a meta-analysis is small. A higher number of primary studies makes a comparison of the effectiveness of various interventions, or types of interventions, more robust, because each category of interventions is likely evaluated in a higher number of primary studies. As a result, the moderator analyses will be less affected by incidental findings, and smaller differences between groups can be detected. Moreover, a higher number of primary studies enables us to conduct meta-regression analyses with multiple predictors, allowing us to analyze the effect of intervention type simultaneously with characteristics of the samples included in those studies.
The quality of classroom management can be considered crucial in creating equal opportunities for learning (Ponitz et al., 2009). In the light of the worldwide increase of the achievement gap between children from lower and higher socioeconomic backgrounds (SES; Liu et al., 2022), it is important to explore whether classroom management interventions are effective in promoting positive outcomes for students from low socioeconomic backgrounds. Low-SES children may benefit more from well-managed classrooms than other children. Schmerse (2020), for example, found that low-SES children benefited the most from high preschool quality (measured by the quality of teacher-child interactions), resulting in higher levels of persistence and, indirectly, academic achievement. Similarly, Ponitz et al. (2009) reported on schools serving low-SES children and found that classroom quality, including the quality of classroom management, positively influenced students’ behavioral engagement, which in turn improved their academic achievement. Hence, in the current study, differential effects of the interventions for students from different socioeconomic backgrounds are presented in addition to generic intervention effects as part of research question 3.
Method
The meta-analysis was designed according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) standard (Page et al., 2021). The PRISMA standard has been developed for systematic reviews of studies that evaluate the effects of interventions and provides guidelines to identify, select, appraise, and synthesize studies. In addition, it describes procedures to obtain transparency, completeness, and accuracy in reporting in systematic reviews and meta-analyses. In the following text, we therefore provide a detailed description of each step in the research process.
Literature Search
In this paper, all primary studies included in the Korpershoek et al. (2016) study are combined with eligible primary studies published between 2014 and 2022. New literature searches for studies published between January 2014 and December 2022 were conducted in the online databases ERIC, Web of Science, and PsycINFO, as the meta-analysis of Korpershoek et al. (2016) included studies until December 2013. In this method section, we describe only the procedures that were followed in the new literature search.
Only peer-reviewed publications in English were included. The following combinations of search terms were used (abstract/topic): (classroom management, classroom organi*ation, behavio*r* management, classroom technique, teach* strateg*, classroom discipline, or group contingenc*) in combination with (academic outcomes, academic achievement, performance, on-task, off-task, time-on-task, student engagement, academic engagement, student behavio*r, classroom behavio*r, social-emotional learning, social-emotional outcomes), and in combination with (primary education, elementary education, primary school, elementary school, Kindergarten, pre-K, K-12). Moreover, we searched for specific programs: (Good Behavio*r Game, Colo*r Wheel System, Classroom Organi*ation and Management Program, Daily Behavio*r Report Card, Peacebuilders, Promoting Alternative Thinking Strategies, Positive Behavio*r Support, Consistency Management & Cooperative Discipline, Zippy’s Friends, Second Step, GBG, PATHS, *PBS, Incredible Years).
Inclusion Criteria
To be eligible for inclusion in the current meta-analysis, all intervention studies had to meet five criteria. The first criterion was that all reported intervention studies specifically needed to focus on interventions implemented by teachers in regular, primary-school classrooms. As a second criterion, we focused on interventions targeting (basically) all students in the classroom. Thus, interventions focusing on one or only a small group of students in a classroom were not eligible. Thirdly, studies could only be considered if the outcome variable pertained to measures of academic outcomes (e.g., achievement tests), behavioral outcomes (e.g., on-task/off-task behavior, disruptive behavior, self-control, behavioral engagement), social-emotional outcomes (e.g., social skills, social competence, emotion recognition, coping, and empathy), motivational outcomes (e.g., learning motivation, achievement goal orientation), or other relevant student outcomes (e.g., time-on-task, self-efficacy, peer acceptance). The fourth criterion was that studies had to apply a (quasi-)experimental design with control groups (no treatment or treatment as usual). Eligible studies followed a design in which participants were randomly assigned to treatment and control or comparison conditions, or were matched into treatment and control conditions, and the matching variables included a pretest for the outcome variable, or pretest differences were statistically controlled for using analyses of covariance. A study could also be included if subjects were not randomly assigned or matched, but the pre-posttest design provided sufficient statistical information to derive an effect size or to estimate group equivalence from statistical significance tests.
In total, 1,917 records (ERIC 309, PsycINFO 545, Web of Science 1,057) were found. After deleting duplicates (842) and initial screening of the titles and abstracts to eliminate off-topic papers (882), 219 records were selected for further inspection, of which 73 (33.3%) were assessed by two researchers. A third researcher was consulted when necessary. The researchers met on several occasions to discuss the eligibility of the papers. The remainder of the papers that were selected for further inspection were assessed by one researcher, but the other researchers were consulted when needed. The final decisions were based on complete consensus.
In total, 28 papers published between 2014 and 2022 met all inclusion criteria. These 28 papers reported on 22 new intervention studies. Seven of the 28 papers reported additional outcomes on intervention studies that were already described in one of the other new papers (e.g., Aasheim et al., 2019, 2020, reported on the same intervention study as Aasheim et al., 2018). Moreover, one of the 28 papers reported on two new intervention studies, thus in total 22 new intervention studies are included. Hence, the total sample of included primary studies published between 2003 and 2022 is 76 (54 from the Korpershoek et al. (2016) meta-analysis and 22 from the new literature search).
The other 191 papers were excluded for any of the following reasons: (a) not reporting on a relevant intervention or not conducted in regular primary education (criterion 1; n = 76), which also includes theoretical and review studies, other types of interventions (e.g., reading interventions, anti-bullying interventions), and studies conducted in preschools or in schools for students with special educational needs; (b) focusing on teachers’ or school leaders’ outcomes (thus, not on students’ outcomes) or on individual students with specific behavioral characteristics, such as emotional and behavioral disorders (criterion 2; n = 58); (c) only including student data or student outcomes that we considered irrelevant for this review (e.g., bullying, suicide prevention, executive functioning; criterion 3; n = 10); (d) not meeting the research design criteria (criterion 4; n = 39), for example, because they used multiple baseline designs or designs without a control group, had no equivalent groups for comparison because they reported on a single case (one teacher), or were ethnographic studies; and (e) other reasons, namely reporting on the same intervention study as already included papers without additional relevant student data for this review (n = 2), the full paper not being available (n = 5), or insufficient statistical data to be included (n = 1). Figure 1 provides an overview of the search strategy.

Figure 1. Overview of the search strategy.
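The full-text screening counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names and shortened reason labels are illustrative and do not come from the original study, whereas all numbers are taken from the text.

```python
# Counts taken from the text above; names and labels are illustrative only.
records_for_inspection = 219
included_papers = 28

excluded_by_reason = {
    "criterion 1: intervention or setting not relevant": 76,
    "criterion 2: no student outcomes, or individual students only": 58,
    "criterion 3: student outcomes not relevant": 10,
    "criterion 4: research design": 39,
    "other: same study, no additional data": 2,
    "other: full paper not available": 5,
    "other: insufficient statistical data": 1,
}

# Papers inspected in full but not included.
excluded_papers = records_for_inspection - included_papers
assert sum(excluded_by_reason.values()) == excluded_papers

# Seven papers added outcomes to studies already counted,
# and one paper reported on two separate intervention studies.
follow_up_papers = 7
papers_with_two_studies = 1
new_studies = included_papers - follow_up_papers + papers_with_two_studies

# Studies carried over from Korpershoek et al. (2016).
original_studies = 54
total_studies = original_studies + new_studies

print(excluded_papers, new_studies, total_studies)  # 191 22 76
```

The exclusion reasons add up to the 191 excluded papers, and the 28 included papers map onto 22 new studies and a total sample of 76.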
Coding of Intervention Studies
We coded the interventions according to the targeted classroom management strategies, that is, whether the intervention focused on teacher behavior, teacher-student relationships, student behavior, and/or students’ social-emotional development. For example, the GBG intervention was classified as being focused on teacher behavior and student behavior, and the PATHS intervention was classified as being focused on student behavior and students’ social-emotional development. When the included paper lacked sufficient details on the intervention content, additional information was sought in other scientific papers, reports, and websites. At least two researchers assessed and classified each intervention based on the obtained information. Classifications were discussed until full agreement was reached. Interventions that were already included in the Korpershoek et al. (2016) meta-analysis were reassessed, as the focus of the interventions may have changed over time. That is, the content of the interventions may have been adapted to some extent, for instance through adapted and/or new elements and learning materials. In this process, only one intervention was classified differently in newer studies due to an extended focus. Incredible Years was originally classified as being focused on teacher behavior, student behavior, and students’ social-emotional development. Incredible Years offers a variety of early intervention programs that have been adapted and extended over the years. Following the intervention and implementation details described in the newly included studies, we saw an extended focus compared to the original studies, namely a more explicit focus on teacher-student relationships. Consequently, in the meta-analysis, we treated the “original” and “new version” as different interventions due to the difference in classification.
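To illustrate the presence/absence coding, the classifications named in this section can be written down as four binary indicators per intervention. This is a minimal sketch in Python, not the coding instrument used in the study: the class and field names are hypothetical, and only the four classifications stated in the text are included. The letters T, R, B, and S follow the abbreviations used in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyFocus:
    """Presence (True) or absence (False) of each strategy category."""
    teacher_behavior: bool = False              # T
    teacher_student_relationship: bool = False  # R
    student_behavior: bool = False              # B
    social_emotional: bool = False              # S

    def flags(self):
        return [
            ("T", self.teacher_behavior),
            ("R", self.teacher_student_relationship),
            ("B", self.student_behavior),
            ("S", self.social_emotional),
        ]

    def code(self) -> str:
        """Compact label such as 'TB', built from the categories present."""
        return "".join(letter for letter, present in self.flags() if present)

    def addresses_all_four(self) -> bool:
        return all(present for _, present in self.flags())


# Classifications as described in the text.
codings = {
    "GBG": StrategyFocus(teacher_behavior=True, student_behavior=True),
    "PATHS": StrategyFocus(student_behavior=True, social_emotional=True),
    "Incredible Years (original)": StrategyFocus(
        teacher_behavior=True, student_behavior=True, social_emotional=True
    ),
    "Incredible Years (new version)": StrategyFocus(
        teacher_behavior=True,
        teacher_student_relationship=True,
        student_behavior=True,
        social_emotional=True,
    ),
}

for name, focus in codings.items():
    print(name, focus.code(), focus.addresses_all_four())
```

Because the categories are not mutually exclusive, each indicator can enter a meta-regression as a separate dummy variable, and the two versions of Incredible Years receive different codes.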
We coded intervention characteristics such as the name of the intervention, the country in which the study was executed, SES of the student sample, the grade years of the sample, the sample size, the intervention duration, and publication year. Furthermore, we coded characteristics of the student outcome measures. For the outcome type, we distinguished between academic, behavioral, social-emotional, motivational, and other outcomes. We also coded the name of the measurement instrument and the identity of the rater (i.e., the person who filled in the measurement instrument), distinguishing between students, teachers, observers, and other individuals. We did not find intervention studies that were conducted in online teaching settings.
Data Analyses
To answer our research questions, we performed several meta-analyses. In the following sections, we describe the steps we took to analyze the dataset of 76 intervention studies.
Effect Size Calculation of the Individual Interventions
Based on the statistical information provided in the primary studies, we calculated an effect size and variance for each outcome measure. We first calculated a Cohen’s d and variance according to formulas 15 and 16 as provided by Hedges (2007). These formulas take into account the hierarchical structure of the data, with students nested in classes or schools. When statistical data were only available at the cluster level (i.e., school or class), we first calculated a Cohen’s d at cluster level (formula 11 in Hedges, 2007) and then recalculated this cluster-level effect size into an effect size at the student level (formula 6 in Hedges, 2007). The calculations of the Cohen’s d and its variance required an intraclass correlation. When this information was not included in the primary study, we estimated it at .2 for academic outcomes and at .1 for behavioral, social-emotional, motivational, and other outcomes. These values are in line with those mentioned in the procedures handbook of What Works Clearinghouse (2020). As a last step, we converted Cohen’s d and its variance to Hedges’ g and its variance. Hedges’ g has the advantage over Cohen’s d that it is not biased in small samples (Borenstein et al., 2009). Table 1 presents an overview of the average calculated effect sizes, interventions, and focus of each intervention included in the meta-analysis. The column named Intervention includes the abbreviated names of interventions that were tested in five or more intervention studies in the meta-analysis, which are PATHS, GBG, and the new version (with extended focus) of Incredible Years. All other interventions (including the original version of Incredible Years) are labeled as “other”.
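The last conversion step can be sketched as follows. This minimal Python example applies the standard small-sample correction factor J = 1 − 3/(4·df − 1) described by Borenstein et al. (2009). It assumes a simple two-group comparison with df = n_T + n_C − 2; the cluster-adjusted formulas and degrees of freedom from Hedges (2007) are not reproduced here. The function names and the example values are hypothetical.

```python
def default_icc(outcome_type: str) -> float:
    """Fallback intraclass correlation when a primary study reports none.

    Values follow the text: .2 for academic outcomes, .1 for all other
    outcome types.
    """
    return 0.2 if outcome_type == "academic" else 0.1


def hedges_g(d: float, var_d: float, df: int) -> tuple[float, float]:
    """Convert Cohen's d and its variance to Hedges' g and its variance.

    Assumption: df is the degrees of freedom of a simple two-group
    comparison (n_T + n_C - 2), not a cluster-adjusted value.
    """
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    return j * d, (j ** 2) * var_d


# Hypothetical example: d = 0.25 with variance 0.01 from 50 + 50 students.
g, var_g = hedges_g(d=0.25, var_d=0.01, df=50 + 50 - 2)
print(round(g, 4), round(var_g, 6))
print(default_icc("academic"), default_icc("behavioral"))
```

The correction factor is always below 1, so g is slightly smaller in absolute value than d, and the difference shrinks as the sample grows.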
Overview of Intervention Studies and Their Focus and Effect Size
Notes. Some interventions were examined in more than one paper, resulting in more than one reference to a paper per row. Moreover, some papers included more than one intervention group, resulting in multiple rows per paper.
T = teacher behavior focus; R = teacher-student relationship focus; B = student behavior focus; S = students’ social emotional development focus.
Meta-Analysis
We used the R package metafor (Viechtbauer, 2010) to calculate a summary effect and perform moderator analyses with meta-regression models. With this package, we fitted a three-level meta-analysis model with the students in the individual interventions at level 1, the (multiple) effect sizes of each intervention at level 2, and the intervention itself at level 3 (function rma.mv). We followed the guidelines given by Harrer et al. (2021) to perform the meta-analysis.
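For reference, a three-level model of this kind is commonly written as shown below. The notation is an assumption chosen for illustration (it follows the general form described by Harrer et al., 2021) and does not appear in the primary analysis output.

```latex
\hat{\theta}_{ij} = \mu + \zeta_{(2)ij} + \zeta_{(3)j} + \varepsilon_{ij},
\qquad
\zeta_{(2)ij} \sim N\!\left(0, \tau^{2}_{(2)}\right),\quad
\zeta_{(3)j} \sim N\!\left(0, \tau^{2}_{(3)}\right),\quad
\varepsilon_{ij} \sim N\!\left(0, v_{ij}\right)
```

Here \hat{\theta}_{ij} is the i-th observed effect size of intervention j, \mu is the summary effect, \varepsilon_{ij} is the sampling error (level 1) with known variance v_{ij}, \zeta_{(2)ij} is the deviation of an effect size within its intervention (level 2), and \zeta_{(3)j} is the deviation of intervention j from the summary effect (level 3).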
Sensitivity Analysis
Although the metafor package takes into account the nested structure of the data and, thus, the dependency of effect sizes within interventions, it does not take into account the likely correlation between effect sizes within the same intervention. We therefore also analyzed the data in metafor in combination with the R package clubSandwich (Pustejovsky, 2022). This package allows for performing a three-level meta-analysis with robust variance estimation, which uses an assumed correlation between effect sizes within studies. We calculated the average correlation between the first four effect sizes within the interventions, and we used this value, which was .49, as the assumed correlation between effect sizes within the same intervention. As an omnibus test for between-group differences for the moderator models with robust variance estimation, we used the Wald test that is included in the clubSandwich package. By applying the regular three-level meta-analysis as well as the three-level meta-analysis with robust variance estimation, we guarded ourselves against potential misspecification of the model with respect to the confidence intervals and p-values (Harrer et al., 2021). Which of the two analysis models is the more accurate representation remains unknown, as the correlations between the outcomes within each study are unknown. When both models yielded the same finding (for example, a significant omnibus test of the moderator in both models), we interpreted this as a strong indication of an association between the moderator at hand and the intervention effect. When one model yielded a significant omnibus test for a moderator and the other model did not, we interpreted this as a weaker indication of an association between the moderator and the intervention effect.
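To make the idea of an assumed correlation concrete, the sketch below builds the working covariance matrix for the effect sizes of one intervention under a constant correlation. This is an illustrative sketch, not the clubSandwich implementation: the function name and the input variances are hypothetical, and only the value .49 comes from the text above.

```python
import math


def constant_correlation_cov(variances, rho):
    """Covariance matrix for effect sizes from one intervention.

    Diagonal entries are the sampling variances; off-diagonal entries are
    rho * sqrt(v_i * v_j), i.e., a constant assumed correlation rho.
    """
    k = len(variances)
    return [
        [
            variances[i] if i == j else rho * math.sqrt(variances[i] * variances[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


# Hypothetical sampling variances for three outcomes of one intervention
cov = constant_correlation_cov([0.04, 0.09, 0.04], rho=0.49)
for row in cov:
    print([round(value, 4) for value in row])
```

One such block is built per intervention, and effect sizes from different interventions are treated as independent.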
Analysis of Publication Bias
We examined whether the results were affected by publication bias related to small study effects. Small studies with nonsignificant effects are less likely to be published than large studies with nonsignificant effects. Moreover, in small studies, only very large effects become statistically significant, which may result in an overrepresentation of small studies with very large effects compared with small studies with smaller or no effects (Harrer et al., 2021). In line with Fernández-Castilla et al. (2021), we used the Egger regression test and the Funnel Plot test to examine publication bias in a three-level meta-analysis. Both tests indicate whether the funnel plot is asymmetrical. An asymmetrical funnel plot is a sign of publication bias related to small study bias in the meta-analysis. The Egger regression test examines the effect of the standard error, and the Funnel Plot test examines the effect of the sample size. As recommended by Rodgers and Pustejovsky (2021), we used two types of the Egger regression test: an Egger test for the regular three-level meta-analysis model (the Egger MLMA) and the Egger RVE for the model with robust variance estimation (also called the Egger sandwich test). As it was possible to run both types (MLMA and RVE) for the Funnel Plot test, we also executed both versions for this test.
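The logic of an Egger-type test can be sketched with a simple weighted regression of effect size on standard error. This is a deliberately simplified, single-level illustration with hypothetical data and a hypothetical function name; it ignores the three-level structure and the robust variance estimation used in the Egger MLMA and Egger RVE variants described above.

```python
def egger_type_regression(effects, std_errors):
    """Inverse-variance weighted regression of effect size on standard error.

    Returns (intercept, slope). A slope clearly different from zero suggests
    that effect sizes depend on study precision, i.e., funnel plot asymmetry.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, std_errors)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, effects)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, std_errors))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, std_errors, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical effect sizes and standard errors (not from the meta-analysis)
example_se = [0.05, 0.10, 0.20, 0.30]
example_g = [0.15, 0.20, 0.35, 0.30]
print(egger_type_regression(example_g, example_se))
```

Replacing the standard errors with sample sizes as the predictor gives the analogous idea behind the Funnel Plot test.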
Results
Descriptive Results
The meta-analysis includes 76 intervention studies, reported in 75 articles and published between 2003 and 2022. Of the 76 intervention studies, 52 were executed in North America (51 in the United States, one in Canada), 21 in Europe (Norway, Ireland, United Kingdom, Turkey, Lithuania, Belgium, Denmark, Germany, the Netherlands, and Luxembourg), and three in other parts of the world (Jamaica, Iran, and Taiwan). About one-third of the intervention studies were performed in grade one or lower years, another third in grade two or higher years, and the last third were performed in both grade one or lower years and grade two or higher years. Eleven interventions had a duration of 13 weeks or less, 43 had a duration between 13 weeks and one school year, and 22 interventions lasted more than one school year. The size of the experimental group varied from small (up to 100 students in 18 intervention studies), to medium (between 100 and 499 students in 34 intervention studies), to large (more than 500 students in 24 intervention studies).
Not all studies reported students’ SES, and those that did report SES used a variety of indicators. In total, 30 included studies used free or reduced lunch (FRL) status as an indicator. When the sample consisted of more than 40% students with FRL, we classified the sample as low SES. Other indicators included parental education (5 studies), parental income (5 studies), Head Start criteria (generally based on low income of parents, children in foster care, etc.; 5 studies), a label of mid/high SES assigned by the authors (2 studies), and a combination of the above (13 studies). The remaining 16 studies did not provide any information about SES. Over half of the intervention studies (42) focused on low-SES students, and 18 studies on mid- or high-SES students.
There were three interventions that were tested in five or more intervention studies in the meta-analysis. These were PATHS (14 interventions), GBG (8 interventions), and Incredible Years (IY; three interventions with the original version and six with the new version with extended focus). Most interventions included student behavior–focused classroom management strategies (65 interventions) and/or student social-emotional development–focused strategies (53 interventions). Furthermore, 46 interventions included teacher behavior–focused strategies, and 10 interventions included strategies to improve teacher-student relationships.
The 76 intervention studies together reported results on 407 outcome measures. Most measures pertained to student behavior (191), followed by students’ social-emotional outcomes (117), academic outcomes (64), and motivational outcomes (14). The remaining 21 outcomes pertained to other types of student outcomes, such as self-efficacy, self-confidence, peer relations, and teacher-student relationships. Five measurement instruments were used in at least five different intervention studies to measure the effects. These were the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997), the Social Skills Rating System (SSRS; Gresham & Elliott, 1990), the Teacher Observation of Child Adaptation-Revised (TOCA-R; Werthamer-Larsson et al., 1991), the Social Competence Scale (SCS; Conduct Problems Prevention Research Group, 1995), and the Teacher Report Form (TRF; Achenbach & Rescorla, 2001). For 235 outcome measures, the teacher was the rater; for 122 measures, it was the student; for 46 measures, an observer rated the outcomes; and for the remaining four measures, other individuals were the raters.
Summary Effect of the Meta-Analysis
The summary effect of the 76 intervention studies on all types of student outcomes is Hedges’ g = .23 (SE = .029, p < .001), which represents a small, significant effect. The total I² of 78.28% indicates substantial heterogeneity in effect sizes. A large part of this heterogeneity in effect sizes is due to differences in effect between intervention studies (56.82% is between-study variance and 21.46% within-study variance). The estimated true between-study variance is .043. A sensitivity analysis with robust variance estimation, which takes into account an assumed correlation between outcomes within intervention studies (which we assumed to be .49), yields a slightly smaller summary effect, namely Hedges’ g = .22 (SE = .029, p < .001).
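The split of the heterogeneity into a between-study and a within-study share can be illustrated with the variance-ratio definition that is commonly used for multilevel meta-analysis. The sketch below assumes that definition; the function name and all input values are hypothetical, and the "typical" sampling variance is treated as a given input rather than computed from the data.

```python
def i_squared_by_level(tau2_between, tau2_within, typical_sampling_var):
    """Share (in %) of total variance attributed to each level.

    tau2_between         : variance between interventions (level 3)
    tau2_within          : variance between effect sizes within interventions (level 2)
    typical_sampling_var : typical within-study sampling variance (level 1)
    """
    total = tau2_between + tau2_within + typical_sampling_var
    return {
        "between": 100.0 * tau2_between / total,
        "within": 100.0 * tau2_within / total,
        "total": 100.0 * (tau2_between + tau2_within) / total,
    }


# Hypothetical variance components, chosen only to make the arithmetic easy to follow
print(i_squared_by_level(0.04, 0.02, 0.02))
```

Under this definition, the between-study and within-study percentages always add up to the total heterogeneity percentage, as they do in the figures reported above (56.82% + 21.46% = 78.28%).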
We examined whether the summary effect was influenced by publication bias. Figure 2 shows the funnel plot of standard error by effect size. Studies including small samples have higher standard errors and are thus displayed in the lower part of the funnel plot. We examined whether the funnel plot was asymmetrical. The Egger MLMA regression test did not indicate the presence of funnel plot asymmetry, as the effect for standard error as moderator is .263 (SE = .220, p = .232), nor did the Egger RVE: the effect for standard error as moderator is .253 (SE = .277, p = .367). The Funnel Plot test, with the sample size as moderator, also indicated no publication bias (effects for sample size as moderator were .000, SE = .000) for the regular three-level meta-analysis as well as for the model with robust variance estimation (p = .239 and p = .146, respectively). Therefore, it seems that the findings are robust. The power to detect publication bias related to small study effects is, however, not very high, and better methods for three-level models are not yet available (Rodgers & Pustejovsky, 2021).

Funnel plot of effect sizes by standard error.
Focus of the Interventions
Compared to the interventions included in the Korpershoek et al. (2016) meta-analysis on classroom management interventions, the newly included interventions focused significantly more often on the relationship between teachers and students (prior meta-analysis: 3.7%, newly included interventions: 36.4%; χ²(1) = 14.59, p < .001). The old and newly included interventions did not differ significantly regarding the other foci. The difference with respect to the focus on teacher behavior was just not significant (prior meta-analysis: 53.7%, newly included interventions: 77.3%; χ²(1) = 3.63, p = .057). There was no significant difference in focus on students’ behavior (prior meta-analysis: 85.2%, newly included interventions: 86.4%; χ²(1) = .018, p = .895). The difference in focus on students’ social-emotional development was larger but not significant (prior meta-analysis: 74.1%, newly included interventions: 59.1%; χ²(1) = 1.66, p = .197).
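The comparison of proportions can be retraced with a Pearson chi-square test on a 2 × 2 table. The sketch below is an illustration, not the authors’ code: the function name is hypothetical, and the cell counts are inferred from the reported percentages and group sizes (3.7% of 54 is about 2 interventions; 36.4% of 22 is about 8), which is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). A 2 x 2 table has 1 degree of freedom, for which the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Inferred counts: focus on teacher-student relationships (yes, no)
# prior meta-analysis: 2 of 54; newly included interventions: 8 of 22
chi2, p = chi_square_2x2(2, 52, 8, 14)
print(round(chi2, 2), p < 0.001)
```

With these inferred counts, the statistic rounds to 14.59, in line with the value reported for the focus on teacher-student relationships.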
Moderator Analysis
The heterogeneity in effect sizes indicates that the intervention studies do not share the same true effect. With moderator analysis, we examined whether characteristics of the intervention, the measurement instruments, and the sample influenced the effect. In the Korpershoek et al. (2016) meta-analysis, analyses were done for each outcome type separately. We planned to conduct the analyses for each outcome type separately as well if the omnibus moderator test (F-test) for outcome type yielded significant between-group differences. For each moderator, we tested a regular three-level meta-regression model and a model with robust variance estimation. When the omnibus test for a moderator was significant, we conducted additional analyses to test which pairs of categories (contrasts) of the moderator differed significantly from each other. For all meta-regression results, the intercept reflects the average effect size for the reference category of the moderator at hand. The effects for all other categories of the moderator represent the difference in effect compared to the reference category.
Outcome Type
Table 2 shows the results of the meta-regression with outcome type as moderator (academic, behavioral, social-emotional, motivational, and other outcomes). In the model with the regular meta-regression as well as in the model with robust variance estimation, none of the outcome types differ significantly from the behavior outcomes (the reference category in this analysis). Furthermore, in both models, the omnibus test of the moderator, the F-test, indicates no significant between-group differences for outcome-type categories. This means that the outcome type does not significantly influence the intervention effect and, therefore, does not explain the heterogeneity in effect sizes of the intervention studies. Further moderator analyses are, therefore, conducted only for all outcome types together and not for each outcome type separately.
Meta-Regression with Outcome Type as Moderator
Note. n = number of outcome measures; K = number of interventions.
Intervention
Table 3 shows the results for the meta-regression in which the three interventions that were tested in five or more intervention studies in the meta-analysis (GBG, IY new, and PATHS) are compared with the reference group “all other interventions.” This enabled us to compare the effects of these interventions on student outcomes. The omnibus tests of the moderators indicated significant between-group effects for intervention, but only for the model with robust variance estimation. As can be seen in Table 3, the robust variance estimation model suggests that IY new has a significantly lower effect than the reference group. Additional contrast tests in the robust variance estimation model also showed a significantly lower effect of IY new compared to PATHS (p = .030) and a just not significantly lower effect compared to GBG (p = .050).
Meta-Regression With Specific Interventions
Note. n = number of outcome measures; K = number of interventions.
Focus
Next, we examined whether the focus of the intervention moderated the effect size. As most interventions had multiple foci (e.g., a combination of teacher behavior–focused and student behavior–focused classroom management strategies), we conducted the analyses for the different combinations of foci. Table 4 shows the results for all focus combinations. The reference category is the group of interventions that had the broadest focus, namely, all four categories of classroom management strategies. The omnibus tests of the moderators indicated significant between-group differences related to the focus of the interventions in the regular meta-regression but not in the meta-regression with robust variance estimation. Table 4 reports the highest effect for the combined focus on teacher behavior and on teacher-student relationships: the effect for this combination is about .9 (Hedges’ g) higher than that of the interventions that combined all four categories of classroom management strategies. However, this difference was not significant in the robust variance estimation model. In addition, it should be noted that only two intervention studies (Fernandez et al., 2015; Soheili et al., 2015) were classified as having a combined focus on teacher behavior and teacher-student relationships. The second-highest effect is for the combined focus on teacher behavior and students’ social-emotional development. Only three intervention studies were classified as having this combined focus on teacher behavior and students’ social-emotional development, all reported in one paper (Lynch et al., 2003).
Meta-Regression With Combinations of Foci as Moderator
Note. n = number of outcome measures; K = number of interventions; T = teacher behavior focus; R = teacher-student relationship focus; B = student behavior focus; S = students’ social emotional development focus.
The findings above with regard to the most effective combinations of foci are based on categories with a very low number of interventions and might be the result of chance. We therefore performed an additional meta-regression for focus. We recoded the variable for the combinations of foci and merged categories with fewer than five interventions into one group, named “all other combinations of foci”. Table 5 shows the results with this recoded version of the variable. The omnibus test again indicates a significant between-group effect, but now only for the model with robust variance estimation. Table 5 shows that the group with all four foci has a significantly lower effect than the group with the combined focus on students’ behavior and on students’ social-emotional development. This effect is only present with robust variance estimation and not in the regular meta-regression. Furthermore, in both models, the group with all four foci has a significantly lower effect than the group “all other combinations of foci”. We also tested for differences between other pairs of categories but found no other significant differences.
Meta-Regression With Combinations of Foci as Moderator; Categories With Less Than Five Interventions Merged Together
Note. n = number of outcome measures; K = number of interventions; T = teacher behavior focus; R = teacher-student relationship focus; B = student behavior focus; S = students’ social emotional development focus.
Measurement Instrument
Table 6 shows the results for the meta-regression with the five most used measurement instruments (SDQ, SSRS, TOCA-R, SCS, and TRF), with all other measurement instruments (combined) as the reference category. The omnibus tests of the moderator were not significant, indicating that the measurement instrument did not influence the intervention effect. The measurement instrument thus does not explain the heterogeneity in effect sizes of the intervention studies.
Meta-Regression With Measurement Instrument as Moderator
Note. n = number of outcome measures; K = number of interventions.
Rater
Table 7 reports the results for the meta-regression with the rater as moderator (we distinguished teachers, students, observers, and others). For the regular meta-analysis, no significant moderating effect of the rater was found; however, for the model with robust variance estimation, the omnibus test indicated significant between-group differences. This indicates that the rater explains part of the heterogeneity in effect sizes of the intervention studies. As can be seen in the rightmost column in Table 7, significant differences between teachers-as-raters and others-as-raters were found. Additional analysis in which we tested the other contrasts also indicated a difference between students-as-raters and others-as-raters (p = .006) and between observers-as-raters and others-as-raters (p = .004) in the model with robust variance estimation. However, it should be noted that all four outcomes that were based on others-as-raters stemmed from one intervention study in which students’ classmates were the raters. Therefore, the previous conclusions should be interpreted with caution and have limited generalizability.
Meta-Regression With Rater as Moderator
Note. n = number of outcome measures; K = number of interventions.
Country
We compared the interventions executed in the United States to those executed in other countries. Both the regular meta-regression (Hedges’ g = .034, SE = .060, p = .572) and the meta-regression with robust variance estimation (Hedges’ g = .022, SE = .067, p = .740) yielded a nonsignificant effect. This indicates that the effects of the interventions found in the United States are comparable to those found in studies conducted in other countries.
Students’ SES
Analysis of the effect of students’ SES showed no influence of SES on the intervention effect. The regular meta-regression yielded an effect of Hedges’ g = −.009 (SE = .058, p = .879) and with robust variance estimation an effect of Hedges’ g = −.025 (SE = .047, p = .606) for mid- and high-SES students compared to low-SES students. This indicates that the effects of the interventions were comparable for low-SES students and mid- and high-SES students. Intervention studies with no information on students’ SES were left out of this analysis.
Duration of the Intervention
Table 8 shows the results for the model in which the duration of the intervention served as moderator. For the regular meta-regression model, the omnibus test of the moderators yields significant between-group differences related to the intervention duration. For this model, Table 8 shows significant differences between interventions shorter than 13 weeks and interventions with a longer duration, with the shorter interventions showing larger positive effects on student outcomes. The difference in effect between interventions with a duration between 13 weeks and 1 year and interventions with a duration of more than 1 year is not significant.
Meta-Regression With Duration of Intervention
Note. n = number of outcome measures; K = number of interventions.
Publication Year
A meta-regression with publication year as moderator showed no significant relationship between the year of publication of the paper and the reported effects. For the regular meta-regression, the effect of publication year was Hedges’ g = .001 (SE = .005, p = .826), and for the meta-regression with robust variance estimation, it was Hedges’ g = −.000 (SE = .005, p = .968). This means that the average effects of the interventions were comparable over the years.
Multiple Moderators
Last, we fitted two meta-regression models with multiple moderators to find out whether the significant between-group effects on the omnibus test that we found for the specific interventions (GBG, IY new, PATHS, and all other) and for the focus of the interventions (the different combinations thereof) are in fact a result of a relationship with other moderators that also (seemed to) affect the intervention effect. We ran two separate models: one with multiple moderators for the specific interventions and another with combinations of foci. To reduce the number of parameters in the model with foci, we used the recoded variable for the combinations of foci with categories with fewer than five interventions merged into one group. We included other moderators with a significant between-group difference in the omnibus test, whether in the regular meta-regression or in the model with robust variance estimation. This applied to the rater of the measurement instrument and to the intervention duration. As the between-group difference for the rater was the result of a category with a deviating effect based on just one intervention study, we decided to exclude the rater from the multiple moderator analysis. In addition, we included SES as a moderator because prior research suggested that low-SES students potentially benefit more from high-quality teacher-student interactions (Schmerse, 2020), which are an aspect of good classroom management, and because we are specifically interested in learning whether classroom management interventions have potential for reducing the achievement gap between low- and high-SES students.
Table 9 shows the results for the first meta-regression model with multiple moderators, in which the specific interventions, the intervention duration, and the SES of the student sample are the moderators. Here, we took IY new as the reference category, as the effects of that intervention seem to differ most from the other specific interventions in the earlier comparisons. In the regular meta-regression presented in Table 9, the differences in effects between the specific programs (GBG, PATHS, and “all other programs”) and IY new are not significant when taking intervention duration and students’ SES into account. In the robust variance estimation model, we do find significant differences in effects between PATHS and IY new and between “all other interventions” and IY new when considering intervention duration and students’ SES. The differences between the effects of the specific programs are slightly higher than in the earlier presented model, without taking into account the duration of the intervention and students’ SES. The effects of these last two moderators did not change notably. That is, shorter interventions are more effective than longer interventions, including when the other moderators (specific interventions and students’ SES) are taken into account. Moreover, the intervention effects did not vary across SES groups when the specific interventions and intervention duration are taken into account. The omnibus test of the moderators for the model with intervention, intervention duration, and SES is significant for the meta-regression with robust variance estimation and just not significant for the regular meta-regression. This means that in the model with robust variance estimation, these three moderators together explain a significant part of the heterogeneity in effect sizes of the intervention studies.
Meta-Regression With Intervention, Intervention Duration, and SES
Note. n = number of outcome measures; K = number of interventions.
To be able to include all intervention studies in the model, we included the intervention studies with a missing value on SES as a separate category in the meta-regression model.
In Table 10, the results for the second meta-regression model with multiple moderators are presented, in which the combinations of foci addressed in the intervention, the intervention duration, and the SES of the student sample are the moderators. The table shows that the lower effect that we previously found for the group of interventions that combined all foci also exists after taking into account the intervention duration and students’ SES. The differences in Hedges’ g between the reference category “all foci” and the other categories we distinguished for focus even slightly increased, and in the model with robust variance estimation, the p-values indicate that the “all foci” category has a significantly lower effect than each of the other categories. The effect for intervention duration is about the same as above, and for SES we again found no effect. Shorter interventions are again more effective than longer interventions, even when the focus of the interventions and students’ SES are taken into account. The intervention effects do not vary across SES groups, even when the focus of the interventions and the duration of the intervention are taken into account. For the model with focus, intervention duration, and SES, the omnibus test of the moderators is significant for the meta-regression with robust variance estimation and is not significant for the regular meta-regression. This means that in the model with robust variance estimation, these three moderators together explain a significant part of the heterogeneity in effect sizes of the intervention studies.
Meta-Regression With Foci, Intervention Duration, and SES
Note. n = number of outcome measures; K = number of interventions; T = teacher behavior focus; R = teacher-student relationship focus; B = student behavior focus; S = students’ social emotional development focus.
To be able to include all intervention studies in the model, we included the intervention studies with a missing value on SES as a separate category in the meta-regression model.
Discussion
Effectiveness of Classroom Management Interventions
The first research question was: What is the average effect of classroom management interventions on students’ academic, behavioral, social-emotional, and motivational outcomes in primary education, and which classroom management interventions effectively support and facilitate student learning? The meta-analysis of 76 whole-classroom interventions on classroom management confirms that classroom management interventions can effectively support and facilitate academic, behavioral, social-emotional, and motivational student outcomes in primary education. In the current meta-analysis, we found a small, significant average effect of Hedges’ g = .23 (.22 with robust variance estimation) for studies published between 2003 and 2022. This effect equals the average effect found in the meta-analysis of Korpershoek et al. (2016), in which 54 interventions published between 2003 and 2013 were included. Also in line with the prior meta-analysis, we did not find differential effects for outcome type, suggesting that the interventions impacted a broad range of student outcomes, including academic performance, behavior, social-emotional development, and motivation.
The effects of the commonly used interventions GBG, PATHS, and Incredible Years could be summarized across multiple intervention studies, revealing more robust findings regarding their effectiveness based on rigorous research designs. These three interventions were evaluated in at least five intervention studies. We compared the effectiveness of these three programs and included a category “all other interventions”. The omnibus test of the moderator indicated significant between-group differences. We found indications that the new version of Incredible Years was less effective than PATHS and the category “all other interventions”. The lower effects were significant for the meta-regression with robust variance estimation and just not significant in the regular meta-regression. The new version of Incredible Years had a just not significantly lower effect compared with the GBG for the analysis with robust variance estimation, but in the regular meta-regression, the effect did not come close to significance.
What is important to note here is that the new version of Incredible Years was the only intervention included in the meta-analysis that was categorized as having a focus on all four categories of classroom management strategies (i.e., a focus on teacher behavior, teacher-student relationships, student behavior, and student social-emotional development). The results of the moderator analysis, in which we compared the combinations of foci that were present in five or more primary studies, yielded significant between-group differences and suggested that interventions that had a focus on all four categories of classroom management strategies were less effective than interventions that focused on fewer categories of classroom management strategies. The differences were significant for the model with robust variance estimation and mostly reached near significance for the regular meta-regression model. A tentative conclusion could be that programs that combine all four foci have a too-broad focus and therefore are somewhat less effective than more targeted programs, but with only one program in this category, we feel that we cannot draw such a conclusion. There might be interventions available to schools that have this broader focus as well, but that were not included in our meta-analysis, for example, because the studies did not meet our inclusion criteria, or because there were no experimental studies available concerning the intervention. Because the new version of Incredible Years was the only intervention that focused on all four categories, we cannot generalize these results to other interventions that focus on all four categories. For completeness, we checked implementation fidelity of the studies evaluating the effectiveness of the new version of Incredible Years (Aasheim et al., 2018, 2019, 2020; Ford et al., 2018; Hickey et al., 2015; Morris et al., 2013; Murray et al., 2018; Reinke et al., 2018), but there were no clear implementation issues in these studies. More research is needed on this comprehensive program to confirm this tentative conclusion.
Increased Focus on Teacher-Student Relationships
The second research question was: To what extent do the classroom management strategies targeted in the interventions published between 2003 and 2013 differ from the classroom management strategies targeted in the interventions published between 2014 and 2022? All interventions were coded on the presence or absence of four categories of classroom management strategies to explore which combination(s) of strategies moderate the effects of the interventions. Compared to the 54 intervention studies included in the 2003–2013 meta-analysis, the newly added 22 intervention studies published between 2014 and 2022 showed an increased focus on teacher-student relationships. This finding is in line with Freiberg et al. (2020), who reported a growing trend from compliance and obedience to self-discipline and self-direction in classroom management interventions. The increased focus on teacher-student relationships seems in line with the increasing belief that positive teacher-student relationships are an indispensable prerequisite for a positive learning environment and improved student outcomes—that is, multiple meta-analyses have indeed revealed that positive teacher-student relationships are positively associated with various student outcomes, including cognitive, social-emotional, and behavioral outcomes (Cornelius-White, 2007; Endedijk et al., 2022; Roorda et al., 2011).
Moderators of the Effects of Classroom Management Interventions
The third research question was: Which characteristics of the interventions, samples (incl. SES), and measurement instruments moderate the effects of the classroom management interventions? As described previously, we found significant moderator effects for the specific interventions and for the combinations of foci. The omnibus tests of the moderator were not significant for outcome type, measurement instrument, country, and publication year, providing no evidence that these aspects moderate the effects of the classroom management interventions. We found a significant moderator effect for rater, but this effect was based on a category of rater (namely, classmates) that occurred in only one intervention. Therefore, we must be hesitant in drawing conclusions. We recommend that future intervention studies investigate the similarities and dissimilarities between different raters on various student outcomes to explore this further. Moreover, we found a significant moderator effect for duration. Interventions that lasted less than 13 weeks were more effective than interventions with a longer duration, but this effect was only present in the models without robust variance estimation. This finding is somewhat counterintuitive, as behavioral change (both among teachers and students) generally takes time, but it could also be the case that intervention effects wear off over time when the team and students lose interest in the intervention activities. More in-depth evaluation of the implementation of the included interventions is needed to understand this finding better, also because our general indicator of duration in weeks may not have captured true differences in intensity (e.g., varying from low intensity of one hour per week to high intensity of continuous implementation throughout the day).
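To make the logic of an omnibus moderator test concrete, the following is a minimal sketch of a between-group heterogeneity statistic (Q-between) for a categorical moderator. It is a simplified fixed-effect illustration, not the meta-regression or robust variance estimation models used in this meta-analysis. The function name `q_between` and all effect sizes and variances below are hypothetical and are not taken from the included studies.

```python
from collections import defaultdict


def q_between(effects):
    """Between-group Q for a categorical moderator (fixed-effect sketch).

    effects: iterable of (group, hedges_g, variance) tuples.
    Returns (Q_between, degrees_of_freedom).
    """
    # For each moderator category, accumulate the sum of inverse-variance
    # weights and the weighted sum of effect sizes.
    sums = defaultdict(lambda: [0.0, 0.0])
    for group, g, var in effects:
        w = 1.0 / var
        sums[group][0] += w
        sums[group][1] += w * g

    total_w = sum(sw for sw, _ in sums.values())
    grand_mean = sum(swg for _, swg in sums.values()) / total_w

    # Weighted squared deviations of each group mean from the grand mean.
    q = sum(sw * (swg / sw - grand_mean) ** 2 for sw, swg in sums.values())
    return q, len(sums) - 1


# Hypothetical data: two duration categories, equal sampling variances.
hypothetical = [
    ("short", 0.40, 0.02),
    ("short", 0.30, 0.02),
    ("long", 0.10, 0.02),
    ("long", 0.20, 0.02),
]
q, df = q_between(hypothetical)
print(f"Q_between = {q:.2f}, df = {df}")
```

Under these assumed values, Q-between is about 2.0 with 1 degree of freedom, which is below the chi-square critical value of 3.84 at the .05 level, so the hypothetical moderator would not be considered significant. A nonsignificant omnibus test of this kind indicates a lack of evidence for moderation rather than proof that no moderation exists.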
Following the findings of Schmerse (2020) and Ponitz et al. (2009) that low-SES students may benefit more from well-managed classrooms, we additionally evaluated whether students’ SES affected the intervention effect. As we found no significant moderator effect for students’ SES, the interventions were generally not more (or less) effective for students from lower socioeconomic backgrounds as compared to students from higher socioeconomic backgrounds. This is consistent with the statement of Korpershoek et al. (2016) that “all students may benefit from a classroom management intervention” (p. 669). Including new classroom management interventions in the updated meta-analysis did not change the prior finding that across all intervention studies, the intervention effects were the same for students from different socioeconomic backgrounds. However, not finding differential effects may have been caused by the way SES was measured in the primary studies. Free or reduced lunch (FRL) is not the most precise measure of students’ SES (Harwell & LeBeau, 2010), yet this was the information most often provided in the primary studies. It may be that the imprecise measurement of students’ SES has affected our findings, and that this is why we did not find an effect of students’ SES on the intervention effect. Neither Schmerse (2020) nor Ponitz et al. (2009) used FRL as an indicator for SES, which may explain the difference in findings.
Limitations and Future Directions
Educational practice is always in development, and up-to-date insights are continuously needed to enable evidence-informed decision-making regarding the selection process for suitable interventions and implementation in schools. We could summarize the effects of commonly used interventions only for three specific programs—namely, for GBG, PATHS, and Incredible Years (new version). The number of programs that were evaluated in at least five intervention studies was very limited, despite the overall sample of 76 intervention studies. We did not find any new experimental studies on SWPBS, Zippy’s Friends, the Color Wheel System, or the Consistency Management & Cooperative Discipline (CMCD) intervention that met our inclusion criteria. Particularly regarding SWPBS, this was somewhat unexpected (although some studies mentioned that PBS interventions had previously been implemented in the participating schools, e.g., in Reinke et al., 2018), as the program is still widely used in schools. The meta-analysis of Korpershoek et al. (2016) reported no significant average effects of this intervention on student outcomes. This does not necessarily mean that SWPBS, Zippy’s Friends, the Color Wheel System, and CMCD are not effective, but it does mean that the scientific evidence for their effectiveness on student outcomes in current educational practice is limited.
Relatively few studies that met our inclusion criteria were published between 2014 and 2022 (22) as compared to the 54 intervention studies that were published between 2003 and 2013. This could reflect that fewer researchers are conducting randomized controlled trials to study the effect of widespread classroom management interventions, that attention among scholars (and presumably school policy advisors and administrators) to classroom management research has decreased in general, and/or that researchers believe enough evidence has been collected in earlier years, but the true reasons are difficult to pinpoint. The decrease in conducted intervention studies is worrying, as some of the interventions included in the meta-analysis are widely used without it being known whether recent studies continue to confirm previously found positive student outcomes in the changed educational and societal contexts. Replication studies are necessary to evaluate the robustness and trustworthiness of research findings (Perry et al., 2022).
Another limitation of the present meta-analysis is that we limited our search to peer-reviewed publications as a general quality assurance of the primary studies. As a result, potentially relevant research reports that did not undergo peer review were excluded (e.g., the RCT report on GBG from the Education Endowment Foundation; Humphrey et al., 2018).
One might wonder whether the COVID-19 pandemic could have impacted the interventions and subsequent findings, but this generally does not seem to be the case. Most of the primary studies included in our meta-analysis were published before the pandemic started. Only five of the included studies were published in 2019, one study in 2021, and one study in 2022. As the first lockdowns were in 2020, the studies published in 2019 could not have been affected by lockdowns. We checked the two most recently published articles, but neither mentions COVID, corona, or lockdown. Unfortunately, neither article reports the year in which the data were gathered, so we do not know whether the pandemic might have influenced the implementation and success of these two interventions.
Additionally, a limitation as well as a strength is that we only included effect sizes stemming from RCTs and pretest-posttest control group designs, while the effects of many interventions were often also evaluated through other research designs (e.g., quasi-experimental settings, multiple baseline designs). We realize that this stringent inclusion criterion does not do justice to the broad range of relevant papers, but we chose this criterion as a measure of methodological quality and rigor, as these are the only research designs that can provide strong empirical support for the effectiveness of interventions, eliminating, for example, maturation effects and other confounding variables.
Educational Implications
The current meta-analysis may help education administrators and school teams to compare the effectiveness of different types of interventions and make evidence-informed decisions about the implementation of a classroom management program suitable for their particular context (Nelson & Campbell, 2017). In a similar vein, the current meta-analysis may inform teacher educators and help them improve their classroom management curriculum for preservice teachers (see also Cooper et al., 2017; Freeman et al., 2014; Moore et al., 2017; Woodcock & Reupert, 2023). Scott (2017) emphasized that effective classroom management and effective instruction go hand in hand, stressing the importance of teacher training in a wide range of teaching behaviors and skills. Therefore, we hope that teacher educators incorporate this updated overview in their teacher training programs, as it takes time to develop classroom management skills (e.g., Reinke et al., 2014; Samudre et al., 2022). Many preservice teachers and beginning teachers are struggling with their classroom management (e.g., Dicke et al., 2014; Melnick & Meister, 2008; Van den Bogert et al., 2014; Wolff et al., 2017; Woodcock & Reupert, 2023). Moreover, negative experiences with classroom management and resulting teacher stress and low levels of self-efficacy are important reasons why many teachers burn out and/or leave the teaching profession within 5 to 10 years (Aloe et al., 2014; Dicke et al., 2014; Gilmour et al., 2022). Teacher training on effective classroom management strategies is, therefore, essential, as is providing preservice teachers with an up-to-date knowledge base on classroom management research.
Authors
HANKE KORPERSHOEK is full professor of educational sciences, in particular educational innovation and school improvement, at the GION Education/Research Institute of the University of Groningen, the Netherlands.
HESTER DE BOER is a researcher at the GION Education/Research Institute of the University of Groningen, the Netherlands.
JOLIEN M. MOUW is an assistant professor at the GION Education/Research Institute of the University of Groningen, the Netherlands.
