Abstract
This meta-analysis describes the effectiveness of school-based social and emotional learning (SEL) programs for K–12 students’ prosocial behavior. Prosocial behavior, defined as voluntary behavior intended to benefit others, is a critical social competency. To date, the field lacks a clear synthesis of the effectiveness of SEL programs in promoting students’ prosocial behavior due to broad or disparate operationalizations of prosocial behavior in prior meta-analyses. Clarity is needed regarding the effectiveness of SEL programs for students’ prosocial behavior in order to inform and guide development of SEL programs that aim to improve students’ prosocial behavior. The current meta-analysis includes 66 studies and 157 effect sizes involving 52,914 youth. A summary effect size (Hedges’ g = .24, RVE SE = .03, 95% CI = [.17, .30], p < .001) indicated that participation in SEL programs is associated with higher levels of prosocial behavior among K–12 students. Estimated heterogeneity of effects is large, with most effects ranging from −.06 to .53. We investigated 14 moderating variables such as approach, school level, urbanicity, and dosage. Implications for future research and practice are discussed.
School-based social and emotional learning (SEL) programs support the development of students’ intra- and interpersonal skills to bolster their physical, emotional, psychological, and social well-being (Cipriano et al., 2023). Before the COVID-19 pandemic, nearly 90% of U.S. school districts reported investing in or planning to invest in SEL programs (Yettick, 2018). This trend was partly driven by a significant national increase in students’ disrespectful behavior toward teachers from 2000 to 2020 (Irwin et al., 2022). Following the COVID-19 pandemic, students’ social-emotional well-being (Racine et al., 2021; Rogers et al., 2021), academic engagement, and sense of school community have declined (Branje & Morris, 2021), and teacher stress and exhaustion have increased (Pressley et al., 2021). Educators have also reported that implementing SEL is increasingly important (Becker et al., 2023). A recent EdWeek poll found that student misbehavior continues to be a primary cause of stress for teachers (Prothero & Solis, 2024). This may lead to even greater adoption of SEL programs in U.S. schools. Thus, there is a pressing need to understand the effectiveness of SEL programs.
SEL Programs
Currently, there is no consensus in the SEL literature on what constitutes an SEL program (Cipriano et al., 2023). In the present study, we define SEL as the deliberate attempt by educators to support the social and emotional (SE) competencies of students in school contexts. SE competencies generally refer to an individual’s ability to understand, communicate, and get along with others, as well as the ability to recognize, manage, and express emotions effectively (Collaborative for Academic, Social, and Emotional Learning [CASEL], 2023). SE competencies can be contextually specific based on cultural norms or developmental appropriateness (Jones et al., 2017b).
Some researchers differentiate an SEL program from SE strategies, sometimes referred to as “kernels” of SEL. They argue that kernels may be more potent because they are focused, active components that are less burdensome to implement than a comprehensive program (Jones et al., 2017a). We do not make this distinction in order to be inclusive of a variety of SEL approaches, including brief, strategy-focused interventions and those outside a manualized, didactic curriculum that may promote prosocial behavior (e.g., Bergin et al., 2023; Weissberg et al., 2015). Furthermore, SEL interventions vary in target audience (e.g., universal Tier 1, small group Tier 2, or intensive Tier 3), dosage and duration, content, and targeted skill(s) (Bergin et al., 2023; CASEL, 2023; Weissberg et al., 2015). Thus, a variety of interventions that create a classroom or school community fostering SE skills can constitute SEL. Our review does not limit SEL to a specific approach, but it is limited to SEL interventions that target prosocial behavior.
Prosocial Behavior
In this study, prosocial behavior is conceptualized as any voluntary behavior intended to benefit others or promote harmonious relationships. This definition aligns with the term’s early (Radke-Yarrow et al., 1983) and continuing use in developmental psychology (Eisenberg et al., 2015). However, the term has been defined in various ways across fields. In an effort to organize these varied definitions, Pfattheicher et al. (2022) distinguish between three different perspectives. An intentionalist perspective emphasizes behavior intended to benefit another person regardless of the actual consequences. A consequentialist perspective emphasizes the beneficial outcomes of one’s behavior for another person regardless of intention. A societal perspective may not require that prosocial behavior benefits others but emphasizes whether society values or approves of the behavior (i.e., aligns with social conventions). Pfattheicher et al.’s framework does not place any definition above others, but it is useful for situating the present study and differentiating it from prior related meta-analytic work discussed later. The present definition is best aligned with a combination of intentionalist and consequentialist perspectives. It includes behaviors such as helping, cooperating, applauding, sharing, advocating for, comforting, and defending others, among other behaviors (Bergin, 2014; El Mallah, 2020).
Many SEL frameworks include prosocial behavior to some degree within broader categories of behavior. In the Ecological Approaches to Social Emotional Learning (EASEL) taxonomy, prosocial and cooperative behavior is embedded under “Social/interactional skills” (Jones et al., 2017b). In the Collaborative for Academic, Social, and Emotional Learning (CASEL) framework, prosocial behavior is embedded under “Social Skills” within the domain of “Relationship Skills.” For example, when categorizing the content of elementary school SEL programs using the CASEL framework, researchers identified several prosocial behaviors under “Social Skills” such as giving compliments, taking turns, saying kind words to encourage or console someone, helping, sharing, compromising, being polite, and apologizing (Lawson et al., 2019). Furthermore, many states in the U.S. have adopted SEL standards that include prosocial behavior (Bergin et al., 2025), as have other countries (e.g., Canada, New Zealand, India’s Mulyavardhan program). Examples of prosocial behaviors included in state SEL standards in the U.S. are Alaska’s social awareness standard 3B, “demonstrates consideration for others” (Anchorage School District, 2024), California’s belonging standard 4.C.1–5, “shows appreciation to peers” (California Department of Education, 2024), Connecticut’s social-emotional domain standard S/E2, “demonstrates respect for others” (Connecticut State Department of Education, 2024), Florida’s interpersonal relationships standard 20, “gives assistance” (CPALMS, 2024), and Hawaii’s strengthened sense of Aloha standard 4.f., “gives joyfully without expectation of reward” (Hawaii Public Schools, 2015).
In addition to being a key social competency, prosocial behavior is an important outcome of SEL programs because it promotes K–12 students’ and teachers’ well-being in two major domains. First, prosocial behavior contributes to students’ and teachers’ overall SE well-being. Prosocial students are more likely to be calm, happy, and liked by classmates and teachers, and less likely to be depressed because their prosocial behavior serves as the basis for positive relationships and mitigates stress (Curry et al., 2018; Hui et al., 2020; Oberle et al., 2023). When students are prosocial, teachers report greater job satisfaction and teaching efficacy (Aldridge & Fraser, 2016). This is an especially pertinent issue in the U.S., which has faced unprecedented levels of teacher burnout and job shortages in recent years (Doan et al., 2023; Marshall et al., 2024).
Second, prosocial behavior promotes students’ academic success (Brass et al., 2022; Wang & Fredricks, 2014). Prosocial students are more likely to be engaged in class and earn higher grades and test scores across K–12 education (e.g., Collie et al., 2018; Fisk & Lombardi, 2021; Oberle et al., 2023; Wentzel et al., 2004). For example, in one study, students who were more prosocial in 6th–7th grade had higher achievement in 12th grade (Curlee et al., 2019). Increased academic achievement may result from prosocial students’ tendencies to show interest in schoolwork, work independently, listen, stay on task, and model these behaviors to their classmates (Bierman et al., 2009; McClelland & Morrison, 2003). Students’ prosocial behavior also contributes to a positive classroom climate, which supports the learning of all students in a class (Goodnough & Cashion, 2006; Holbrook & Kolodner, 2013).
Prosocial behavior is also important to later career success. Employers (Rios et al., 2020) and the World Economic Forum (2020) have called for workers with greater prosocial skills, such as the ability to collaborate and behave respectfully toward others. Yet, top managers believe that currently just 30% to 40% of new hires have enough of these skills (Brackett & Cipriano, 2020). Examining how school-based programs might foster prosocial skills is critical and may become even more important as jobs in the global workforce increasingly require teamwork (Savitz-Romer et al., 2015).
Related Previous Meta-Analytic Reviews
Given that fostering students’ prosocial behavior is an important outcome, several reviews have attempted to summarize SEL programs’ effects on prosocial behavior. However, prior reviews have used different definitions. Reviews that used the construct “positive social behavior” have had mixed results. In a review of seven widely used SEL programs conducted by two U.S. agencies (the Institute of Education Sciences and the Centers for Disease Control and Prevention) and Mathematica Policy Research, there were no significant effects of participating in an SEL program on multiple outcomes, including altruistic and “positive social behavior” (Ruby & Doolittle, 2010). However, Durlak et al. (2011) found an effect size of g = .24 at posttest (range of .16–.32) and .17 for a subset (k = 12) at follow-up (at least 6 months after program completion) in 86 (out of 213) studies of universal (Tier 1) SEL programs that measured “positive social behavior.” This category of behavior was described as “getting along with others” during daily behavior, which most likely included some aspects of prosocial behavior. Taylor et al. (2017) followed 28 of the same programs from Durlak et al.’s (2011) study that measured follow-up effects 6 months to 18 years later (mean of 1.7 years). They found an overall effect size of .13 for long-term follow-up for “positive social behavior,” which they defined as including prosocial behaviors (e.g., cooperation and helping) as well as other behaviors such as problem-solving skills, which would not be considered prosocial behavior under the present study’s definition.
Other reviews have used the term “prosocial behavior” but with varied and broad definitions. Sklad et al. (2012) found an overall effect size of d = .39 in six (out of 75) studies of Tier 1 SEL and behavioral interventions. Their definition of prosocial behavior included behaviors intended to help others, but also included other behaviors and cognitions, such as empathy and problem-solving. Although classic and contemporary research finds some relationship between prosocial behavior and empathy, the associations are typically small to moderate because empathy and prosocial behavior do not consistently co-occur (e.g., Eisenberg & Miller, 1987; Marshall et al., 2020).
A few years later, Wigelsworth et al. (2016) found an overall effect size of g = .37 for efficacy studies (i.e., controlled research studies) and g = .18 for effectiveness studies (i.e., large-scale implementation in authentic settings), but with a high degree of heterogeneity in 39 (out of 89) studies of Tier 1 SEL programs in the UK and the Netherlands. Their definition of prosocial behavior included “social awareness” and “social problem-solving” using Denham’s (2006) framework of SE competencies. Recently, Cipriano et al. (2023) found an overall effect size of g = .18 for 250 Tier 1 SEL programs. Their definition of prosocial behavior included responsibility, assertiveness, coping with unfairness, understanding consequences of actions, and problem-solving. The broad definitions of these meta-analyses best align with a societal perspective of prosocial behavior (Pfattheicher et al., 2022). Prosocial behavior that specifically benefits others is only moderately related to general social competence or social skills (Spinrad & Eisenberg, 2014), perhaps in part because general social skills can be put to either prosocial or antisocial purposes.
Using a definition similar to that of the present study, Mesurado et al. (2019) found an overall effect size of g = .23 for SEL programs on prosocial behavior. They identified 10 studies (six published, four dissertations) in a variety of countries (e.g., Canada, US, Spain, Ireland, Lithuania), eight of which were randomized controlled trials. They found a high level of heterogeneity with effect sizes ranging from −.07 to .68. However, due to the small sample of studies, they were unable to investigate moderating factors related to heterogeneity. Shin and Lee (2021) also used a definition similar to that of the present study, finding an overall effect size of g = .44 and a high level of heterogeneity in a sample of 33 studies in Korean and English. However, they did not limit their review to school-based interventions; interventions in other settings (e.g., research labs, psychiatry facilities) were also included. Their results were inconclusive and called for future studies that include more moderating variables, including dosage and deliverer. The present study addresses this call.
Although these eight existing reviews have been meaningful and have advanced the field, the effect of K–12 SEL programs on prosocial behavior requires additional clarification. The present study contributes to the literature in four important ways. First, our definition of prosocial behavior focuses on behavior (rather than prosocial goals, attitudes, or emotions) that is other-oriented (i.e., aligns with an intentionalist and consequentialist perspective, as well as many state SEL standards). See Supplementary Table S4 in the online version of the journal for a list of the different prosocial measures used in the studies from the present review. Second, we did not limit SEL programs to those that follow a manualized, didactic curriculum approach, but rather included programs that focus on school structure, or on teacher training to improve student well-being. We also included programs that operated solely at Tier 2 or 3 (i.e., targeting a group of students needing specific support). Most prior meta-analyses included only Tier 1 universal programs (e.g., Wigelsworth et al., 2016) or Tier 1 programs that also included Tier 2 and 3 components (e.g., Cipriano et al., 2023). Third, unlike most prior reviews, we did not require that studies include a control group to be included. This allowed for inclusion of studies with smaller Tier 2 samples of short, strategy-focused interventions with pre–post designs. Fourth, we examined a wide range of potential moderating variables, discussed next.
Potential Moderating Variables
To investigate which features of SEL programs may be important, we explored several moderating variables across four categories of SEL study characteristics: sample, program, methodological, and publication. We used three criteria to select moderating variables: (1) those of practical interest to program designers and educators seeking to adopt programs (e.g., number of sessions, who delivers the program, approach), (2) those indicated by previous meta-analyses or our own experience in designing and delivering SEL as potentially meaningful, and (3) those that are reported frequently enough for analysis among included studies (Durlak et al., 2022).
Sample Characteristics
Sample characteristics may include school level (i.e., elementary, middle, and high school), the percentage of students qualifying for free or reduced-price school lunch, urbanicity, and the representation of gender and racial diversity. Durlak et al. (2011) and Shin and Lee (2021) found that SEL programs were equally effective for elementary, middle, and high school-age students, though there are fewer programs that target older students than those that target younger students. Yeager (2017) argued that SEL programs may be less effective for older students. However, in a review of five meta-analyses on positive adjustment (e.g., interpersonal relationships and academic achievement) and risk behavior (e.g., drug use and disruptive behavior), Domitrovich et al. (2017) concluded that two sample characteristics, age and SES (i.e., low, middle, and mixed low-to-middle), were significant moderators in one of the meta-analyses, with greater effects for older students and lower SES students. Thus, clarity is needed regarding how developmental level might influence prosocial outcomes of SEL programs. Domitrovich et al. (2017) also found that gender (i.e., the proportion of male or female students), ethnicity (i.e., the proportion of different races and ethnicities), and location (i.e., urban, suburban, or rural settings) were not significant moderators. In our review, we investigated school level, urbanicity, and free or reduced-price lunch rates (F/RPL; a proxy for socioeconomic status, Domina et al., 2018) in order to confirm or clarify these scant results. We did not include gender and ethnicity due to low rates of reporting in included studies.
Program Characteristics
Program characteristics include approach, intervention tier, dosage, duration, and deliverer. Approach refers to whether an SEL program takes a curricular, interactional, structural, or combined approach (Cipriano et al., 2023; Dusenbury et al., 2015; Lawson et al., 2019). Curricular approaches are standalone, fully developed instructional programs offered through direct lessons on specific SE competencies. They are didactic programs that become part of the school or class schedule. The focus is on training students. In contrast, interactional approaches focus on training educators. They indirectly change students’ behavior by transforming the way educators interact with students and support students’ interactions with each other (e.g., Bergin, 2018). That is, educators incorporate SEL into existing daily teaching practices (e.g., using inductive rather than power-assertive discipline, or praising students when they behave prosocially toward classmates). Structural approaches transform school policies and procedures to better support students’ SE competencies (e.g., replacing suspensions with restorative justice practices, or providing service-learning opportunities). A combined approach includes instances where at least two of these approaches are implemented within the same SEL program. Findings from Domitrovich et al. (2017) suggest that SEL programs that use a combined approach may be the most effective at improving school climate. As such, we expected that effects would be larger for interactional or combined approaches because lessons are embedded into daily interactions with teachers, potentially delivering higher doses of SEL compared to a curriculum-only approach that may be delivered one class period per week.
Intervention tier refers to the subpopulation of students who are targeted by an intervention according to a multi-tiered system of support (Eagle et al., 2015). Tier 1 interventions are universal, targeting all students (Steed & Shapland, 2020). The majority of students’ needs are met at the universal level through effective instructional strategies and positive behavioral classroom support. If students do not respond to the universal supports, they can be referred to Tier 2 interventions that target small groups of students with moderate levels of specific SE needs, or Tier 3 interventions that target students with more intensive behavioral needs (McDaniel et al., 2017). Prior reviews (e.g., Domitrovich et al., 2017) only included Tier 1 universal interventions or Tier 1 interventions that also included Tier 2 and/or 3 components (Cipriano et al., 2023), with one exception (Shin & Lee, 2021). The inclusion of a continuum of SEL supports allows the investigation of differing student needs as well as the improvement of the efficiency and continuity of SE intervention (Jones & Bouffard, 2012). Therefore, we include all tiers of intervention to expand previous work.
Dosage refers to the total amount of time students spend in an SEL program. Duration refers to the period of time an SEL program is implemented. Prior meta-analyses have included only studies that had a minimum number of sessions or duration (e.g., Cipriano et al., 2023). Our review includes SEL interventions of any length because short (e.g., less than half of the school year) interventions could potentially increase prosocial behavior. We included duration as well as dosage because duration can be estimated for interactional or structural approaches, whereas only curricular approaches can estimate the amount of time students are exposed to SEL instruction (i.e., dosage). Further, this enabled a consistent coding scheme for all intervention tiers.
Deliverer refers to who delivers the content of SEL programs to students (e.g., researchers/developers, teachers, other school personnel). The Wigelsworth et al. (2016) meta-analysis suggested that effects may be larger for programs when the program developers lead or are involved in delivery (i.e., efficacy studies). However, others have found that programs can achieve similar (Barnes et al., 2014) or better (Durlak et al., 2011) outcomes when conducted by school staff compared to those outside the school system. In light of these conflicting results, we investigated the effect of the deliverer of SEL programs on students’ prosocial behaviors.
Methodological Characteristics
Methodological characteristics include study design (whether studies used a randomized controlled trial (RCT), quasi-experimental design (QED), or single-group pre–post design), the type of prosocial behavior outcome measure used (e.g., student-report, teacher-report), baseline equivalence, and report of implementation fidelity. In intervention research, quasi-experimental studies tend to have larger effects than rigorous RCTs. For example, a study of SEL effects on academic achievement found that quasi-experiments had higher effects than RCTs, and rigorous RCTs did not show effects (Corcoran et al., 2018). Luo et al. (2022) also found that quasi-experiments had higher effects than RCTs on preschoolers’ social competence. Staines and Cleland (2012) argue that RCTs can underestimate effects and are biased toward the null hypothesis. However, as noted by Cook et al. (2008), this does not imply that random assignment is unimportant. Instead, it highlights that factors beyond random assignment, such as attrition rates and baseline equivalence, must also be considered when assessing methodological characteristics. Well-controlled randomized trials with low attrition and baseline equivalence provide the most unbiased estimates of treatment effects, which are crucial for achieving unbiased meta-analysis results.
Measurement is a challenge in the prosocial field. Prosocial behavior can vary by reporter and method (El Mallah, 2020); thus, it is important that our review examines whether results are moderated by the type of prosocial measure used. For instance, one could argue that, ideally, prosocial behavior should be measured through student-report because only students themselves are privy to their intent to benefit others and to the full range of their prosocial behavior. However, students’ self-report can be subject to social desirability, resulting in ceiling effects with little variability (Davidson et al., 2018). Peer-report (such as peer nominations or peer ratings) can be subject to reputation bias (Hymel, 1986; Lansu & van den Berg, 2022) because peers tend to view classmates in a trait-like way such that even if behavior changes as a result of intervention, peer-report may not change over time. Thus, both student- and peer-report limitations may underestimate effects of interventions. Teacher-report also has important limitations (Duckworth & Yeager, 2015) but could inflate results due to bias in reporting behavior change resulting from interventions in which they are involved. Correlations between peer and teacher ratings of prosocial behavior tend to be modest in the .20–.30 range (e.g., Veenstra et al., 2008). Observations may be less biased than reports but only take brief snapshots of behavior without representing students’ range of behavior and are time-intensive and therefore costly.
Evaluation of study quality is an important part of a literature synthesis (Pigott & Polanin, 2020). We accounted for two aspects of study quality: whether the study established baseline equivalence and reported implementation fidelity. Because our sample included single-group pre–post designs, these indicators help address potential sources of bias (Tipton et al., 2019). In order to reduce bias in the estimated effect sizes, all studies were retained regardless of these quality indicators (Harrer et al., 2021).
Publication Characteristics
Publication characteristics include the availability (i.e., publication or presentation) year and publication status (e.g., published, unpublished) of each study (S. J. Wilson & Lipsey, 2006). This study focuses on a more precise prosocial behavior definition than previous meta-analytic work. Hence, we used a wide range of availability years and included publication status to investigate publication bias (R. Rosenthal, 1979). See Supplemental Table S3 in the online version of the journal for correlations among moderators.
The Current Study
The purpose of our study was to summarize the effectiveness of school-based SEL programs on K–12 students’ prosocial behavior, a key outcome of SEL, and to document the potential role of moderating variables. We carefully conceptualized prosocial behavior and systematically focused on prosocial behavior as a distinct social competency. We also accounted for within-study dependency and addressed the multiplicity issue caused by multiple individual moderator analyses. We hope this serves as a timely roadmap for future SEL program design, implementation, and effectiveness research. We had five research questions (RQ):
Method
Literature Search
A multistep literature search procedure was implemented from July 1, 2021, to February 11, 2022. In Step 1, a systematic literature search was conducted using the following online databases: EBSCOhost, which includes Academic Search Premier, Open Dissertations, Education Full Text (H.W. Wilson), ERIC, Professional Development Collection, APA PsycArticles, and APA PsycINFO. We used the following Boolean logic to simultaneously search these databases: “school program,” “school intervention,” “classroom program,” “classroom intervention,” “teacher program,” “teacher intervention,” “student program,” and “student intervention,” as well as “prosocial,” “helping,” “kindness,” and “defending.” Although common in K–12 school settings, the terms “cooperation” and “compliance” were not used because including these search terms yielded irrelevant program adherence studies. Also common in K–12 school settings, the terms “collaboration” and “sharing” were not used because including these search terms yielded irrelevant studies focused on teacher outcomes rather than student outcomes. The terms “volunteering” and “donating” were not used because these are prosocial behaviors that are typically studied outside of school settings. No restrictions were made based on year of availability. The Step 1 search procured 15,975 possibly relevant articles.
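The search terms above can be sketched as a single Boolean string. This is a minimal illustration only: the text lists the terms but not the exact operators, so the structure used here (any context phrase AND any outcome term) is an assumption, and the names `CONTEXT_TERMS`, `OUTCOME_TERMS`, `or_group`, and `build_query` are hypothetical.

```python
# Terms are copied from the search description; the AND/OR structure is an
# assumption, since the exact operators are not reported in the text.
CONTEXT_TERMS = [
    "school program", "school intervention",
    "classroom program", "classroom intervention",
    "teacher program", "teacher intervention",
    "student program", "student intervention",
]
OUTCOME_TERMS = ["prosocial", "helping", "kindness", "defending"]


def or_group(terms):
    """Join terms with OR, quoting each so multiword phrases stay intact."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(context_terms, outcome_terms):
    """Require at least one context phrase AND at least one outcome term."""
    return f"{or_group(context_terms)} AND {or_group(outcome_terms)}"


query = build_query(CONTEXT_TERMS, OUTCOME_TERMS)
print(query)
```

Keeping the two term lists separate makes it easy to see which terms were deliberately left out (e.g., “cooperation,” “sharing”) and to rerun the search with a different list.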
During Step 2, because we identified a large number of articles in Step 1 (15,975 possibly relevant articles), we were not able to search every author on Scopus. Hence, we followed Memmott-Elison et al.’s (2020) search process to identify prominent authors from Step 1 (i.e., those who were named authors on three or more identified studies) and reviewed their published work on Scopus to identify additional relevant studies. Step 2 yielded 12 additional possibly relevant articles.
During Step 3, in order to address the potential overrepresentation of certain programs or samples by only searching prominent authors from Step 2, we compiled 20 review and meta-analysis articles that were identified during Step 1 and Step 2. We also completed a thorough Google Scholar search using the search phrase “social-emotional learning review meta-analysis” and identified 46 other possibly relevant review and meta-analysis articles that had not yet been located. The reference sections in these review and meta-analysis articles were skimmed based on titles and abstracts by a trained graduate student and a postdoctoral fellow to locate additional possibly relevant studies. Conflicts were resolved through discussion. Step 3 yielded 44 possibly relevant articles.
During Step 4, we emailed 107 corresponding authors who were named authors on three or more identified studies (including review articles) to ask for access to their unpublished data and any research that was in preparation, in press, or presented at conferences. Step 4 yielded 5 possibly relevant articles.
The multistep literature search concluded on February 11, 2022, and yielded a total of 16,036 possibly relevant studies. After removing 32 duplicates, we obtained a total of 16,004 possibly relevant studies.
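The totals above can be checked with a short tally. The counts are those reported for each step; the variable names are hypothetical and used only for illustration.

```python
# Counts reported in the text for each search step.
step_yields = {
    "step1_databases": 15_975,
    "step2_prominent_authors": 12,
    "step3_review_reference_lists": 44,
    "step4_author_contact": 5,
}
duplicates_removed = 32

total_identified = sum(step_yields.values())
unique_records = total_identified - duplicates_removed

print(total_identified)  # 16036
print(unique_records)    # 16004
```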
Identification of Studies
Studies were retained in the current meta-analysis based on the following six eligibility criteria. First, studies had to include an analysis of the effects of SEL programs, which we define as interventions that aim to enhance students’ social and emotional skills (Greenberg et al., 2017). Second, the SEL programs evaluated within studies had to be school-based, in that they were delivered or implemented in school settings (e.g., public school, private school, and alternative educational placements) and held during school hours (we did not include any before- or after-school programs). Third, studies had to contain an outcome assessment of prosocial behavior operationalized as voluntary behavior intended to benefit others and promote harmonious relationships (e.g., helping, applauding, and supporting others). Fourth, only studies with samples of K–12 students were included because SEL programs that target other populations (e.g., preschoolers, college students) occur in significantly different contexts and may not provide accurate comparisons. Fifth, a study had to use one of the following research designs and provide adequate statistical information: (1) an experimental or quasi-experimental design that compared groups receiving one or more interventions with one or more control groups with pretest and posttest measures on prosocial behaviors or (2) a pre–posttest design with measures on prosocial behaviors using the same participants, including one- and multiple-group designs. We included pre–post single-group designs because excluding these articles by focusing on controlled effect sizes may miss a substantial part of the existing evidence. Sixth, studies had to be available in English.
Because our search yielded a large number of abstracts, our initial screening process focused on reviewing each title and abstract. Due to limited information in these titles and abstracts, our initial screening criteria were broader than the subsequent screening procedures. Once we identified abstracts that met our initial screening criteria, we reviewed the full-text articles and conducted a more detailed secondary screening. The process is outlined later.
Initial Screening
A trained graduate student and postdoctoral fellow reviewed the titles and abstracts of all identified studies based on three criteria. First, studies that did not include prosocial behavior as an outcome were excluded. Second, studies that focused on populations other than K–12 students were excluded. Reviewers met frequently to discuss and resolve any discrepancies. A total of 410 abstracts were retained upon initial screening.
Secondary Screening
After initial screening, two trained graduate students and two postdoctoral fellows reviewed the full text of identified studies and screened using four criteria. First, studies that did not align with our operationalization of school-based SEL programs were excluded. In order to be as inclusive as possible, we did not limit the number of studies that implemented the same SEL program because several studies using different samples have examined the effectiveness of well-established programs (e.g., Second Step). Second, studies that did not align with our operationalization of prosocial behavior or outcome measurement were excluded. Third, studies that did not align with our research design criteria or did not provide adequate statistical information (e.g., only provided regression coefficients without means and standard deviations) were excluded. We excluded multiple regression coefficients because studies may have controlled for different covariates in their models, and differences in covariates across studies can affect the interpretation of regression coefficients (Harrer et al., 2021). Fourth, when multiple reports (e.g., a dissertation and a journal article) were based on the same sample of students and identical methodology, we retained the report that was first available and excluded later duplicate reports from the same study. Pairs of reviewers met frequently to discuss and resolve any discrepancies. After these exclusions, 66 studies (54 via databases and registers and 12 via other methods) qualified for inclusion in this meta-analysis. Figure 1 illustrates the inclusion and exclusion decisions.

Figure 1. Flowchart of the search and screening process.
Study Coding Procedures
After obtaining our final selection of 66 studies, six graduate students and postdoctoral researchers were trained as coders to extract information pertinent to our research questions. The coders met frequently to discuss disagreements and reach a consensus; all 66 studies were coded by two or more coders. Studies were coded for sample characteristics (i.e., school level, F/RPL, and urbanicity), program characteristics (i.e., approach, dosage, duration, deliverer, and intervention tier), methodological characteristics (i.e., study design, measurement type, baseline equivalence, and implementation fidelity), and publication characteristics (i.e., availability year and publication status). We provide coding details in Supplementary Table S1 (online only).
Interrater reliability across coders was analyzed using Fleiss’ (1971) generalized kappa given its computational utility across the number of raters and coded variables. We used the irr R package (Gamer et al., 2012) to calculate Fleiss’ kappa (Landis & Koch, 1977). We found almost perfect agreement with Fleiss’ generalized kappa of .96 (Viera & Garrett, 2005). Pairs of coders met frequently to discuss and resolve any discrepancies, and a third coder arbitrated when needed. After discussing and resolving all discrepant ratings, we arrived at 100% agreement.
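To make the agreement statistic concrete, the following is a minimal Python sketch of Fleiss' generalized kappa. It is not the authors' code (they report using the irr R package), and it assumes every item was rated by the same number of coders. The function name and the example labels are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels
    (one label per rater). Assumes the same number of raters per item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_cat = [
        sum(cnt[cat] for cnt in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in p_cat)

    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes from two coders for the study-design variable of four studies.
example = [["RCT", "RCT"], ["QED", "QED"], ["RCT", "RCT"], ["QED", "RCT"]]
kappa = fleiss_kappa(example)
```

In this hypothetical example the coders agree on three of four studies, yet kappa is well below .75 because agreement expected by chance is subtracted out.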
Effect Size Calculation
We calculated effect sizes (ESs) using Hedges’ g (1981), which includes a small sample bias correction to the effect size estimate to account for small studies. For studies that reported both pretest and posttest data, we followed Morris’ (2008) guidelines to account for the correlation between pretest and posttest scores (ρ). If studies did not report this information, we calculated ρ from paired t-tests using formulas from D. B. Wilson (2016). For single-group studies, the pre–post effect size was calculated as the difference between the posttest mean and pretest mean divided by the pretest standard deviation, adjusting the variance with the pre–post correlation (Morris & DeShon, 2002). The equations for calculating the ESs can be found in Lipsey & Wilson (2001) and D. B. Wilson (2016).
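The calculations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical. The variance formulas are common large-sample approximations. The recovery of ρ from a paired t-test uses the standard identity relating the standard deviation of difference scores to the pretest and posttest standard deviations, which may differ in detail from the formulas in the cited sources.

```python
import math


def small_sample_correction(df):
    """Hedges' (1981) approximate bias-correction factor for df degrees of freedom."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def hedges_g_posttest(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Bias-corrected standardized mean difference and its approximate variance."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    j = small_sample_correction(df)
    return j * d, j**2 * var_d


def pre_post_control_g(pre_t, post_t, sd_pre_t, n_t, pre_c, post_c, sd_pre_c, n_c):
    """Morris (2008) pretest-posttest-control effect size: the difference in
    mean gains divided by the pooled pretest SD, with bias correction."""
    df = n_t + n_c - 2
    sd_pre = math.sqrt(((n_t - 1) * sd_pre_t**2 + (n_c - 1) * sd_pre_c**2) / df)
    gain_difference = (post_t - pre_t) - (post_c - pre_c)
    return small_sample_correction(df) * gain_difference / sd_pre


def rho_from_paired_t(m_pre, m_post, sd_pre, sd_post, n, t):
    """Recover the pre-post correlation from a reported paired t statistic."""
    sd_diff = abs(m_post - m_pre) * math.sqrt(n) / abs(t)
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2 * sd_pre * sd_post)


def single_group_g(m_pre, m_post, sd_pre, n, rho):
    """Single-group pre-post effect size standardized by the pretest SD.
    The variance uses the approximation 2(1 - rho)/n + d^2/(2n) (assumed here)."""
    d = (m_post - m_pre) / sd_pre
    var_d = 2 * (1 - rho) / n + d**2 / (2 * n)
    j = small_sample_correction(n - 1)
    return j * d, j**2 * var_d
```

For example, with hypothetical pretest and posttest means of 5 and 6, standard deviations of 2, n = 16, and a paired t of 2.0, `rho_from_paired_t` returns .50, which can then be passed to `single_group_g`.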
Meta-Analytic Procedures
Our dataset consisted of 157 effect sizes drawn from 66 primary studies. Because most studies contributed multiple effect sizes (e.g., multiple measures of prosocial behavior from the same sample), we used a correlated effects (CE) model with robust variance estimation (RVE; Hedges et al., 2010; Pustejovsky & Tipton, 2022) to account for within-study dependence among effect sizes. Under the CE model, effect sizes within a study are assumed to be correlated, and RVE is used to obtain standard errors that are robust to misspecification of the dependence structure. To examine the overall treatment effect on prosocial behavior, we estimated a random-effects model to calculate the overall pooled effect size estimate. To examine the moderator effects, we applied a forced-entry approach by entering all moderators into the model simultaneously to estimate their effects concurrently. Under this approach, each regression coefficient represents the effect of a moderator on the treatment outcome while controlling for all other moderators in the model. This method helps address the problem of multiplicity and control the likelihood of Type I and Type II errors (Tipton et al., 2019). All the analyses were conducted using the robumeta R package (Fisher & Tipton, 2015), and incorporated Tipton and Pustejovsky’s (2015) small-sample bias correction. Additionally, if the adjusted degrees of freedom were less than 4, we used a strict alpha level of .01 to better control Type I error (Tanner-Smith et al., 2016).
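The logic of the correlated effects model with robust variance estimation can be illustrated for the intercept-only (overall effect) case. The Python sketch below is a simplification and not the robumeta implementation: the between-study variance `tau_sq` is supplied by the caller rather than estimated, and the small-sample correction is omitted. The function names and input format are hypothetical.

```python
import math
from collections import defaultdict


def ce_rve_pooled(effects, tau_sq=0.0):
    """effects: iterable of (study_id, g, variance) tuples.
    Returns (pooled estimate, uncorrected robust standard error)."""
    by_study = defaultdict(list)
    for study, g, v in effects:
        by_study[study].append((g, v))

    # Correlated-effects weights: each of the k_j effects in study j
    # receives 1 / (k_j * (mean within-study variance + tau^2)).
    weights = {}
    for study, rows in by_study.items():
        k = len(rows)
        mean_v = sum(v for _, v in rows) / k
        weights[study] = 1.0 / (k * (mean_v + tau_sq))

    total_w = sum(weights[s] * len(rows) for s, rows in by_study.items())
    pooled = sum(
        weights[s] * g for s, rows in by_study.items() for g, _ in rows
    ) / total_w

    # Sandwich estimator: weighted residuals are summed within each study
    # before squaring, so within-study dependence does not need to be modeled.
    meat = sum(
        (weights[s] * sum(g - pooled for g, _ in rows)) ** 2
        for s, rows in by_study.items()
    )
    return pooled, math.sqrt(meat) / total_w


def alpha_for(df, default=0.05, strict=0.01, df_cutoff=4):
    """Decision rule described in the text: stricter alpha when adjusted df < 4."""
    return strict if df < df_cutoff else default
```

Because residuals are aggregated by study, a study contributing many effect sizes does not count as many independent pieces of evidence in the standard error.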
Results
Overview of the Studies
Descriptive information on the characteristics of the included studies is provided in Table 1. A total of 157 effects from 66 studies were included in our final sample. We provide detailed summaries of the sample, program, methodological, and publication characteristics of included studies in Supplemental Table S2 (in the online version of the journal). The references for the studies used in this analysis are available in the supplemental material in the online version of the journal.
Table 1. Overview of the characteristics of 66 intervention studies
Note. F/RPL = free or reduced-price lunch rates. No Tier 3 studies were included.
RQ1: Overall Treatment Effect
Our primary purpose was to examine the effects of school-based SEL programs on students’ prosocial behaviors. The pooled random-effects estimate of the effect on prosocial behaviors was statistically significant and positive (Hedges’ g = .24, RVE SE = .03, 95% CI = [.17, .30], p < .001).
Moderator Analysis
To examine whether the effect of SEL programs on prosocial behavior was moderated by sample, program, methodological, and publication characteristics, we conducted a mixed-effects meta-regression analysis with all moderators added to the model simultaneously. Table 1 provides the frequencies of each of the moderator codes. Table 2 presents all coefficients related to moderator analyses. All “Not reported or unclear” moderator categories were not interpretable.
Table 2. Estimates of the multiple meta-regression analyses
Note. Coeff. = coefficient; SE = standard error; 95% CI = 95% confidence interval; df = degrees of freedom; F/RPL = free or reduced-price lunch rates.
p < .05.
RQ2: Moderating Effect of Sample Characteristics
School Level
Most studies were conducted with elementary school children (N = 37; 56%), followed by studies that included a combination of age groups (N = 15; 23%). Relatively few measured the prosocial behavior of middle school (N = 10; 15%) and high school students (N = 4; 6%). Moderator analyses indicated that effects for middle school students (b = .14, p = .329), high school students (b = .11, p = .619), and combined age samples (b = −.05, p = .737) were not statistically different from effects for elementary school children.
Free or Reduced-Price Lunch (F/RPL) Percentage
Results indicated that effects were not statistically significantly different between samples with more than half of students qualifying for F/RPL (51–100%) compared to samples with less than half of students qualifying for F/RPL (0–50%; b = −.14, p = .473). However, the analysis of free or reduced-price school lunch percentage is based on a small sample because most studies (71%) did not clearly report this variable.
Urbanicity
Of the studies that provided information about the locale of the schools where the research was conducted, most were from urban (N = 22; 33%) or suburban areas (N = 14; 21%). Nine studies included schools from multiple types of areas, and even fewer were from exclusively rural areas (N = 4; 6%). Results indicated that effects for rural (b = .01, p = .986), suburban (b = .09, p = .521), and combination areas (b = −.10, p = .541) were not statistically different from samples from urban areas.
RQ3: Moderating Effect of Program Characteristics
Approach
A curricular approach was the most common across the included studies (N = 51; 77%). Some studies used an approach that combined curricular and interactional or structural approaches (N = 12; 18%), and only three studies exclusively used an interactional approach. Effects of curricular (b = .20, p = .391) and curricular combined with structural or interactional approaches (b = .28, p = .207) were not significantly different from SEL programs that only used an interactional approach.
Dosage
Dosage was calculated by multiplying each study’s average number of distinct sessions or lessons by the average length of each session/lesson in minutes. The dosage of SEL programs varied widely, ranging from only 45 minutes (Pearl & Dulaney, 2006) to 175 hours (Lewis et al., 2016). In order to make meaningful comparisons, we split the studies based on whether they lasted above or below the median dosage in this group of studies, which was calculated to be 540 minutes, or 9 hours. Findings suggested that effects were smaller for studies that included more than 9 hours of SEL programming compared to those implemented for 0–9 hours (b = −.30, p = .026). This moderate difference indicated that SEL programs with a lower dosage were associated with larger effects on prosocial behavior than programs with a higher dosage.
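The dosage computation and median split can be illustrated with a short Python sketch. The function names are hypothetical, and assigning a program exactly at the median to the lower group is an assumption that matches the "0–9 hours" category. Of the example values, only 45 minutes, 540 minutes, and 175 hours come from the text; the others are made up for illustration.

```python
from statistics import median


def dosage_minutes(n_sessions, minutes_per_session):
    """Total dosage: number of sessions times average session length in minutes."""
    return n_sessions * minutes_per_session


def median_split(dosages):
    """Return the median dosage and a 'lower'/'higher' label for each program.
    Programs exactly at the median are placed in the lower group (assumption)."""
    cutoff = median(dosages)
    return cutoff, ["higher" if d > cutoff else "lower" for d in dosages]


# Twelve 45-minute lessons correspond to the reported median of 540 minutes.
example_dosage = dosage_minutes(12, 45)

# Hypothetical set of five programs, from 45 minutes to 175 hours (10,500 minutes).
cutoff, groups = median_split([45, 300, 540, 900, 175 * 60])
```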
Duration
The duration of SEL programs reported in these studies also varied widely, ranging from one day (Pearl & Dulaney, 2006) to six years (Lewis et al., 2016). Similar to other meta-analyses (Cipriano et al., 2023), we categorized programs into those lasting less than half a school year (up to one semester; N = 35; 53%), between half and one whole school year (N = 17; 26%), and longer than one school year (N = 10; 15%). Findings showed that effects were smaller for programs implemented for longer than one school year (b = −.34, p = .029) compared to programs implemented between half and one school year. However, effects of programs implemented up to one semester (b = −.23, p = .169) were not significantly different from programs lasting between half and one school year. That is, programs longer than one school year yielded lower effect sizes on prosocial behavior than programs lasting between half and one school year.
Deliverer
Most SEL programs in the included studies were conducted by students’ classroom teachers (N = 27; 41%) or their teacher in partnership with other people (N = 15; 23%). Other interventionists included program staff (N = 12; 18%), members of the research team (N = 7; 11%), or other school personnel such as school counselors (N = 5; 8%). Findings showed that compared to SEL programs delivered only by students’ classroom teacher, effects on prosocial behavior were similar to SEL programs delivered by other school personnel such as school counselors (b = −.05, p = .832), university researchers (b = −.23, p = .335), program staff and others (b = .28, p = .138), and teachers combined with others, such as program staff or researchers (b = .13, p = .260).
Intervention Tier
The majority of studies implemented universal Tier 1 interventions (N = 59; 89%), 11% (N = 7) implemented Tier 2 interventions and none implemented Tier 3 interventions. Findings indicated that effects of studies delivered at Tier 2 were not statistically different from Tier 1 studies (b = −.03, p = .798).
RQ4: Moderating Effect of Methodological Characteristics
Study Design
Studies were approximately evenly split between those that used randomized controlled trials (RCTs; N = 31; 47%) and those that did not, including quasi-experimental designs (QED; N = 26; 39%) and single-group pre–post designs (N = 9; 14%). Studies that used a QED (b = −.12, p = .285) and single-group pre–post design (b = −.34, p = .065) yielded similar effects on prosocial behavior compared to studies that used an RCT.
Prosocial Behavior Measure
Studies mostly used teacher-report (N = 26; 39%) or student-report (N = 17; 26%) to measure prosocial behavior. Some studies included multiple measures of prosocial behavior (N = 14; 21%). Few studies used other measures such as peer-report only (N = 4; 6%), classroom observation only (N = 3; 5%), or parent report only (N = 2; 3%). The most common measure of prosocial behavior came from the student version of the Strengths and Difficulties Questionnaire (SDQ; Goodman, 2001; N = 9 studies) followed by the teacher version of the Social Skills Rating System (SSRS; Gresham & Elliott, 2008; N = 6 studies). Effects were similar across different types of prosocial behavior measures. Specifically, effects of prosocial behavior measured via student-report (b = .19, p = .130), parent-report (b = .30, p = .197), teacher-report (b = .32, p = .059), observation (b = .25, p = .373), or more than one type of measure (b = .16, p = .328) were not statistically significantly different from prosocial behavior measured via peer-report.
Baseline Equivalence
Most studies met baseline equivalence (N = 48; 73%); only 18 studies (27%) did not meet baseline equivalence (this includes 9 studies that did not use a control group). Effects from studies that did not meet baseline equivalence were not statistically significantly different from studies that met baseline equivalence (b = −.16, p = .198).
Implementation Fidelity Report
Implementation fidelity was coded as a binary indicator of whether this information was measured and reported in the studies. Results indicated that most studies (N = 41; 62%) did not provide fidelity of implementation information, or the information was unclear. Only twenty-five studies (38%) provided this information. Effects from studies that did not report implementation fidelity were not statistically significantly different from studies that reported the fidelity of implementation (b = .08, p = .434).
RQ5: Moderating Effect of Publication Characteristics
Available Year
Results indicated that effects from studies conducted from 1991–2000 (b = .16, p = .538), 2001–2010 (b = .04, p = .802), and 2011–2020 (b = .09, p = .556) were not statistically significantly different from studies conducted more recently (2021–present).
Publication Status
Effects of SEL programs on prosocial behavior were similar for studies that were published in peer-reviewed journals and studies that were not peer-reviewed (b = .09, p = .556).
Publication Bias Analysis
To assess the risk of distortion in our meta-analysis due to publication bias and outcome reporting bias, we used the Egger’s Sandwich test proposed by Rodgers and Pustejovsky (2021). Egger’s Sandwich takes into account multiple effect sizes nested within studies. A statistically significant result suggests publication bias or small-study bias, reflected in asymmetry in the distribution of effect sizes. Results of this analysis did not indicate significant asymmetry in the distribution of effect sizes for prosocial behaviors (β = .22, SE = .19, p = .298). Thus, there is no clear evidence of publication bias in the current meta-analysis.
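The idea behind the test can be sketched in Python as a regression of effect sizes on their standard errors, with a cluster-robust (sandwich) standard error for the slope. This is a simplified illustration, not the procedure of Rodgers and Pustejovsky (2021) as implemented in R: it assumes plain inverse-variance weights and omits the small-sample corrections and degrees-of-freedom adjustments used in practice. The function name and input format are hypothetical.

```python
import math
from collections import defaultdict


def egger_sandwich(effects):
    """effects: iterable of (study_id, g, variance) tuples.
    Fits g = b0 + b1 * SE by weighted least squares (weights 1/variance) and
    returns (b1, uncorrected cluster-robust standard error of b1)."""
    rows = [(study, g, math.sqrt(v), 1.0 / v) for study, g, v in effects]

    # Elements of X'WX and X'Wy for the two-parameter model.
    sw = sum(w for _, _, _, w in rows)
    swx = sum(w * x for _, _, x, w in rows)
    swxx = sum(w * x * x for _, _, x, w in rows)
    swy = sum(w * g for _, g, _, w in rows)
    swxy = sum(w * x * g for _, g, x, w in rows)

    det = sw * swxx - swx**2  # zero if all standard errors are identical
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det

    # Study-level score vectors: weighted residuals summed within each study.
    scores = defaultdict(lambda: [0.0, 0.0])
    for study, g, x, w in rows:
        resid = g - b0 - b1 * x
        scores[study][0] += w * resid
        scores[study][1] += w * resid * x

    # Slope variance: second row of (X'WX)^-1 applied to each study's score.
    var_b1 = sum(
        ((-swx * u0 + sw * u1) / det) ** 2 for u0, u1 in scores.values()
    )
    return b1, math.sqrt(var_b1)
```

A slope near zero relative to its standard error, as reported above, means that less precise effect sizes were not systematically larger than more precise ones.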
Discussion
This meta-analysis found that school-based SEL programs are effective at improving K–12 students’ prosocial behavior. Among the 157 effects from 66 studies included in this research synthesis, participation in SEL programs was associated with higher levels of prosocial behavior, with an overall effect of .24 (Hedges’ g, hereafter g). As we discussed previously, prosocial behavior is an important social competency because it promotes students’ overall social and emotional (SE) well-being, mitigates stress, is the foundation of positive relationships, promotes academic achievement, and prepares students for success in their later careers. In addition, as students become more prosocial, their teachers feel greater efficacy and are more likely to enjoy teaching. These outcomes are of high priority as the world recovers from the COVID-19 pandemic that has resulted in decreases in both student and teacher SE well-being, an increase in student disengagement in school, and an increase in teacher burnout (Becker et al., 2023).
The interpretation of an effect size requires judgment and depends on the area of research and the context (Schäfer & Schwarz, 2019). While there is disagreement about what is a “large” effect size, some have suggested that, for education programs that include entire schools, .25–.30 is large and .40–.50 is very large (Funder & Ozer, 2019; Kraft, 2020; Lipsey et al., 2012). Thus, we might interpret the results of this meta-analysis as medium to large in size for educational research. Even a “small” effect can have practical importance if the stakes are high, the outcome is important, or if a behavior is repeated over and over in the context of interacting with other people, such as prosocial behavior in a classroom (Funder & Ozer, 2019). Furthermore, Taylor et al. (2017) found that students who had greater SE assets at the conclusion of an SEL program were more likely to have long-term indicators of well-being into adulthood, including better family cohesion, increased educational attainment, higher income, fewer arrests, and less likelihood of receiving mental health or substance abuse services. Thus, they found significant long-term advantages at both the individual and community level, which augments the practical importance of the present effects.
Our results align with other meta-analytic findings (Lipsey et al., 2012). Prior meta-analytic reviews that have addressed the effects of K–12 SEL programs on the broader outcome of “positive social behaviors,” which may have included some prosocial behavior, have reported effects ranging from null to large: Ruby & Doolittle (2010) found no effects; Durlak et al. (2011) found an effect of g = .24 at posttest and .17 at follow-up; Sklad et al. (2012) found an effect of g = .39; Wigelsworth et al. (2016) found an effect of g = .37 for efficacy trials and .18 for effectiveness trials; Cipriano et al. (2023) found an effect of g = .18. These results are not directly comparable with our results because these prior reviews did not use the same operationalization of prosocial behavior. Rather, prior operationalizations of “positive social behaviors” aligned with a broad societal perspective on prosocial behavior (any behavior valued by society) rather than an intentionalist or consequentialist perspective (Pfattheicher et al., 2022). This broader societal perspective may include prosocial behavior intended to help others, but also behaviors such as problem-solving skills, confidence, and empathy that would not be included in the intentionalist or consequentialist perspective. Only two other meta-analyses have focused on a similar conceptualization of prosocial behavior. In a review of 33 studies, Shin & Lee (2021) found an overall effect size of g = .44, but they did not limit their review to school-based programs; other settings (e.g., research labs, psychiatry facilities) were also included. In a review of 10 studies in schools, Mesurado et al. (2019) found an overall effect size of g = .23, which converges closely with our results.
SEL Programs Vary in Effectiveness
Our results suggest substantial between-study heterogeneity in effect size, ranging from −.06 to .53, that is greater than would be expected by sampling error alone (D. A. Rosenthal et al., 2006). Mesurado et al. (2019) also found substantial heterogeneity, with effect sizes ranging from −.07 to .68. However, they did not investigate moderating variables due to the small number (10) of included studies. Other meta-analyses discussed in this article that measured the broader concept of positive social behaviors rather than prosocial behavior per se also found substantial heterogeneity (i.e., Cipriano et al., 2023; Durlak et al., 2011; Sklad et al., 2012; Wigelsworth et al., 2016).
What might account for this heterogeneity, suggesting that some programs are more effective than others? Our moderator analysis explored 14 variables that may influence this heterogeneity. We explored three sample characteristics (i.e., school level, urbanicity, and F/RPL percentage), five program characteristics (approach, dosage, duration, deliverer, and tier), four methodological characteristics (study design, measure type, baseline equivalence and implementation fidelity), and two publication characteristics (decade of availability and publication status). Of these 14 moderating variables, only two were linked to differences in effect sizes for programs: dosage (total program session time for curricular programs), and duration (length of program delivery).
Some moderating variables that we expected to influence effect sizes had only a small number of studies in some categories. For instance, of the 66 included studies, only four specifically targeted high school students, seven programs operated at Tier 2, four were delivered in exclusively rural areas, and three used an interactional approach as compared to a curricular or combined approach. Free or reduced-price lunch status was also reported in only 19 studies. Thus, our results related to these variables should be interpreted with caution.
More Is Not Necessarily Better
For these 66 studies, dosage and duration were linked to effect size. We found that effects were larger for programs that implemented less than 9 hours of SEL curriculum, compared to programs that implemented more than 9 hours. Similar results were found by Cipriano et al. (2023) in their meta-analysis that included other behavioral outcomes. Thus, a modest number of sessions may be adequate, or ideal, to effect change in students’ prosocial behavior. It is important to keep in mind that this result pertains to a curricular approach, rather than interactional or structural approaches, which do not have a measured dosage. We also found that effects on prosocial behavior were larger for programs that lasted 5–9 months, compared to programs that lasted more than one school year.
Our results suggest that “more” (>9 hours of curriculum) and “longer” (>9 months) are not associated with larger effects on students’ prosocial behavior for curricular SEL programs. Others have also found that greater dosage and duration are not necessarily associated with greater effect size (Corcoran et al., 2018). Indeed, Sklad et al. (2012) also found that programs lasting less than a year had larger effect sizes than programs more than a year in length. Lengthy programs may be difficult to sustain with fidelity. Programs with more hours of SEL curriculum may lead to implementation fatigue, and may stress teachers with already overfull academic curriculum demands. However, treatments that are too short or too low in intensity may show smaller or nonsignificant effect sizes (Diekstra & Gravesteijn, 2008). More research is needed to identify an “optimal” dosage and duration for promoting prosocial behavior among curricular programs (Yeager et al., 2018).
Minimal Variation in SEL Attributes
Most of our included studies were Tier 1 (89%), used a curricular-only approach (77%), and were delivered at the elementary level (56%) by classroom teachers (64%), either solely or with the help of others. Our results suggest that none of these attributes are associated with effect size. However, our results should be interpreted with caution because the small sample size of interventions with other attributes may diminish our ability to detect significant differences in the effects of these program attributes on prosocial behavior. Greater diversity of intervention attributes is needed in the field in order to explore the effects of these attributes. We discuss these variables next.
Multi-tiered systems of support (MTSS) within schools intend to identify struggling students and provide increasing levels of support based on need (Steed & Shapland, 2020). Tier 1 interventions are universal, targeting the entire student body, whereas Tiers 2 and 3 target small groups of students who need greater intervention intensity. Earlier meta-analyses focused most on Tier 1 (universal) programs (Cipriano et al., 2023; Durlak et al., 2011; Sklad et al., 2012; Taylor et al., 2017; Wigelsworth et al., 2016). Our review included all tiers, but Tier 1 programs delivered by teachers were the norm for programs focused on students’ prosocial behavior. Tier 1 programs that promote prosocial behavior among the entire student body do not replace but may enhance and support Tier 2 and 3 mental and behavioral health interventions by creating a positive, inclusive school climate (August et al., 2001; Posamentier et al., 2023). Furthermore, some argue that Tier 1 programs are likely to have a greater overall social benefit because they may prevent Tier 1 students from later needing Tier 2 (Greenberg et al., 2017) or Tier 3 (Bradshaw et al., 2012) interventions in some cases. Ideally, all three tiers operate together in schools.
Programs that support students’ prosocial behavior may be curricular, interactional, structural, or combined (Bergin et al., 2023; Lawson et al., 2019; Weissberg et al., 2015). We expected that effects would be larger for interactional or combined programs for two reasons. First, they are experienced by students throughout each school day, resulting in higher dosage. Second, curricular approaches stress teachers by adding onto an already full academic curriculum, even though teachers can successfully implement curricular-based SEL (Durlak et al., 2011; Sklad et al., 2012). Our results did not support this expectation. However, more research is needed to confirm this result because there were only three studies that used an interactional approach, and this variable was challenging to code due to incomplete descriptions among included studies.
Deliverer refers to who is administering the SEL program. Some studies have found larger effects for SEL and bully-prevention programs when administered by researchers rather than teachers. The advantage of researcher-as-deliverer in these other studies may be due to either bias in which the deliverer and evaluator are the same, or due to high fidelity resulting from deep expertise and careful monitoring by researchers (Polanin et al., 2012). For example, Wigelsworth et al. (2016) found large effects for positive social behavior when the program developers (typically university researchers) were involved with, but did not necessarily lead, the program. Our results did not support this expectation (see also Cipriano et al., 2023). Perhaps in our review, teachers were as effective as researchers because the outcome of interest was students’ prosocial behavior, which is affected by quality of teacher–student relationship (Bergin, 2018). It is plausible that any advantage conferred by research expertise in the deliverer may be offset by the advantage of an ongoing relationship with the teacher-as-deliverer.
Addressing What Works for Whom
In this review, we asked whether there were groups of students for whom SEL programs are more effective in promoting prosocial behavior. Three moderating variables were explored: school level (i.e., elementary, middle school, high school, or multi-level), percentage of students who qualified for free or reduced-price lunch (F/RPL), and urbanicity (i.e., rural, urban, suburban, or combined locale). There were no statistically significant differences found. That is, for these 66 studies, SEL programs had similar effects on prosocial behavior regardless of these sample characteristics.
Our results regarding school level are similar to those of Sklad et al. (2012), who concluded that SEL programs are effective for elementary, middle, and high school students, but may be slightly less effective for younger students. This issue merits further investigation because Durlak et al. (2011) found a negative relationship (r = −.27) between student age and SEL outcomes. SEL programs can be challenging to implement in middle and high school settings for a variety of reasons (e.g., when and where to implement curriculum since students rotate to different classes and teachers throughout the day). According to Yeager (2017) and Yeager et al. (2018), effective SEL for secondary students needs to meet their psychological needs to feel respected, be admired in the eyes of others, do good in the world beyond the self, belong, and be part of a positive school climate. These needs may often be left unmet if programs simply adapt an elementary curriculum to a secondary school audience without these developmental considerations. Nevertheless, it is also plausible that some programs that require advanced cognitive development, such as Forgiveness Education (Rapp et al., 2022), may be more effective for secondary rather than elementary students. In our meta-analytic review, results regarding high school should be interpreted with caution because few studies occurred in high school.
Our result regarding F/RPL status is similar to that of Taylor et al. (2017), who concluded that SEL programs are effective across SES and ethnicity groups. Yet, other previous research suggests that increasing prosocial behavior may be especially effective for students in poverty. A positive school climate facilitated by students’ prosocial behavior can mitigate the negative effect of poverty on academic achievement (Berkowitz et al., 2017). In another study, high-poverty schools narrowed the achievement gap when good instructional practices were combined with prosocial behavior during group work (Ladd et al., 2014). In our meta-analytic review, results regarding F/RPL should be interpreted with caution because most studies (71%) did not clearly report this variable.
It is important to note that many studies in our review did not report sample characteristics. Among studies that did report sample characteristics, programs were most likely to serve elementary schools, to be above the median in the percentage of students qualifying for F/RPL, and to be located in urban settings (see Table 1). Over a decade ago, Durlak et al. (2011) pointed out that rural students were underrepresented in SEL program development and research. This challenge continues today. Only 6% of studies in our review, and only 5% of studies in the broader Cipriano et al. (2023) review, were located in rural settings. Thus, included studies significantly underrepresented rural students and middle and high school students.
Too few studies in our review reported diversity (e.g., ethnoracial, gender, and SES) characteristics of students. Rowe and Trickett (2018) also found that student diversity characteristics were inconsistently reported across articles, with most studies not testing for moderating effects, and among those that did, reporting inconsistent effects. Cipriano et al. (2023) argued that collecting such information is important in order to know whether students are being inclusively and equitably served by SEL and to help us understand who is benefiting from intervention. We concur with their conclusions and argue that reporting demographic information is an important step to move the field forward.
Measurement of Prosocial Behavior
We found no difference in effects by the source of prosocial behavior measurement. In our review, most included studies (65%) used either student self-report or teacher-report to measure prosocial behavior. In the Taylor et al. (2017) meta-analysis, most (74%) studies used student report. Indeed, most measures of SEL outcomes use student-report or teacher-report, despite important limitations (Duckworth & Yeager, 2015). Lack of representation of other types of measurement may have partly contributed to our finding of no differences across prosocial measurements.
Measurement is a challenge across the SEL field. Concerns have been raised about the use of student and teacher reports. While student-report can be appropriate for assessing internal and subjective experiences, it may be biased due to social desirability, and teacher reports may be biased when teachers are also the program deliverers. Osher et al. (2016, p. 663) wrote, “The field needs practical measures with psychometric evidence that enable comparisons among studies and samples that can replace or supplement student reports, teacher reports, and indirect measures of social-emotional skills (e.g., disciplinary infractions).” In contrast to the wide availability of SEL programs, there are too few social-emotional measures that are usable, feasible, and scalable (McKown & Taylor, 2018). Many studies also rely on only one source for their outcomes; in the present meta-analysis, only 14 studies included prosocial measurements from multiple reporters (e.g., teacher- and student-report). McKown and Taylor (2018) point out that hundreds of millions of dollars have been invested in rigorous development of achievement tests, but similar investment has not yet been made in assessment of social-emotional competencies. Fortunately, the U.S. Institute of Education Sciences has prioritized development of better measures of SE competencies, such as prosocial behavior, that are appropriate for program evaluation, with current projects underway (Taie & Goldring, 2020).
Effects Are Robust to Study Design and Publication Date
We found that among included studies, effects were robust to study design. Approximately half of included studies were randomized controlled trials (RCTs; 47%) and the remaining 53% used non-RCT designs such as quasi-experiments and single-group pre–post designs. In education intervention research, quasi-experiments tend to find somewhat larger effect sizes than RCTs. For example, in a study of SEL effects on academic achievement, Corcoran et al. (2018) found that quasi-experiments had larger effects than RCTs, and rigorous RCTs did not show effects. Staines and Cleland (2012) argue that in quasi-experiments “estimates of efficacy are typically too large because of group differences in pretreatment motivation favoring the treated group” (p. 37) but that RCTs may underestimate effects. Yet, in our review there was no difference between study types. Whereas RCTs provide the most unbiased estimates of treatment effects, our findings may have important implications because quasi-experiments can be less costly to implement and more palatable to educators who do not want to be placed in a wait-list control group.
Regarding study quality, our results were somewhat counterintuitive: there were no differences in effects based on methodological design quality (e.g., baseline equivalence, or report of implementation fidelity). One possible reason for this finding is that the studies that did not meet baseline equivalence and used single-group designs relied mainly on researcher-developed measures, which might produce larger effects because they are closely aligned with the goals of the SEL instruction (Scammacca et al., 2007). Implementation fidelity is notoriously difficult to measure in school-based research. Only about one-third of the studies in this review reported implementation in a quantifiable way. Without this information, it is difficult to ascertain the true effectiveness of an intervention. Publishing this information should be a priority for future intervention studies.
We found that among included studies, effects were also robust to the decade in which they were published and to whether or not they were published in a peer-reviewed journal. Most included studies were published (76%). R. Rosenthal (1979) pointed out the “file-drawer problem,” which is that studies are less likely to be published if they have small or nonsignificant effect sizes. Such publication bias continues to be a problem (Polanin et al., 2016). However, as discussed previously, we conducted analyses that suggested no proximate threat of missing study bias in the current meta-analysis.
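To make the file-drawer idea concrete, the sketch below computes Rosenthal's fail-safe N, the number of unpublished null-result studies that would be needed to pull a combined result below one-tailed significance. It is offered only as an illustration of the concept; it is not necessarily the missing-study analysis used in this review, and the z-scores in the usage line are hypothetical.

```python
def rosenthal_fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N for a set of study-level z-scores.

    z_scores: one standard-normal test statistic per study (hypothetical inputs).
    z_crit:   one-tailed critical value (1.645 corresponds to alpha = .05).

    Returns the number of additional studies averaging z = 0 that would be
    needed to bring the combined (Stouffer) z down to z_crit.
    """
    k = len(z_scores)
    z_sum = sum(z_scores)
    return (z_sum ** 2) / (z_crit ** 2) - k


# Hypothetical example: three studies with modest positive results.
n_hidden = rosenthal_fail_safe_n([2.0, 2.5, 3.0])
print(round(n_hidden, 1))
```

A large fail-safe N relative to the number of included studies suggests that a plausible number of unpublished null results would not overturn the finding, although the statistic says nothing about the size of the effect.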
Implications for Practice
A key implication for practice from our meta-analytic review is that SEL programs can increase students’ prosocial behavior with a medium–large effect size (g = .24). This is important because students’ prosocial behavior is associated with both students’ and teachers’ social-emotional well-being, as well as students’ academic engagement and achievement (e.g., Aldridge & Fraser, 2016; Wentzel et al., 2004).
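For readers who want to translate the summary effect into more practical terms, the following minimal sketch shows how Hedges’ g is computed from two groups’ summary statistics and how a given g maps onto a percentile shift. The group means, standard deviations, and sample sizes are hypothetical, and the percentile conversion assumes normally distributed outcomes with equal variance in both groups.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # common approximation to the exact factor
    return correction * d


def percentile_of_average_treated_student(g):
    """Percentile of the control distribution reached by the average treated
    student, assuming normal outcomes with equal variance (an assumption)."""
    return 100 * 0.5 * (1 + math.erf(g / math.sqrt(2)))


# Hypothetical group statistics on a prosocial behavior scale.
g_example = hedges_g(mean_t=3.50, mean_c=3.26, sd_t=1.0, sd_c=1.0, n_t=100, n_c=100)
print(round(g_example, 3))

# Percentile interpretation of the summary effect reported in this review.
print(round(percentile_of_average_treated_student(0.24), 1))
```

Under these assumptions, g = .24 places the average student who received an SEL program at roughly the 59th percentile of the comparison group.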
Another key implication is that there was substantial variation among the included studies. This means that practitioners and policymakers who seek to increase students’ prosocial behavior cannot assume that all programs are equally effective. Two characteristics of SEL programs that were associated with effect size were dosage and duration for curriculum-based programs. A moderate number of program sessions (less than 9 hours) may be effective for curriculum-based programs, but many sessions (more than 9 hours) are not necessarily better for promoting prosocial behavior. In addition, curriculum-based programs that go beyond a school year are not necessarily better than programs lasting 5–9 months. No other program characteristics, including intervention tier, deliverer (e.g., teacher vs. researcher or counselor), and approach (e.g., curricular vs. interactional), were associated with effect size. Nor were any sample characteristics, including school level (e.g., elementary vs. middle or high school), urbanicity (urban vs. rural), and F/RPL percentage. However, there was either too little variation among these variables, or insufficient reporting, to draw strong conclusions about the role of these characteristics. Almost all programs were Tier 1, delivered by teachers using a curricular approach in elementary schools. More diversity of programs is needed to conclude with confidence whether approach, school level, deliverer, student demographics, and locale are linked to effect size.
Limitations
There are several limitations of the current meta-analytic synthesis. First, our meta-analyses can only establish an association between a moderator variable and the reported effect size. This association does not imply that the moderator is the cause of the effect. Thus, assumptions of causation should be made cautiously, in the context of other evidence. Our moderator tests were intended to be exploratory and should be substantiated by experimental research isolating each of the moderator variables.
Second, meta-analyses are dependent on the quality of included reports. In this review, potentially relevant variables could not be examined as moderators. We did not code the quantity and quality of deliverer training and technical assistance because few studies report this variable, despite the importance of ensuring the deliverer has proficiency in the target skills and has emotional support to conduct the program (Durlak & DuPre, 2008; Jennings & Greenberg, 2009). We did not code sample characteristics (e.g., ethnoracial and gender characteristics) because few studies report these variables, which limits our knowledge of the generalizability of results. This is characteristic of the field at large (Cipriano et al., 2023; Rowe & Trickett, 2018).
Third, we cannot ensure that our literature search methods were entirely comprehensive and exhaustive. We did not include non-English studies and excluded some studies that did not have adequate statistical information. We attempted to obtain this information by reaching out to the authors. We did not receive any replies in some instances, while in others, the authors conveyed that the data were unavailable. Omitting those studies could systematically bias the results and reduce their accuracy (Jüni et al., 2002).
Fourth, we did not include regression coefficients as effect sizes in this meta-analysis. Regression-adjusted estimates can provide more precise treatment effect estimates by accounting for baseline differences between groups (What Works Clearinghouse, 2022). As a result, excluding these estimates may have limited the number of eligible studies and reduced the precision of the effect size estimates.
Fifth, we used a correlated effects (CE) model with robust variance estimation in the meta-regression. The CE model accounts for dependence among multiple effect sizes from the same study by assuming correlated effects but does not explicitly model the hierarchical nesting of effect sizes within studies. The degrees of freedom in the meta-regression are more closely tied to the number of effect sizes than to the number of studies, allowing the inclusion of a relatively large number of moderators. Therefore, the findings from this study should be interpreted in light of the assumptions of the CE model, and future research may consider examining whether results are robust under alternative modeling approaches such as the correlated and hierarchical effects (CHE) model.
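As an illustration of what the CE approach does, the sketch below computes an intercept-only summary effect with the weights commonly used for correlated effects and a cluster-robust (sandwich) standard error. It is a simplified sketch rather than the analysis code for this review: the between-study variance `tau2` is supplied as an assumed input instead of being estimated, no small-sample correction is applied, and the effect sizes in the usage example are hypothetical.

```python
import math


def ce_rve_mean(studies, tau2=0.0):
    """Intercept-only correlated-effects estimate with a robust standard error.

    studies: list of studies; each study is a list of (g, v) tuples, where g is
             an effect size and v its sampling variance (hypothetical inputs).
    tau2:    assumed between-study variance (not estimated in this sketch).
    """
    weights = []
    numerator = 0.0
    total_weight = 0.0
    for effects in studies:
        k = len(effects)
        v_bar = sum(v for _, v in effects) / k
        # Every effect in a study gets the same weight, so that a study's total
        # weight does not grow with the number of effects it reports.
        w = 1.0 / (k * (v_bar + tau2))
        weights.append(w)
        numerator += w * sum(g for g, _ in effects)
        total_weight += w * k
    mean = numerator / total_weight

    # Sandwich variance: residuals are summed within each study (the cluster)
    # before squaring, which allows arbitrary dependence inside a study.
    meat = 0.0
    for effects, w in zip(studies, weights):
        residual_sum = sum(g - mean for g, _ in effects)
        meat += (w * residual_sum) ** 2
    se = math.sqrt(meat) / total_weight
    return mean, se


# Hypothetical data: one study reporting three effects, one reporting a single effect.
example = [[(0.10, 0.04), (0.15, 0.04), (0.05, 0.04)], [(0.30, 0.04)]]
estimate, robust_se = ce_rve_mean(example, tau2=0.02)
print(round(estimate, 3), round(robust_se, 3))
```

Because the standard error is built from study-level residual sums, dependence among effect sizes within a study is handled without specifying its exact form; a CHE model would additionally model within-study variation explicitly.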
Lastly, our findings indicating no effect for implementation fidelity are not an indicator of its lack of importance. We did not code the level of implementation achieved (e.g., satisfactory, low, and high) and steps that could increase effective program implementation (e.g., pre-program training and ongoing support). Given previous research demonstrating the importance of fidelity for predicting interventions’ efficacy (e.g., Cipriano et al., 2023), we urge caution in interpreting results.
Future Research
In their recent comprehensive meta-analysis, Cipriano et al. (2023, p. 19) state that the SEL field is both “oversaturated” and “underdeveloped.” We concur that it is underdeveloped with regard to promoting prosocial behavior in K–12 students. In the preceding discussion, we have pointed out issues that need to be attended to in future research. These include use of precise and high-quality measures of prosocial behavior that are amenable to intervention and that avoid the overuse of student self-report and teacher report. Studies also need to provide clear information about sample characteristics.
Another key area for future research is to investigate interventions that use approaches other than the common curricular approach. It is not clear whether the moderating variables we discussed previously, such as dosage and duration, are relevant to SEL interventions that take other approaches. For example, approaches that focus on changing the way teachers interact with students are not measured in terms of dosage, and it is likely that for these approaches longer duration is more effective for permanent behavior change among teachers. Such approaches may have more sustained effects on students, helping to prevent any fade-out effect.
A meta-analysis is a quantitative summary of a body of work (Pigott & Polanin, 2020). It does not answer some important questions of how, why, or for whom interventions work. One area for future research, given the heterogeneity we found, is to unpack the black box of interventions designed to promote K–12 students’ prosocial behavior. Future research should address the content of interventions and identify active ingredients. Our meta-analysis can provide a quantitative foundation for such investigation and for developing new evidence-informed programs (Embry & Biglan, 2008).
Conclusion
Our systematic meta-analysis is the first to specifically focus on SEL’s effect on prosocial behavior in K–12 students, and it also examines several moderating variables (see also Mesurado et al., 2019; Shin & Lee, 2021, for smaller, multilingual reviews). Reviewing 157 effects from 66 studies, we found that SEL is associated with prosocial behavior with a medium–large effect. It is effective among elementary, middle, and high school students in various locales when compared with those who did not receive the SEL programs.
In addition, SEL programs vary greatly in their effect size related to prosocial behavior. Among the 14 moderating variables we examined, dosage and duration were the only program characteristics associated with effect size. Some variables that we expected to be associated with effect size were not, but sample sizes were too small to have confidence in those results. Most included studies focused on Tier 1 programs delivered by teachers in elementary schools using a curricular approach. Few involved high schools or rural students or used an interactional or structural approach. Few reported demographic information about the sample. We hope that this review provides timely discourse about the importance of programs for promoting prosocial behavior among K–12 students and the need for more detailed research in this area.
Supplemental Material
sj-docx-1-rer-10.3102_00346543261438462 – Supplemental material for Social and Emotional Learning Programs and Students’ Prosocial Behavior: A Meta-Analysis
Supplemental material, sj-docx-1-rer-10.3102_00346543261438462 for Social and Emotional Learning Programs and Students’ Prosocial Behavior: A Meta-Analysis by ChenYu Hung, Nicole R. Brass, Lindsay Brockmeier, Christi Bergin, Madison Imler and Sha’Breon Luper in Review of Educational Research
Authors
CHENYU HUNG, PhD, is currently a researcher at American Institutes for Research based in Arlington, VA, USA; email:
NICOLE R. BRASS, PhD, is currently a research associate at RMC Research Corporation based in Denver, CO, USA; email:
LINDSAY BROCKMEIER, PhD, is a postdoctoral researcher at the University of Missouri in the Educational, School, and Counseling Psychology Department, 211A Hill Hall, Columbia, MO 65211, USA; email:
CHRISTI BERGIN, PhD, is a research professor at the University of Missouri, College of Education and Human Development, 109 Hill Hall, Columbia, MO 65211, USA; email:
MADISON IMLER, MS, is a doctoral student at University of Missouri, College of Education and Human Development, 211 A Hill Hall, Columbia, MO 65211, USA; email:
SHA’BREON LUPER, MEd, is a counselor at Compass Health Network in the NAVIG8 adolescent substance use treatment program, 3501 Berrywood Dr, Columbia, MO 65201; email:
