Abstract
Social and emotional learning (SEL) programs are widely used, yet concerns have arisen about whether the evidence for these programs extends to students of color (SOC). The data in this study include published articles (n = 158) of trials (n = 97) of SEL interventions (n = 32) from the CASEL SELect list of evidence-based SEL programs. Using racial frames common in intervention research, we examined the extent of SOC representation in SEL intervention trials, how authors attend to race/ethnicity in their analyses, and whether and how these analyses show evidence that these programs benefit specific racial/ethnic groups. While doing so, we discuss the complex nature of race and racism in SEL research. Eight interventions provided some evidence that they benefit Black students and four showed some evidence that they benefit Hispanic/Latiné students. No trials provided evidence of benefit to any other groups of SOC. Findings suggest that while representation of SOC in SEL trials has improved, additional research is needed to understand to whom the evidence for SEL program effectiveness applies.
Social and emotional learning (SEL) has been defined as the process of acquiring skills and competencies to develop supportive relationships, solve interpersonal problems, and recognize and regulate emotions (Collaborative for Academic, Social, and Emotional Learning [CASEL], 2023). Grounded in the theories and frameworks of prevention science (Greenberg, 2004), resilience (Masten & Obradović, 2006), and emotional intelligence (Cherniss et al., 2006; Mayer et al., 2004), SEL has focused on developing the social and emotional skills and competencies of students that can influence learning, academic success, and behavioral health (Durlak et al., 2015; Elias, 2006; Greenberg et al., 2003; Gueldner et al., 2010; Zins et al., 2007). Discussions about SEL delivery most often describe the implementation of student-facing SEL programs. Programs typically refer to sequenced, explicit lessons focused on SEL that use active learning strategies (Durlak et al., 2011). SEL programs are often described as a “promising approach” to promoting SEL in school contexts (M. T. Greenberg, 2023; Lawson et al., 2019). SEL programs, however, are implemented in educational environments that serve children from a wide range of racial, ethnic, and cultural backgrounds, who are situated differently in a society that is stratified by racism. Despite the promising evidence that many SEL programs are efficacious (Corcoran et al., 2018; Durlak et al., 2011; Sancassiani et al., 2015; Taylor et al., 2017), questions remain as to whether the available evidence of efficacy is generalizable to youth who have been racialized into specific racial and ethnic categories (Castro-Olivo & Merrell, 2012; Hoffman, 2009; Rowe & Trickett, 2018).
Learning what works for whom is a goal across many fields of intervention research (Roth & Fonagy, 2013). Within SEL, researchers have made strides toward determining what works and have translated scientific findings into accessible catalogs of programs that meet minimum standards of evidence for efficacy (CASEL, 2013, 2015; Hirsch et al., 2023). The second part of the question, for whom, has yet to be answered with confidence, particularly across students categorized into specific racial and ethnic groups (Rowe & Trickett, 2018).
Given the epistemological origins of intervention research, researchers typically pursue an objective, rather than a constructed, truth about what works for whom. Similarly, these studies implicitly treat race as an inherent attribute of a person rather than a status designation in a constructed societal system. Despite the positivistic orientation of this literature, this body of scholarship typically recognizes that the pursuit of objective evidence is shaped by the frames and biases that researchers bring to the work. Therefore, it is relevant to note that scientific funding for intervention trials is awarded to predominantly White researchers (Ginther et al., 2011), and racial and ethnic minoritized (REM) groups are often underrepresented in intervention trials (Aisenberg, 2008; Cardemil, 2010a; Lau, 2006; Mak et al., 2007; McBeath et al., 2010). Even if REM groups are included in a study, it is unclear how often scholars analyze results specific to racial and ethnic groups in order to understand for whom the program works. Thus, to understand the state of the literature with regard to the generalizability of SEL interventions for specific racialized groups, this review examines the existing evidence for SEL programs as it applies to specific racialized groups and what the evidence means in the context of race and racism. This review leverages a familiar post-positivist approach in order to best take up and speak to existing intervention research but also seeks to include a critical consciousness of the use of race in intervention research and its implications.
Specifically, this review (1) explores the post-positivist concept of generalizability and its connection to SEL interventions, (2) describes race and racism in the context of SEL programming, (3) examines the nature of student representation across specific racialized groups within SEL trials, (4) elucidates approaches to examining specific racialized groups in intervention analyses, (5) summarizes the extent and quality of the evidence of efficacy for specific SEL programs with REM groups, and (6) grapples with the implications for research and practice of SEL.
Generalizability of SEL Interventions
In a post-positivist paradigm, a determination of generalizability is, at a minimum, predicated on the representativeness of the sample relative to the population (Aisenberg, 2008; Creswell, 2013; McBeath et al., 2010). Given that some racialized groups have historically been underrepresented in intervention trials (Aisenberg, 2008; Mak et al., 2007; Patterson et al., 2016), conclusions about generalizability may be unfounded when insufficient students from specific racialized groups are included in efficacy trials (Aisenberg, 2008; Blair & Zinkhan, 2006). Sample representation is a primary and necessary, but not sufficient, condition for generalizability across specific racialized groups.
We identify additional necessary conditions for generalizability within the prevailing post-positivist framework of intervention research. In addition to having trials with representative samples, establishing generalizability across specific racialized groups requires evidence of intervention effectiveness for specific groups (Aisenberg, 2008; Cardemil, 2010b; Kirmayer, 2012). In other words, we must consider the possibility that a change in overall mean scores does not mean that any benefit of the SEL program is uniform across subgroups of individuals who are racialized differently. Alternatively, statistically significant improvement overall could actually mean that some groups of students experience large positive effects while other groups of students experience moderate harms—yet, this variance is often left unexplored. Issues with generalizability across diverse racialized groups may appear in the form of inequality in outcomes or when different student groups engage in the intervention to different degrees (Lau, 2006). The “burden of proof” (Cardemil, 2010b, p. 42) lies with the intervention developer to demonstrate evidence for generalizability through specific evidence of intervention effects for specific racialized groups (McBeath et al., 2010). The question “To whom do the results of this trial apply?” is routinely asked in medical trials in order to meet this expectation (Rothwell, 2005). We can apply this question to education as well and offer the present study as one attempt to do so.
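The arithmetic behind this concern can be illustrated with a minimal sketch. The subgroup shares and standardized effects below are hypothetical values chosen only for illustration; they are not drawn from any trial in this review. The sketch shows how a positive pooled effect can coexist with a negative effect for a smaller subgroup.

```python
# Hypothetical subgroup shares and standardized effects (illustrative only,
# not taken from any SEL trial).
subgroups = {
    "group_a": {"share": 0.7, "effect": 0.40},   # larger group, positive effect
    "group_b": {"share": 0.3, "effect": -0.20},  # smaller group, negative effect
}

def pooled_effect(groups):
    """Sample-share-weighted average of the subgroup effects."""
    return sum(g["share"] * g["effect"] for g in groups.values())

overall = pooled_effect(subgroups)
print(round(overall, 2))  # 0.22: positive overall despite harm to group_b
```

Under these assumed values, the full-sample estimate is positive even though one subgroup is worse off, which is why an overall mean difference alone cannot establish for whom a program works.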
Race, Racism, and Challenges With Generalizability
Before asking to whom do the results of this SEL trial apply, it is important to explore the concepts of race and ethnicity and understand their interconnectedness to racism. Ethnicity can be understood as groups of people who characterize themselves, or who are characterized by others, as having commonality along a range of characteristics including nationality (e.g., tribe, community, or geographical region), culture, language, parentage, ancestry, citizenship, and/or religion (Betancourt & López, 1993; Burton et al., 2010; Cokley, 2007). To define race, we first acknowledge that the extant literature on the theories and conceptualizations of race is replete with tensions and arguments (e.g., Banton, 2020; Ray & Seamster, 2016; Winant, 2000). Although there is no formally agreed upon definition of race, many race scholars conceive race to be a social construct historically created to give social importance to perceived characteristics (Betancourt & López, 1993; Cardemil, 2010b; Zuberi & Bonilla-Silva, 2008). According to Hesse (2016), “Race is not in the eye of the beholder or on the body of the objectified. Race is an inherited western, modern-colonial practice of violence, assemblage, superordination, exploitation and segregation. Race is constitutively and unequally relational, regulatory and governmental, demarcating the colonial rule of Europe over non-Europe” (p. viii). Following this, racism is understood as any action backed by power that subordinates a group based on, for example, phenotypic, linguistic, or cultural differences that amass institutional power for the dominant group (Hesse, 2007; Leonardo & Grubb, 2018; Roberts et al., 2008; Rosa & Flores, 2017). 
Racism in schools, therefore, positions students in a social hierarchy and underpins existing inequitable schooling conditions such as unequal access to funding, resources, or learning opportunities (Carter et al., 2017; Milner, 2012), as well as unequal student experiences of school violence, bias-based bullying, microaggressions, stigmatization, and discrimination (e.g., Anderson et al., 2019; Mulvey et al., 2018; Nasir et al., 2017). Acknowledging that race is socially constructed does not render the lived experiences of racism as any less real. Discriminatory treatment of REM individuals results in social and material consequences; it characterizes a pervasive shared experience, and thus, categorizations of race can be useful in codifying systemic outcomes across different groups (Saleem et al., 2020; Smedley & Smedley, 2005; Tam et al., 2023). In fact, the resulting educational disparities between racial groups are well documented (Barton & Coley, 2010; Ladson-Billings, 2006). Using racial and ethnic categories as a proxy for racialized experiences and racism in intervention research provides important insights.
However, there are noteworthy limitations of characterizing people by race for studies of treatment effects (Shapiro et al., 2024; Tan et al., 2023). For example, using race to characterize people reifies the “colonial apparatus” (Mahmud, 1999, p. 1227) that created racism and risks misrepresenting “race” as empirically verifiable (Hesse, 2007). Race scholars broadly consider racial categories to be designed by people wielding power (e.g., activists, bureaucrats, media) and offered, assigned, and rearranged at the discretion of the powerful (e.g., administrators, researchers; Mora, 2014). In other words, participants in intervention research are rarely afforded the ability to self-identify their race beyond choosing from administrator or researcher-selected categories. Categorizing people into broad groups (often called “essentializing”) further ignores within-group heterogeneity that may exceed between-group heterogeneity, perpetuating a reductionist racialization of people. It is imperative, however, to explicitly examine evidence of program effectiveness among students who have been racialized as students of color. In the subsequent sections of this paper, in addition to using the term REM to refer to specific racial and ethnic groups that have been minoritized (i.e., treated as distinct from and less important than a dominant group) in the United States, we also use the term “students of color” to reflect the dominant discourse in the literature that we are synthesizing. Both terms are used with the intention of bringing collective consciousness to the effects of SEL programs on students who experience racism. As scholars, we hold the complexity of seeking to analyze the evidence underlying claims of universality while recognizing the profound problems of racial categorization. Since differential experiences of racism exist among different REM groups, when possible, we discuss specific REM groups to make specific evidentiary claims.
Criticisms of Color-Evasive SEL Programming
Several relevant criticisms of current SEL programming exist that provide a compelling rationale for why the generalizability of SEL programs across racialized groups should not be assumed. In recent years, there have been increasing calls for a critical interrogation of SEL programming based on the common use of color-evasive approaches. Color evasion is a term coined by Annamma et al. (2017), building on critical theories (e.g., Bell, 1995; Bonilla-Silva, 2017), who claim the avoidance of race is a form of racism—“a way to willfully ignore the experiences of people of color, and makes the goal of erasure more fully discernible” (Annamma et al., 2017, p. 156). Simmons (2021) writes that SEL programs fail to address sociopolitical realities and put “White hegemony on display” (np). Kaler-Jones (2020) offers a warning about SEL programming that centers White, cisgender, patriarchal values and cautions that when enacted without attention to systemic inequities, SEL programming can be another form of “policing, punishment, and criminalization of Black and Brown students” (np). Ginwright (2015) explores how SEL programs rarely focus on building consciousness of racism or promote actions that address social conditions harmful for students of color (p. 8). Camangian and Cariaga (2022) assert that SEL programs’ “hyper-focus on rote skill development, culturally hostile curricular, and community irrelevant assessment” prohibit young people—especially youth of color—from imagining life beyond the conditions in which they currently live (p. 3). Jagers et al. (2019) critique the ways in which many popularized versions of SEL focus on personal responsibility; instead, they argue, we must shift toward transformative approaches that prioritize equity. 
Given that epistemological differences in the way problems are identified and operationalized may impact the design and effectiveness of a particular intervention strategy (Kirmayer, 2012), these arguments compel us to determine the extent to which SEL interventions offer any discernible benefit to specific racialized groups.
Some scholars argue that SEL programs that fail to acknowledge the realities of racism may work to maintain systemic inequality. For example, Gregory and Fergus (2017) found that “colorblind” notions of SEL fail to interrupt racial differences in school discipline. Specifically, they found that across three school districts in three different states, Black student exclusionary discipline rates (e.g., suspension) remained substantially higher than those of White students, even after several SEL initiatives and programs were implemented. These authors caution that focusing on student SEL alone, without explicitly attending to the racialized experiences that students of color face, risks “sustain[ing] a White cultural frame” and perpetuating inequalities (p. 127). Despite these concerns, popularized SEL programs, most of which do not explicitly acknowledge the racialized experiences of students in school, have been widely adopted. Ramirez et al. (2021) conducted a qualitative content analysis of SEL programs for pre-kindergarten through fifth-grade students. Through their review of thirty-three SEL programs, the authors found that SEL programs rarely (a) included practices and skills that incorporate cultural knowledge, assets, and experiences of students from diverse communities, or (b) acknowledged or addressed social injustices, inequality, prejudices, or exclusions that students of color may face. Thus, we must determine the strength of efficacy evidence for widely used SEL programs among students who experience racism.
The Tension of Acknowledging Racism Without Reifying Race
To be clear, most critics of color-evasive SEL programming do not advocate for the further racialization of students, classifying students as part of a racial capitalist project, or subjecting students of color to targeted or segregated racialized interventions. Rather, the aforementioned critiques largely call for decentering white supremacy and interrogating racism within SEL research and practice. This creates tension for the aims and methods of the current paper, in which we seek to explore assumptions of racial and ethnic universality in SEL intervention research (i.e., that current SEL programs are beneficial for all students, despite students being differentially situated in a society organized by racism). This exploration is one way to discover the extent to which constructed racial hierarchies implicitly shape the SEL discourse. Failing to critically examine the evidence underlying the assumptions of universality could be viewed as a reflection or expression of racism. On the other hand, the best tools that we currently have for exploring the assumption of universality in the existing SEL intervention research rely on the use of racial categories that, by nature of their use, perpetuate the racialization of students of color. These racial categories flatten the racial and ethnic diversity among those racialized within each category and reify a system designed to create a racial hierarchy.
Holding this tension, we have elected to conduct our analysis of to whom the results apply reflecting the racial frames dominant in SEL intervention research. We use racial categories, and associated language, to articulate our questions, conduct our analysis, and report our results. We do so to unpack assumptions about generalizability prevalent in SEL intervention research and consider the applicability of the evidence used to justify the widespread implementation of specific SEL programs in schools.
Exploring Race in SEL Intervention Analyses
There are varying ways that intervention researchers in the positivist traditions consider racial and ethnic categories when conducting intervention trials and examining effectiveness. Familiar approaches include (1) matching intervention and comparison groups on racial composition and/or testing for equivalence of racial composition; (2) treating race as a covariate in models assessing the main effect of the intervention on outcomes; (3) examining moderation of intervention effects by race; and (4) conducting subgroup analyses to examine the effects of the intervention for specific subgroups defined by race and ethnicity.
Trials that establish group equivalence acknowledge that racial demographics may play a role in intervention outcomes. Testing for racial group equivalence in randomized clinical trials is commonly done, although not necessary or recommended, for the purpose of showing that “randomization worked” (Moher et al., 2010). It is also commonly done in nonrandomized trials to understand similarities and differences in the racial composition of groups. Justification for these analyses reveals an underlying understanding of the possibility of race being a confounder of condition with respect to study outcomes. For instance, if a measure of an SEL outcome was strongly associated with race, and the racial composition of intervention and control groups differed, it is possible that this difference in racial composition, rather than the intervention itself, may account for differences in the SEL outcome. When there is concern that race may be a confounder, race is commonly included as a control variable in statistical analyses that assess group differences in outcomes. Including race/ethnicity as a control or covariate when it is strongly associated with outcomes can also increase power and precision of estimates of intervention effects, particularly in the case of school randomized trials where there are differences in the racial compositions of schools (Raudenbush, 1997). These approaches have clear benefits, but neither racial group composition equivalency testing nor including race as a covariate when assessing main effects of interventions on outcomes, in and of themselves, clarifies for whom an intervention is effective.
Interaction tests examine whether the intervention effects differ by race. For example, an intervention might have larger effects for Black students than White students, and this would be reflected in a race-by-condition effect in a regression model predicting a study outcome. A common decision-making rule used by researchers is to first assess the statistical significance of the race-by-condition interaction and, if the interaction is significant at p < .05, either (a) conduct follow-up subgroup analyses to estimate intervention effects separately for different racial groups or (b) derive those estimates from the model that includes the interaction term. However, intervention trials are typically powered to detect the main effects of the intervention for the full sample, making tests of moderation almost always underpowered to detect substantially different intervention effect sizes (Mak et al., 2007). For instance, a trial with .8 power to find a main effect of an intervention of d = .2 would have .8 power to detect only an interaction effect roughly twice as large (i.e., d = .4), when the sample is split evenly between the two racial groups (Gelman, 2018). In other words, the typical SEL program evaluation is unlikely to find a significant interaction effect of the intervention by race, even when there are differences, because the studies are not typically designed to detect them. Furthermore, even when a significant interaction by race is found, we only establish that there is a reliable difference in intervention efficacy by race but not for whom an intervention is effective.
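The power comparison described above can be checked with a short calculation. The following is a minimal sketch, not the procedure of any reviewed trial: it assumes a normal approximation, an outcome standard deviation of 1, equal allocation to condition, and a sample split evenly between two racial groups. The function names are our own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power(effect, se, alpha=0.05):
    """Two-sided power for an estimate that is approximately normal."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def se_main(n):
    # Two arms of n/2 students each, outcome SD = 1: sqrt(1/(n/2) + 1/(n/2)).
    return 2 / n ** 0.5

def se_interaction(n):
    # Difference between two subgroup effects, each estimated from n/2 students
    # (four cells of n/4): twice the main-effect standard error.
    return 4 / n ** 0.5

# Smallest total sample (in steps of 4) with .8 power for a main effect of d = .2.
n = 4
while power(0.2, se_main(n)) < 0.8:
    n += 4

power_main = power(0.2, se_main(n))                     # about .80 by construction
power_int_double = power(0.4, se_interaction(n))        # same power as the main effect
power_int_same = power(0.2, se_interaction(n))          # far below .80
power_int_same_4n = power(0.2, se_interaction(4 * n))   # .80 again at four times n
print(n, round(power_main, 2), round(power_int_double, 2),
      round(power_int_same, 2), round(power_int_same_4n, 2))
```

Under these assumptions, the standard error of the interaction is twice that of the main effect, so an interaction as large as the main effect (d = .2) is detected with power of only about .29 in the same sample. Roughly four times as many students are needed to restore .8 power.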
Subgroup analyses provide the strongest evidence that an intervention works for specific racial groups in racially diverse samples (Bloom & Michalopoulos, 2013; Supplee et al., 2013). Although a crucial follow-up step after finding a significant interaction effect, doing subgroup analyses need not be conditional on a statistically significant interaction effect (Farrell et al., 2013). Subgroup analyses provide important information to practitioners choosing interventions who want to know whether a particular intervention is efficacious for the racial/ethnic groups represented in their school or district.
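To make concrete what a subgroup analysis reports, the sketch below computes a standardized mean difference and an approximate 95% confidence interval from summary statistics, using the common large-sample standard error for Cohen's d. All means, standard deviations, and sample sizes are hypothetical and chosen only to show how subgroup size affects precision.

```python
from math import sqrt
from statistics import NormalDist

def cohens_d_ci(m_t, sd_t, n_t, m_c, sd_c, n_c, level=0.95):
    """Cohen's d with an approximate large-sample confidence interval."""
    pooled_sd = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                     / (n_t + n_c - 2))
    d = (m_t - m_c) / pooled_sd
    se = sqrt((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d, d - z * se, d + z * se

# Hypothetical summary statistics: same mean difference, different subgroup sizes.
d_a, lo_a, hi_a = cohens_d_ci(0.30, 1.0, 200, 0.0, 1.0, 200)  # well-represented subgroup
d_b, lo_b, hi_b = cohens_d_ci(0.30, 1.0, 40, 0.0, 1.0, 40)    # small subgroup

print(round(d_a, 2), round(lo_a, 2), round(hi_a, 2))
print(round(d_b, 2), round(lo_b, 2), round(hi_b, 2))
```

With these assumed inputs, both subgroups show the same point estimate, but only the larger subgroup's interval excludes zero. This illustrates why adequate representation is a precondition for informative subgroup estimates.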
Reviews of Race and Ethnicity in SEL Intervention Research
Few intervention studies have explicitly and empirically taken up questions related to race, and racism by proxy, in its connection to SEL programs. This claim is buttressed by five articles that each reviewed the intervention literature and assessed, in various ways, the extent to which evaluation studies of SEL programs have considered race. The first article, by Garner et al. (2014), was intended to illustrate the utility of a heuristic focused on how “sociocultural competence impacts the development and delivery of programs” (p. 165). The authors extracted studies from a widely disseminated 2013 CASEL report and identified three programs (Resolving Conflict Creatively [Aber et al., 2003]; Michigan Model for Health [O’Neill et al., 2010]; Tools of the Mind [Barnett et al., 2008]) that reported differences in levels of outcomes by race/ethnicity. They did not, however, systematically review whether studies reported on the extent to which program effects differed by race/ethnicity. Although this article is helpful in other ways, reporting on race disparities in an outcome at post-test (e.g., youth of color had lower social-emotional skills) may contribute to deficit narratives about youth of color. Further, and most central to our inquiry, it does not help us understand whether the intervention worked for a specific racial group. This review, therefore, does not enable conclusions about the applicability of the evidence on SEL programs to students of color.
Second, Murano et al. (2020) conducted a meta-analysis of social-emotional learning interventions implemented with preschool-aged children. These authors analyzed moderation by study sample characteristics, categorizing individual studies as “majority-minority” (more than 50% of the sample were presumably not white). These authors found that studies with “majority-minority” samples did not differ in their effect sizes relative to studies not characterized in this way. While this review assessed that studies with majority students of color samples were not differentially effective relative to other studies among preschool students, this does not help us understand whether specific interventions are effective for students of specific racial and ethnic groups.
Third, Rowe and Trickett (2018) conducted a review of the articles contained in Durlak et al.’s (2011) highly cited meta-analysis of SEL programs to determine how diversity characteristics are reported in SEL intervention studies. They examined the sample characteristics (by race/ethnicity, gender, socioeconomic status, and disability) of these 117 published studies to assess the generalizability of the meta-analysis findings to diverse groups. While this study intended to assess the generalizability of existing evidence more explicitly and systematically than Garner et al. (2014), several issues limited their conclusions about the applicability of the evidence to students of color. By design, Rowe and Trickett’s (2018) review was limited by the original sample of articles included in Durlak et al.’s (2011) meta-analysis; the newest study included was published in 2007 (Durlak et al., 2011), rendering the findings outdated relative to all literature currently available. Furthermore, it was limited by the inclusion criteria (i.e., articles with effect sizes) of the original meta-analysis (Durlak et al., 2011). Although this was indeed the most appropriate unit of analysis for the original meta-analysis, secondary analyses reporting on moderation or subgroup findings may have been excluded from Rowe and Trickett’s study. Despite these limitations for our current purposes, these authors made an important contribution to our line of inquiry. They found that only 75 articles (64%) included in the meta-analysis reported on student race/ethnicity. They also discovered that, of the articles describing race, 56 articles used samples where the majority of participants were from one racial/ethnic group (61% majority White, 30% majority Black, 2% majority Native American/Alaska Native, and 2% Latino). Approximately two-thirds of the samples included three or more racial groups, rendering a muddy picture regarding to whom the results apply. 
In addition to calling for better reporting on sample representation, they note the lack of evidence available to support conclusions about generalizability to diverse groups.
Fourth, a review of twelve meta-analyses of SEL programs led by Durlak et al. (2022) examined the evidence of SEL’s influence on student adjustment and well-being and potential moderators of program effectiveness. These authors concluded that studies offered “limited clarity” about SEL programming effectiveness across racial/ethnic groups (p. 777). Specifically, these authors found that 3 of the 12 reviews examined race/ethnicity moderation and that none found any evidence of significant moderation. As we discussed previously, it is unclear whether the underlying studies were designed to contribute to tests of moderation, and moreover, testing for moderation does not provide evidence that the programs benefit specific racial/ethnic groups of students. As a next step, close attention is needed to review the quality of underlying studies to determine what conclusions can be drawn from findings of nonsignificant interaction terms. As with other reviews, this review does not, in and of itself, determine or describe the benefit of SEL programs to specific racial and ethnic groups.
Finally, Cipriano et al. (2023) conducted a systematic review of 269 studies to investigate how race (and disability) are included and reported in SEL intervention studies (e.g., sample, outcome, moderator). With regard to disability, they found that 28% of articles reported on the disabilities of participants, but only 6% included disability characteristics as a moderator or covariate. With regard to race/ethnicity, these authors found that 81% of articles reported on race/ethnicity of participants, but only 28% included race/ethnicity as a covariate or moderator. These authors define moderating variables as those that “influence the level, direction, or presence of a relationship between independent and dependent variables” (p. 81); the authors did not specify whether moderation, as specified, was indicative of a formal test of an interaction term. Appropriate to their stated scope, Cipriano et al. do not report summary descriptives of races/ethnicities represented in the included articles, specify whether intervention by race interactions or subgroup analyses were conducted in existing studies, or describe the known results of SEL program trials for specific student racial groups. As with the other reviews that make the useful contribution of counting articles with particular design characteristics, and calling on researchers to do better, this in and of itself does not yet answer the substantive question of what we know about what works for whom in SEL.
Our review of the literature suggests little is currently known about the extent to which SEL interventions benefit participants from specific racial/ethnic groups. Each of these previous reviews represents a critical addition to the literature, yet the results from these reviews do not shed light on which widely implemented SEL programs show evidence that they benefit Black, Latiné, or other racial/ethnic groups of color. In other words, existing research on SEL does not answer the question of for whom SEL programs work, having not previously attended to the question of to whom the evidence applies.
Present Study
The present literature review aims to evaluate the existing evidence base for generalizability of SEL interventions across diverse racial groups. The present work seeks to build upon prior reviews in several ways. First, we build upon the work of Garner et al. (2014) by using a common sampling frame, reviewing all articles reporting on interventions from the CASEL SELect list of evidence-based SEL programs for preschool and elementary-aged students, as this is the list most widely used by practitioners to inform their SEL program selection. We build upon the work of Rowe and Trickett (2018) by revisiting one of their research questions (to systematically review and summarize racial and ethnic representation in SEL outcome studies), with updates from the last 15 years of scholarship. We build on the work of Murano et al. (2020), Durlak et al. (2022), and Cipriano et al. (2023) by attempting to understand what evidence is available about what works for whom. Murano et al. (2020) looked at moderation at the level of the study, while Durlak et al. (2022) and Cipriano et al. (2023) looked for evidence of moderation at the student level, without making use of moderation analysis or a subgroup analysis to actually generate evidence of efficacy for specific racial and ethnic groups of color. This paper intends to identify such evidence; examining this evidence is the crux of this paper and a question left unanswered by previous reviews. To do so, we innovate from the stated methodologies of previous reviews; our unit of analysis is the trial, which is a more relevant unit of analysis for our research questions than articles (which do not account for the repeated use of the same sample) or interventions (which may use many different samples).
Although our design ultimately only makes modest departures from dominant methodological approaches, the tenants of our review disrupt typical narratives (Boveda et al., 2023) and the pervasive assumption that existing CASEL SELect programs benefit all and shifts the burden of proof from a failure to detect differences to an imperative to generate evidence regarding the effectiveness of SEL programs. This review will be guided by the following research questions:
RQ1) What is the extent of racial and ethnic diversity of students in trials of SEL programs?
RQ2) How are race and ethnicity treated when analyzing their role in interventions?
RQ3) Which evidence-based SEL programs show evidence of effectiveness for students of color?
Methods
We utilized the CASEL review of elementary and preschool SEL interventions as the source of evidence-based SEL programs to include in our sampling frame (CASEL, 2013). The CASEL SELect list is widely used by schools and school districts choosing SEL programs. To be included on the CASEL SELect list, programs are required to be well-designed, classroom-based interventions that target SEL competencies, provide explicit SEL instruction and opportunities to practice skills, train implementers and support implementation, and demonstrate efficacy with at least one high-quality evaluation study that included a comparison group and had a pre- and posttest (CASEL, 2013). Studies in support of the intervention had to show “at least one carefully conducted evaluation that documents positive impacts on student behavior and/or academic performance” (CASEL, 2013, p. 7). The SELect list provides references for articles used to make the determination regarding the level of evidence (CASEL, 2013). The quality of the findings or the variety of outcomes reported on by intervention researchers will not be covered by this review; we rely on CASEL’s determination of what programs meet their criteria for being evidence-based. To answer our research questions, we depart from traditional systematic review methods in our sampling strategy and search procedures (Alexander, 2020). To make our questions “answerable” (Alexander, 2020) for a practitioner audience, we utilized the CASEL list of interventions rather than using systematic search strategies for SEL interventions, as other existing reviews of SEL interventions do. Our sample of interventions was predetermined, altering our search procedure as outlined next. Our innovative approach to this review is designed to address questions that traditional systematic reviews are not designed to answer by examining the quality and extent of evidence specifically shown to benefit students of color (Boveda et al., 2023).
Sample
The CASEL SELect list of interventions at the preschool and elementary school levels contains a total of 34 interventions. The CASEL SELect list was first published in 2013, with a supplement added in 2018 (CASEL, 2019), before becoming a dynamic list posted online. All interventions on the CASEL website as of September 2019 were included in a search for relevant studies about the intervention. All articles listed on the CASEL website supporting inclusion on the SELect list and their reference lists were screened for our review. Intervention titles, in quotations, were then used as keywords (e.g., “Tools for Getting Along”) in a literature search conducted for each intervention using Google Scholar (i.e., 34 unique search terms were each entered exclusively and sequentially). Keywords searched for each intervention are reported in Table A in online supplementary materials. Google Scholar was used because its algorithm was best aligned with the goals of the study (e.g., prioritizing articles matching the intervention titles; mirroring a process accessible to practitioners). A supervised search was conducted by two trained graduate research assistants (nonauthors) from September 2020 through May 2021, the results of which were systematically checked by the fourth author. An additional search was completed by reviewing the reference lists of the identified articles (a process known as “citation chaining”). The resulting 755 articles were assessed for eligibility through full text review. Table A in online supplementary materials reports counts of included and excluded articles per intervention. Columns correspond to categories typically seen in a PRISMA diagram.
Articles were included if they were studies of tier 1 (universal) school-based interventions, conducted with children ages 3–11 (roughly preschool and elementary school) in the United States, published in English-language peer-reviewed journals, and reported child outcomes (i.e., studies exclusively about teachers or implementation outcomes were excluded) relative to a comparison or control group. Only peer-reviewed articles were included to enable our focus to be on our research questions rather than the overall quality of the study. Since race and ethnicity mean different things to different racial and ethnic groups across the globe, and our interest is in the dynamics of race and racism particular to the U.S. context, we elected to exclude papers generated through international trials from our analysis. Inclusion criteria were applied, and any disagreements in applying inclusion criteria were resolved through a consensus process where the first or fourth author reviewed the article, discussed, and came to a mutually agreed upon decision. The pattern of excluded articles is as follows: (1) Articles that were not a quantitative study of intervention effects relative to a comparison group were excluded (n = 207). Many articles about the included interventions reported on qualitative examinations of intervention characteristics (e.g., Horsch et al. [2002]), focused on implementation outcomes (e.g., Low et al. [2016]), or reported on methods using the same sample as other articles identified by the review. (2) Another group of studies did not report intervention effects on student-level outcomes (n = 165) but instead reported on outcomes at the teacher or parent level. For example, the intervention Incredible Years (IY) was originally developed as a parenting program (Pidano & Allen, 2015), and reports exclusive to parenting outcomes (e.g., Webster-Stratton et al., 2010) were excluded by this review, hence the large number of IY studies excluded.
(3) Many interventions on the CASEL list have been tested with older students; articles that included no preschool or elementary school students in the sample were excluded (n = 28). (4) Many interventions have also been adapted to be implemented in various international contexts, including PATHS, Incredible Years, and Second Step. Samples generated internationally were excluded (n = 46). For example, articles studying the Incredible Years in the Netherlands (Leijten et al., 2017) or PATHS in Croatia (Novak et al., 2017) were excluded. One intervention (Zippy’s Friends, Mishara & Ystgaard, 2006) was eliminated because all articles described studies conducted with international samples. (5) CASEL allows gray literature when evaluating programs to be included on the SELect list. However, we elected to exclude articles that have not been peer reviewed (n = 30). The intervention Reading with Relevance was eliminated because we found no peer-reviewed articles on the intervention. (6) Articles that did not describe a tier 1 universal implementation of the intervention were excluded (n = 115) since we were interested in population-level implementations. Therefore, articles examining tier 2 and tier 3 populations were excluded (e.g., Graziano & Hart [2016] included only students with externalizing problem behaviors in their sample studying Peaceworks). Of the 755 articles originally identified, 164 articles on 32 programs met criteria to be included in this review.
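The screening arithmetic above can be checked with a short sketch. The category labels are our own shorthand, and the sketch assumes each excluded article was counted under exactly one exclusion reason, which the text implies but does not state outright.

```python
# Exclusion counts as reported in the text; the labels are illustrative shorthand.
EXCLUDED = {
    "not a quantitative comparison-group study": 207,
    "no student-level outcomes": 165,
    "no preschool or elementary students": 28,
    "international sample": 46,
    "not peer reviewed": 30,
    "not tier 1 universal": 115,
}
IDENTIFIED = 755  # articles assessed through full-text review


def included_count(identified, excluded):
    """Articles remaining after exclusions (assumes one reason per article)."""
    return identified - sum(excluded.values())


print(included_count(IDENTIFIED, EXCLUDED))  # 164
```

Under that assumption, the six exclusion counts sum to 591, which leaves the 164 included articles reported above.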
Two trained and supervised graduate research assistants (nonauthors) read and highlighted each article to extract information about the racial representation of the sample and the strategies for analyzing race. In most cases, keywords were sufficient to identify the strategies (“group equivalence,” using race as a “control variable,” or “covariate,” “moderation,” “interaction”). In the case of four articles, where the analytic strategies used were less clear, we consulted with a statistical methodologist (third author) to ensure articles were correctly categorized. For example, one study (Nix et al., 2013) used multiple group modeling to test moderation of their logic model rather than using an interaction term to test for moderation.
The unit of analysis most relevant to research question one (i.e., what is the extent of racial and ethnic diversity of students in trials of SEL programs?) was the trial. The collection of 164 articles reported on 97 trials. Many interventions were tested with multiple trials (the same intervention tested in multiple samples), and some trials were described by numerous articles. We determined the trial by analyzing the sample descriptions. Trials generally included the same sample (or subsamples of the main sample). Table 1 reports the articles included for each intervention by trial, with sufficient detail to enable replication. Strategies for analyzing racial differences (RQ2) may be discussed in one or more articles per trial.
Review Approach
To determine the extent of racial and ethnic diversity of students in SEL trials (RQ1), articles were reviewed for the racial/ethnic groups represented in their sample, with results reported by trial. Details regarding the race/ethnicity of the sample were directly extracted from the text of the article. As may be expected, racial categories were described differently across studies, and sometimes race was reported as ethnicity. For consistency, and following the contours of commonly used U.S. Census characterizations (while acknowledging the downsides of doing so), we selected and used the following racial group names for our reporting: Black/African American, Hispanic/Latiné, Asian American/Pacific Islanders, Native American/Native Hawaiian, White, and other/multiracial groups. African American and Black were assumed to be the same racial group; European American, Caucasian, and White were considered to be the same; and Hispanic and Latino/Latinx/Latiné were assumed to refer to the same racial/ethnic group, as these groups were considered as a race in some studies and an ethnicity in others. Any deviations from this categorization are noted in the presentation of our findings. Nearly every article included an “other” category, which varied in the racial groups it included, and whether it was defined at all. For example, O’Neill et al. (2016) describe their sample as “54% White, 38% African American, and 8% of other or mixed ethnicity” in a study of 1,983 students. This trial, therefore, contributes 1,071 White students, 754 Black students, and 156 students categorized as “other/multiracial” to our findings. When race was reported separately for comparison and treatment groups, or for different locations within a trial, the total sample diversity was calculated (total number of students of each racial group across locations and conditions/total number of students).
Some articles only reported racial group percentages, making a specific calculation of representation impossible; in these cases, the sample diversity was estimated by averaging percentages. For some longitudinal studies, sample sizes changed over time. When this was the case, sample information was reported from the article with the earliest publication date, unless more detailed information was available in a more recent article.
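The two calculations described above (pooling counts across conditions or locations, and averaging percentages when only percentages are available) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' procedure: the function names and all numbers are hypothetical, and the averaging is assumed to be unweighted.

```python
def pooled_composition(count_rows):
    """Pool per-condition or per-site counts: group total / overall total."""
    totals = {}
    for row in count_rows:
        for group, n in row.items():
            totals[group] = totals.get(group, 0) + n
    grand_total = sum(totals.values())
    return {group: n / grand_total for group, n in totals.items()}


def averaged_composition(percent_rows):
    """Fallback when only percentages are reported: unweighted mean per group."""
    groups = {group for row in percent_rows for group in row}
    k = len(percent_rows)
    return {g: sum(row.get(g, 0.0) for row in percent_rows) / k for g in groups}


# Hypothetical trial: 100 treatment students and 300 comparison students.
counts = [{"White": 60, "Black": 40}, {"White": 90, "Black": 210}]
percents = [{"White": 60.0, "Black": 40.0}, {"White": 30.0, "Black": 70.0}]

print(pooled_composition(counts))      # White share is 150 / 400
print(averaged_composition(percents))  # White share is (60 + 30) / 2
```

In this made-up example the pooled White share is 37.5% while the averaged share is 45%, which shows why averaged percentages are only an estimate when group sizes differ.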
To determine how race and ethnicity were used in the analysis of interventions (RQ2), the methods and results sections of articles were reviewed. Each strategy that researchers used to evaluate the association between race and intervention effects was extracted from the reviewed articles. Strategies, in the case of this article, refer to the statistical analysis techniques used to account for the influence that race/ethnicity might have on outcomes. All articles were initially marked for the use of any analytic strategy that included race/ethnicity. Among articles using any strategy to examine race/ethnicity analytically, dummy codes were applied to document the presence or absence of four analytic strategies: (1) checking for group equivalence based on race/ethnicity, (2) including race as a covariate in analyses, (3) testing for moderation with interaction terms, and (4) performing subgroup analyses. For example, Ialongo et al. (2019) note that they “tested for interactions between intervention status and . . . ethnicity” in their trial of PATHS and thus would be coded affirmatively for moderation testing. Articles that included only one racial group were excluded from this process, as they would have no reason to use further strategies. Analytic strategies were not mutually exclusive, so a trial was coded for every strategy reported on in relevant articles.
To determine which evidence-based SEL programs have evidence of effectiveness for students of color (RQ3), we identified all programs buttressed by a trial with either (a) a sample of nearly all students of color (>75% one racial group) that demonstrated positive effects, or (b) a multiracial sample that conducted a subgroup analysis to demonstrate positive effects for students of color.
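The two-part criterion for RQ3 can be expressed as a small decision rule. The sketch below is illustrative only: the `Trial` structure, field names, and example values are hypothetical, and it treats "positive effects" as a single yes/no flag, which simplifies what the reviewed articles actually report.

```python
from dataclasses import dataclass, field

# Group labels follow the reporting categories used in this review.
GROUPS_OF_COLOR = {
    "Black/African American",
    "Hispanic/Latiné",
    "Asian American/Pacific Islander",
    "Native American/Native Hawaiian",
    "other/multiracial",
}


@dataclass
class Trial:
    composition: dict        # group -> share of the sample (0 to 1)
    positive_effect: bool    # positive effect reported for the full sample
    subgroup_effects: dict = field(default_factory=dict)  # group -> positive effect in a subgroup analysis


def groups_with_evidence(trial, threshold=0.75):
    """Return groups of color for which the trial meets criterion (a) or (b)."""
    evidence = set()
    # (a) sample is mostly one racial group of color and shows positive effects
    for group, share in trial.composition.items():
        if group in GROUPS_OF_COLOR and share > threshold and trial.positive_effect:
            evidence.add(group)
    # (b) a subgroup analysis shows positive effects for a group of color
    for group, positive in trial.subgroup_effects.items():
        if group in GROUPS_OF_COLOR and positive:
            evidence.add(group)
    return evidence


# Hypothetical trial: 90% Black/African American sample with a positive effect.
example = Trial({"Black/African American": 0.90, "White": 0.10}, positive_effect=True)
print(groups_with_evidence(example))
```

Note that a multiracial trial with a positive overall effect but no subgroup analysis returns an empty set under this rule, which mirrors the review's position that representation alone does not establish efficacy for a specific group.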
Results
Table 1 contains results from the literature review of CASEL SELect programs (CASEL, 2013, 2019). The number of trials and the total number of participants in each intervention is reported in Table 1, including the articles associated with each trial. In the following sections, results are organized by research question. Each begins with a summary of what was found before proceeding through the details.
Table 1. Strategies for addressing race/ethnicity in SEL intervention trials.
Research Question 1: What Is the Extent of Racial and Ethnic Diversity of Students in Trials of SEL Programs?
There is racial and ethnic diversity in trials of SEL programs, yet important gaps in racial and ethnic representation remain. A total of 112,905 students were included in 97 SEL trials, with racial/ethnic identifiers provided for 85,213 (75%) students from 79 (81%) trials. Figure 1, on the left side, reports the percentages of racial/ethnic group representation in trials of SEL programs. Each row represents a trial, ordered by the percentage of the sample identified as White. Figure 1, on the right side, shows the percent of racial/ethnic group representation in terms of the sample size on a log scale. The log scale allows for the difference in sample sizes to be seen without losing the ability to assess racial representation in the trials with small samples. White students are the largest racial group represented in SEL trials (35% among students with known races), less than their proportion in the U.S. population.

Figure 1. Racial and ethnic representation in SEL intervention trials.
Yet, among the entire sample, 18% (n = 20,136 from 18 trials) of students had an unknown race/ethnicity. There were many ways in which the 18 trials did not report on the race of specific students, which may have been appropriate for their purposes or what was possible under their circumstances. Some trials did not report on race at all, some reported on the racial diversity of a related population (e.g., the diversity of the entire school system rather than the sample, the diversity of the geographic area in which the schools were located), and some provided a range of the racial backgrounds across their various school sites (e.g., “between 5% and 100% of students were minorities,” Rivers et al., 2013, p. 79). Within this group, 7% (n = 7,556 from 3 trials) of students were classified as non-White, with race not otherwise specified (e.g., reported as “minority”), which collectively included 69% of White and 31% “minority.” In Figure 1, articles that do not include information about race/ethnicity are noted in gray. In general, articles have improved over time in their reporting of the racial and ethnic representation of their samples.
Of the 79 trials where race was known, the racial composition of the cumulative sample was as follows: 35% (n = 30,106) White, 28% (n = 23,823) African American, 23% (n = 20,003) Hispanic or Latiné, 5% (n = 3,926) Asian American, 1% (n = 1,084) Native American or Native Hawaiian, and 4% (n = 3,551) “other or multiracial.” As evidenced by the orange bars on the left of Figure 1, 58% of trials (from among those where racial/ethnic composition is described) were conducted with White student pluralities (i.e., the largest group represented). Among only the trials where racial/ethnic composition is described (n = 79), 43% of trials were conducted with White majorities, and 57% of trials were conducted with students of color majorities.
Trials that are powered to detect an intervention by race interaction effect or conduct a subgroup analysis need to have large samples of racial subgroups represented. As previously noted, forty-five trials included more than 50% students of color. Within trials that included more than 50% students of color, Black/African Americans and Hispanic/Latiné had the highest representation. Seventeen trials included more than 50% African American students (12,048 students in these trials). Seven trials included more than 50% Latiné students (7,182 students in these trials).
No trials had a majority of Asian American/Pacific Islander, Native American/Native Hawaiian, or multiracial students. Asian American/Pacific Islander students are represented to some extent in at least 26 trials (they are often included in the “other” category) but often in very small numbers; they represented more than 20% of the sample in only four trials, reaching a maximum percent within any sample of 30% in one trial of Positive Action (Beets et al., 2009; Snyder et al., 2010, 2012, 2013; Washburn et al., 2011). The distinctions between Asian Americans of different ethnicities or countries of origin are rarely reported, despite differences in the experiences and in academic and behavioral disparities among Asian ethnic subgroups. One trial of PATHS reported 2% of the sample to be Filipino Americans (Greenberg et al., 1995), and one trial of Positive Action documented 5% of the sample to be Japanese students (Beets et al., 2009; Snyder et al., 2010, 2012, 2013; Washburn et al., 2011).
One intervention trial of Competent Kids, Caring Communities included 19 students categorized as Arabic (students were described as Arabic, but it is not clear whether they spoke Arabic or identified as Arab; Linares et al., 2005). Meanwhile, one trial of I Can Problem Solve reported having 9 students from various Middle-Eastern cultures (Ciancio et al., 2001). Moreover, one trial of the Open Circle Program specified 2% of the sample to be Caribbean students living in the United States (Liang et al., 2008; Taylor et al., 2002). One trial of The Incredible Years had a sample that included 8% African students living in the United States (Webster-Stratton et al., 2008).
Research Question 2: How Are Race and Ethnicity Treated When Analyzing Their Role in Interventions?
We found that race and ethnicity are analyzed in SEL trials in many ways, but they are infrequently analyzed in ways that provide evidence of efficacy for specific racial or ethnic groups. Among the 79 trials that described the race of their sample, the majority (n = 62; 78%) used an analysis strategy that could generate a claim relevant to students of color (see Table 1). The nature and strength of these claims, however, varied considerably. Twenty SEL intervention trials (26% of the trials that described student race) contained a sample of nearly all students of color (<10% White); of these, 12 were mostly one racial/ethnic group of color (>75%). No additional analysis strategy is needed to determine whether these trial results apply to a specific racialized group (to the extent that any sample is expected to generalize beyond the location and historical moment, and the sample was sufficiently large and diverse in regard to other, intersecting identity characteristics). Of trials containing a single racial group, the PATHS trial had the largest sample (n = 5,611; Ialongo et al., 2019), while the rest had smaller samples ranging from n = 123 to 435.
Multiracial Trials
Alternatively, some SEL intervention trials contained a meaningful mix of racial/ethnic groups in their sample. These are the trials that require additional analysis to understand to whom the results of this trial apply. There were multiple strategies that researchers utilized when examining race in their analyses. The majority of these strategies fall into five categories: (1) no articulated analytic use of race, (2) establishing equivalence of the racial composition of participants between the intervention and control/comparison groups; (3) including race as a covariate in analyses; (4) testing for moderation of the intervention effects by race; and (5) analyzing racial subgroups. Table 1 reports on the trials that use each strategy.
No Articulated Analytic Use of Race
Among the trials known to use multiracial samples (n = 67), 52% (n = 35) did not discuss student race beyond reporting demographics of their samples (i.e., did not use race in any analyses). Forty-eight percent of multiracial trials used at least one of the four strategies outlined below to incorporate race into their analysis.
Group Equivalence
Forty-six of the 67 multiracial trials (69%) reported that they considered race/ethnicity when examining the equivalence of the comparison and intervention groups. It is possible that many more trials checked for group equivalence but did not report doing so in their paper.
Covariates
Thirty-eight of the 67 multiracial trials (56%) included race as a covariate in analyses.
Moderation
Thirteen of the 67 multiracial trials (19%), studying 12 interventions, tested whether an interaction between race and intervention condition was a significant predictor of outcomes. For our specific purposes, the tests of moderation identified within these 13 trials can be characterized as (a) missing information, (b) conducting school and not student-level analysis, (c) having small sample sizes, or (d) capable of revealing differences. Each is described in turn below.
Missing information
Four trials produced studies that stated that race/ethnicity interactions were tested, but they did not provide enough information to interpret their results. First, in an article describing a trial of I Can Problem Solve, authors reported that they tested moderation by race/ethnicity in their sample of 248, but the authors did not report results (Finlon et al., 2015). Second, in a trial of the Caring School Community that included 5,331 students who identified as White (49%), African American (20%), Hispanic (23%), or Asian (8%), the authors reported:
There was no consistent pattern of interactions between program status and these background factors [including race/ethnicity], and, overall, no more interactions than would be expected by chance. Thus, there was no indication that children with different background characteristics responded differently to the CDP program. (Solomon et al., 2000, p. 25)
Third, the trial of Michigan Model for Health reported testing race-by-condition interactions in a sample of 2,512 students who identify as White (54%), African American (38%), or other (8%), but the study did not report the results of these models (O’Neill et al., 2010). To accurately test differential effectiveness, a time × condition × ethnicity interaction would need to be tested. Fourth, a trial of the Resolving Conflict Creatively program included 11,160 students who identified as 40% African American, 41% Hispanic, 15% White, or 5% other. These authors reported that with regard to race-by-condition tests, “significant interaction effects were few and weak and lacked a discernable pattern,” details of which are said to be available from the first author (Aber et al., 2003, p. 341).
School-level moderation
Two trials examined race/ethnicity interactions at the school level (Flay et al., 2003; Low et al., 2015). While school-level analyses of race/ethnicity moderation are helpful to examine how school-wide demographics may influence intervention effectiveness, they do not provide information relevant to our purpose: the benefit of the intervention to specific student groups. For one trial of Positive Action, stronger effects were found in schools with higher proportions of African American students (Flay et al., 2003). However, based on school-level analyses, we do not know if African American students received these benefits to similar or larger degrees than their White counterparts. It is possible from what is reported that White students received most of the benefit, thus lowering violence incidents on average, but that disparities increased. A trial of Second Step that included 7,300 students who identified as 44% Caucasian, 25% Latiné, 12% Asian, 7% African American, 3% Native American, 1% Native Hawaiian or Pacific Islander, 7% more than one race, and 17% unknown also reported testing moderation at the school level (percent White) and found no evidence of moderation (Low et al., 2015).
Small sample sizes
Two trials produced studies (DiPerna et al., 2016; Linares et al., 2005) that examined interactions with sample sizes that would generate highly uncertain estimates of race/ethnicity-by-condition interactions. Stated in terms of power, the studies lacked .8 power to detect differences in intervention effects equal to or smaller than what Cohen categorized as a medium effect size (i.e., d = .5). First, in a trial of the Competent Kids, Caring Communities program, including 119 students who identified as White (37%), Hispanic (19%), Asian (19%), or Arab (16%), Linares et al. (2005) tested for race/ethnicity moderation and reported no statistically significant differences in intervention effects. Second, in a trial of the Social Skills Improvement System with 432 students (73% White, 18% Black/African American, 2% Asian, 5% Hispanic or Latiné, and 2% other race), race-by-intervention interaction terms were tested and, again, none were found to be significant (DiPerna et al., 2016). Given the small sample sizes and uncertainty of estimates of interaction effects, the lack of statistically significant interactions in these studies does not provide strong support for the absence of meaningful differences in intervention effects by race or ethnicity or clear information on the generalizability of intervention effects across racial/ethnic subgroups.
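To make the power argument concrete, the sketch below uses a normal approximation for the power to detect a difference of d = .5 between the standardized intervention effects of two racial/ethnic subgroups. It is a rough illustration, not the calculation used in this review: the cell sizes are hypothetical (a 432-student trial that is roughly 73% White and 18% Black, split evenly between conditions, which the source does not state), and the approximation ignores clustering of students within classrooms or schools, which would reduce power further.

```python
from math import sqrt
from statistics import NormalDist


def interaction_power(cell_sizes, delta, alpha=0.05):
    """Approximate two-sided power to detect a difference `delta` between the
    standardized effects (Cohen's d) of two subgroups.

    cell_sizes: four counts (subgroup 1 treatment, subgroup 1 control,
                subgroup 2 treatment, subgroup 2 control).
    Assumes independent observations and unit variance in every cell.
    """
    std_normal = NormalDist()
    se = sqrt(sum(1.0 / n for n in cell_sizes))   # SE of the difference in d
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical cells: about 315 White and 78 Black students, split evenly.
cells = [158, 157, 39, 39]
print(round(interaction_power(cells, delta=0.5), 2))
```

Under these assumptions the approximate power falls well below the conventional .8 benchmark, because the standard error of the interaction is driven by the smallest subgroup cells rather than by the total sample size.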
Capable of revealing differences
Only five trials, of four programs, tested interaction terms with samples that were seemingly large enough to detect differences in intervention effects by racial/ethnic groups that were, to use Cohen’s (1992) suggested categorization, medium or small in size. In a trial of 4Rs with 1,184 students, 46% Hispanic/Latiné and 41% African American, the authors tested race × intervention interaction terms and found no significant differences in intervention effectiveness between groups (Jones et al., 2011). Two PATHS trials tested for race/ethnicity moderation. The Nix et al. (2013) study of 356 students who identified as Latiné (17%), African American (25%), or other (58%) examined race/ethnicity differences in pathways of logic model testing. These authors found that literacy skills were associated with learning engagement for White students, while emotional understanding was linked to learning engagement for African American students. Other pathways tested revealed no significant differences. In a trial of PATHS with a sample of 5,611 students that was 90% African American, Ialongo et al. (2019) found no significant intervention-by-race/ethnicity interactions. In a trial of the Incredible Years that included 1,817 students, 76% African American and 22% White, interaction terms were tested and found to be nonsignificant (Reinke et al., 2018). A trial of Tools for Getting Along tested race moderation in a sample of 2,131 students, 30% African American and 70% White or other, and found no significant differences (Barnes et al., 2016). Although these tests of intervention × race interaction effects reveal the (largely, lack of) differential effects by race, they do not, in and of themselves, provide evidence of efficacy for any specific racial or ethnic subgroups.
Subgroup Analysis
Two trials reported that they conducted subgroup analyses. The trial of the Too Good for Violence program (n = 999) analyzed intervention effects separately for White, Hispanic, and African American students and found that all groups had higher scores at post-test than at pre-test (Hall & Bacon, 2005). However, this analysis did not report on improvements in the intervention condition compared to the control group across race/ethnicity, nor did it account for randomization occurring at the school level. Therefore, we found only one study that examined subgroup differences in intervention effect by race (Bavarian et al., 2013), which is described more fully in the next section.
Research Question 3: Which Evidence-Based SEL Programs Show Benefit for Students of Color?
We found some evidence to suggest that seven CASEL SELect interventions benefit Black students, one CASEL SELect intervention benefits Black boys, and four CASEL SELect interventions benefit Hispanic/Latiné students. We found no evidence that CASEL SELect interventions benefit other racial/ethnic groups of color. We arrived at these results by analyzing two sets of evidence that could be used to understand which evidence-based SEL programs show benefit for students of color: (1) studies that show positive effects on a sample of one racial group of color, or (2) studies in racially/ethnically diverse samples that show positive intervention effects among subgroups. Evidence from mono-racial (>75%) studies demonstrates the efficacy of Tools of the Mind, I Can Problem Solve, Second Step, and Mind Up for Hispanic/Latiné students, as well as High Scope, INSIGHTS into Children’s Temperament, I Can Problem Solve, PATHS, Ready to Learn, Incredible Years, and Second Step for Black students. One trial of Merrell’s Strong Kids in a sample of 39 Black female students found no significant improvements on teacher rated SEL assets (Ryan et al., 2016). As previously noted, only one intervention was studied through subgroup analyses and reported on results. This study of Positive Action showed evidence of an effect for Black boys. In this study of Positive Action (n = 510), researchers examined subgroup differences for Black boys and girls and White boys and girls on 14 outcomes. The intervention effect on reading was larger for Black boys than other groups, such that the intervention effects were not significant for other groups, and no other significant interaction effects were detected (Bavarian et al., 2013). In sum, 10 of the 34 CASEL SELect Programs (29%) have some evidence of efficacy specific to students of color in the United States.
Discussion
In the present study, we found some evidence that REM students benefit from SEL programs. Of the 97 trials of 32 reviewed CASEL SELect interventions, 7 report some evidence published in the peer-reviewed literature that the interventions benefit Black students, and 4 report some evidence of beneficial effects for Hispanic/Latiné students. However, there is no evidence to suggest that existing SEL interventions on the CASEL list specifically benefit students from other REM groups. One additional trial provides evidence that the intervention benefits African American boys on one of the 14 outcomes studied. To summarize, evidence of SEL efficacy for specific REM groups is lacking in the current literature, but this is largely due to the omission of inquiry rather than observations of null or iatrogenic effects. This is an opportunity for more SEL researchers to thoughtfully engage in this inquiry.
Racial and Ethnic Representation
Compared to previous research, this review found that studies of SEL interventions describe the race/ethnicity of their samples to a greater extent and include more REM students in the samples. The proportion of articles describing the racial/ethnic composition of students in the Rowe and Trickett (2018) sample was 64%, while Cipriano et al. (2023) found that 81% of articles reported on student race/ethnicity. Despite using a different sampling strategy and method for our review, similar to Cipriano et al., we found that 81% of trials described the racial/ethnic composition of students.
Rowe and Trickett (2018) reported that 61% of articles had majority White samples. In comparison, we found that 43% of trials that reported race/ethnicity had majority White samples, while 53% of all trials either did not report race or had majority White samples. In our review of CASEL SELect intervention trials, we found that overall 35% of participants were identified as White, 28% African American, 23% Hispanic or Latiné, 5% Asian American, 1% Native American or Native Hawaiian, and 4% as “other or multiracial.” No prior reviews reported overall racial representation of the entire body of SEL trials in the way that we have. Intervention researchers should continue and accelerate the trend toward increasing representation in SEL trials, testing the efficacy of SEL programs in trials that include diverse REM students.
Next Steps: Beyond Representation
Including a specific racialized group in an SEL trial is insufficient for assuming findings are generalizable to a specific racialized group. Increasing REM representation within White-dominant initiatives can serve to legitimate and sustain White dominant frames. For this reason among others, representation in itself is not enough. Race must be analyzed in SEL trials in order to generate evidence of efficacy for REM groups.
This review found that the most common use of race in the analysis of data from SEL trials was as a covariate. By treating race/ethnicity uncritically as a control variable, we risk overlooking the unique experiences that youth of color have in schools. At a minimum, researchers should think more deeply about why they are partitioning variance related to race (and subsequently racism) out of their models, and explicitly acknowledge, in writing, the ways that racism may explain their results. Given the history of mistreatment and exploitation of communities of color in research, which has led communities of color to mistrust research (Bajaj & Stanford, 2021; Jaiswal, 2019), intervention researchers have a well-earned obligation to do more than partition variance out of their models; they should also provide evidence of efficacy for specific REM groups.
Therefore, we should use analytic strategies to understand to whom the evidence applies. Not doing so is a common error of omission (Shapiro et al., 2024); many of the interventions included in the present study likely had the data available but either did not test or did not report findings on the extent to which these programs benefit students from specific racialized groups. We recognize that what-works-for-whom aims may not have been among the articulated aims of the original trials, but this review may serve as a catalyst to prioritize these research aims in future SEL intervention research.
Specifically, we found that only 8% of reviewed SEL trials tested race/ethnicity interactions. More precisely, only 19% of multiracial trials, where moderation testing is most warranted, tested race/ethnicity interactions. Rowe and Trickett (2018) found that 15% of articles tested for race/ethnicity moderation, and half of the articles that reported moderation (4 of 8) found significantly different effects of the intervention by race/ethnicity. This indicates limited progress in testing for race/ethnicity interactions over time.
Cipriano et al. (2023) found that 28% of articles included race/ethnicity as a covariate or moderator and did not report how many found significantly different effects of the intervention by race/ethnicity. This paper extends previous reviews by also examining the quality of moderation tests and reporting on the findings. We found that many reports of moderation testing were either missing information, not relevant to the purpose of generating evidence of efficacy for specific racialized groups, or had high uncertainty and low power. Of the 13 trials that reported on race by condition moderation, we believe five were sufficiently powered to detect medium effect sizes. Of the five trials, only one trial (20%) found one significantly different logic model pathway of the intervention by race/ethnicity. This evidence is too sparse to draw any conclusions as to whether CASEL SELect programs, in general, do or do not have differential effects by race and ethnicity. Even so, knowledge of differential effects, without further analysis and interpretation, would be insufficient to tell us to whom the evidence of efficacy applies. Providing analyses for subgroups of students is the strongest evidence of efficacy for specific racialized groups in multiracial trials. Unfortunately, we only found one subgroup analysis in the literature we reviewed.
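To illustrate what being "sufficiently powered to detect medium effect sizes" can mean in practice, the following is a rough sketch in Python and not the procedure used in this review. It applies a normal approximation to a two-group comparison of means with equal group sizes. The function name, the per-group sample sizes, and the two-sided alpha of .05 are assumptions chosen for illustration; d = 0.5 is Cohen's conventional "medium" value. Power for a race by condition interaction is generally lower than for a simple two-group contrast at the same total sample size, so these figures are best read as optimistic.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate two-sided power to detect a standardized mean
    difference d between two equal-sized groups (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    noncentrality = d * sqrt(n_per_group / 2)  # expected z statistic under d
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


# Hypothetical per-group sample sizes, chosen only for illustration.
for n in (20, 64, 150):
    print(n, round(approx_power_two_groups(0.5, n), 2))
```

Under this approximation, roughly 64 students per group are needed to reach about 80% power for d = 0.5, which suggests why small subgroups within a multiracial trial often cannot support confident conclusions in either direction.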
Our findings contribute to the literature expressing concerns about the color evasiveness of SEL intervention research. While there has been improvement in describing the race/ethnicity of student samples and the overall racial and ethnic representation in SEL intervention trials, our findings suggest that the field of SEL intervention research needs to now more deeply consider how we treat race/ethnicity in intervention analyses. Intervention researchers should report enough information about racial representation and the full results from subgroup and interaction tests to allow for a future meta-analysis to analyze SEL intervention effects for specific racialized groups.
Overcoming Race Evasion in SEL Research and Practice
Results from the present study underscore the lack of substantive engagement with race and racism in SEL intervention research. The stakes of this omission are high, as there is potential for SEL programs to re-entrench existing disparities and perhaps even cause harm when the reality of racism is not explicitly considered (Camangian & Cariaga, 2022; Frohlich & Potvin, 2008; Kaler-Jones, 2020; Simmons, 2021). Similarly, there are risks when race is incorporated into SEL intervention research in uncritical ways, such as using White racial frames to guide and interpret the analysis. When intervention researchers do conduct subgroup analyses and test interactions, they should (1) consider the aforementioned tensions around assigning and using socially constructed racial categories, (2) think critically about what racial frames are embedded in their work and how they shape evidentiary claims, (3) grapple with and strengthen the ways in which their work engages with the realities of racism in process and in substance, and (4) foreground students who experience racism as part of efforts to dislodge dominant racial frames.
Even as we make suggestions for what can and should happen in regard to race and racism in SEL intervention research, we acknowledge that we are discussing and treating race as something that is verifiable that needs to be “attended to” and “validated” by researchers. Yet, young people—and the skin color with which they live—do not exist because the truth-seeking researchers acknowledge them. Further, their experiences do not become any more real after truth-seeking researchers analyze and report on them. Their existence and experiences are not confined by their racial/ethnic identity but rather the racism that has refused their place in reality. While we take up our own suggestion to grapple with the hard truths of the racial frames that researchers have co-constructed, we also aim to synthesize and reflect upon the existing SEL intervention literature, relying in the present review on dominant notions of race and the use of racial categories therewithin. In this work, we have tried to balance these complexities in ways that provide useful information for practice and illuminate the next steps for research.
As the evidence from SEL intervention research grows and additional programs build on the knowledge of this body of research, our standards of evidence should go beyond racial and ethnic representation and a failure to detect differences in intervention effects to an imperative to generate evidence regarding REM groups. To inform SEL program selection, scholars should provide information about the representativeness of the study sample and any available evidence as to the extent that the intervention benefits students from specific racial and ethnic groups. Our field should carefully consider the types of evidence explored in the present study and the extent to which each can be used to demonstrate within-group effects. Curated lists of programs, like the CASEL SELect list, may play a critical role in providing accountability for consensus evidentiary standards. CASEL, or other list curators, could identify and promote SEL programs that have the strongest evidence that they benefit students who experience racism and students from REM groups or, at minimum, examine the quality of evidence underlying universality claims and transparently report on the state of evidence for REM groups. We should also go beyond the traditions of post-positivist frameworks for research in determining what works for whom and consider a role for an “equity-focused analysis” of SEL programs and research that considers structural racism and avoids deficit narratives of difference (see Boyd et al., 2023). These changes would help the field move from considering the generalizability of SEL programs to considering the social and emotional conditions and systemic solutions required for educational equity.
Yet, until society has evolved such that racism is no longer the dominant driving force of inequity in our education system, we will need to continue using racialized categories, with a critical acknowledgment of their limitations and harms, to understand and seek to remedy the impact of racism on research and practice.
Limitations
There are a number of limitations that should be considered when interpreting the results of this review. The sample of articles includes interventions on the CASEL SELect list as of September 2019, and many SEL programs have been added since then. Our review method of using Google Scholar facilitated locating relevant articles, but the possibility of Google changing its algorithm at any time may limit the replicability of the search. This review captures articles from a limited period of time (through May 2021), and the SEL intervention literature is growing quickly and in positive directions. This newer work includes exciting innovations and racial analysis strategies that are responsive to many of the pre-existing critiques but are beyond the scope of this study. For example, using multiple-group modeling techniques, which are better powered to detect subgroup effects and can account for the intersectionality of identities, may be an improved analytic strategy for conducting subgroup analysis (Lee et al., 2023). We have used Cohen’s criteria for effect sizes, although others may find other metrics more useful (Kraft, 2020). Further, given the divergence of our method from traditional systematic review methods, we did not pre-register our review with any outside organization.
The findings represented in this review are limited by what was reported within the intervention trials. We report on racial categories offered or assigned by researchers that flatten within-group diversity, and we are unable to report which primary studies may have approached racialization differently. We then further flattened identity for the sake of our aggregation, in ways that uphold colonial systems of racialization. It is possible that analyses we recommend have been conducted but were not reported on or are included in reports that were excluded by our criteria. We have reported on all of our analytic choices in summarizing prior studies with publicly available data (see note for Figure 1), but future studies may want to adjust our data-handling decisions. Many intervention developers are engaging in research on the interventions they developed or receiving a salary predicated on the success of a program publisher. They face financial or reputational disincentives for exploring whether their interventions, presumed to work for all students, may not work for some groups. This could consciously or unconsciously constrain inquiries and drive a publication bias, which is unassessed in this review. Our decision to exclude gray literature that was not peer-reviewed may have also unintentionally excluded articles without resources to meet peer-review criteria but that prioritized REM groups.
Future Directions
To move beyond mere representation, many have called for SEL program content to center the cultural wealth, assets, and existing funds of knowledge of REM students, but, to date, few programs have done so (Ramirez et al., 2021). Omitting these elements may hinder, rather than help, the social and emotional development of REM students. But in order to test and, if warranted, mainstream such programs, we cannot ignore racism. By avoiding talking about, grappling with, and interrogating the realities of racism, we run the risk of “willfully ignor[ing] the experiences of people of color” and failing to confront the consequences of racism in schools and society (Annamma et al., 2017, p. 157).
We echo the calls to advance strengths-based, asset-filled SEL that cultivates critical consciousness (Heberle et al., 2020), interrupts harms, and transforms the system of education. As a first step, educational leaders can utilize the results of this paper to select interventions with knowledge about which programs have evidence of effectiveness that applies to their student population. For example, if results demonstrate that a particular SEL program only has evidence of effectiveness for White students, a school that enrolls mostly REM students may elect not to adopt that program until further evidence is provided. Additionally, while our review focused exclusively on outcome studies, future studies should explore the extent to which including more cultural and sociopolitical elements in SEL interventions leads to improved outcomes for specific or all racialized groups or, by improving outcomes for some, provides greater benefits to the overall population. We cannot adequately explore this until we improve upon our current approaches for asking and answering questions about what works for whom in SEL intervention research.
Conclusions and Implications
SEL programs have shown evidence that they improve a variety of outcomes including promoting academic success and behavioral health and reducing behavioral problems, violence, aggression, and bullying (Durlak et al., 2011; Taylor et al., 2017). While having SEL programs that are efficacious for students overall is important, it is also critical that SEL programs more deeply consider what impact they have on REM groups. SEL trials must be examined with greater scrutiny to understand not only what works for whom but also to whom the results of the trial apply. With regard to external validity, Rothwell, speaking of doctors interpreting results from RCTs, suggests:
We cannot expect the results of RCTs and systematic reviews to be relevant to all patients and all settings (that is not what is meant by external validity) but they should at least be designed and reported in a way that allows clinicians to judge to whom they can reasonably be applied. (Rothwell, 2005, p. 83)
It is promising that there are SEL programs with evidence that they benefit Black and Hispanic/Latiné students. These findings are important for educational leaders considering implementing SEL programs in racially and ethnically diverse schools and districts. The majority of SEL trials did not, however, consider the role of race (and subsequently racism) in their analyses. While Black and Hispanic/Latiné students are the largest groups of color in the United States, there was no evidence of SEL program benefit for other racial/ethnic groups of color. While progress is being made, with more REM students being included in SEL intervention trials, the field still has a long way to go to understand how SEL programs support the academic, social, and emotional needs of REM students.
Supplemental Material
sj-docx-1-rer-10.3102_00346543241310184 and sj-pdf-1-rer-10.3102_00346543241310184 – Supplemental material for To Whom Do These Results Apply? Assessing Evidence for the Generalizability of Social and Emotional Learning Programs Among Specific Racial and Ethnic Groups by Tiffany M. Jones, Bo-Kyung Elizabeth Kim, Charles B. Fleming, Jie Deng, Addison Duane, Amelia R. Gavin and Valerie B. Shapiro in Review of Educational Research
Footnotes
Acknowledgements
We would like to thank an anonymous reviewer for encouraging and supporting us in more deeply integrating an analysis of race in our manuscript. We would also like to thank research assistants Tess Halac and Kaylee Becker, who assisted in the protocol development and analysis of articles, as well as Megan Mitchell, Esmeralda Michel, and Erika Hanson for their support in coordinating this project. Shapiro is funded by the William T. Grant Scholars Program and nurtured by their network of scholars striving to reduce inequality and promote the use of research evidence. Statements in this paper do not necessarily represent the views of these individuals or organizations.
Correction (April 2025):
Article updated to include an additional reference in the Reference List.
Author Note
Our objective in writing this review is not to disgrace the empirical foundations of our field but to honor and extend them by summarizing what is currently known about the effects of SEL for specific racial and ethnic groups. One reviewer in the peer-review process suggested that this review feels more pointed than others who have similarly stated objectives (e.g., Cipriano et al., 2023). We hope that the framing of our questions in ways that center students of color, the specificity of our review, and the concreteness of our claims facilitate the next steps for a field looking to further SEL intervention research while also providing current information of benefit to communities of color.
Although we, the authors of the present study, did not contribute to any SEL trials that met our search criteria, we wish to model intellectual humility in recognizing our prior work is also fallible in ways that are similar to the work of our colleagues that were included in this review. We seek continuous improvement in our forward-looking contributions to the field.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Notes
Authors
TIFFANY M. JONES is an associate professor at Colorado State University. Her program of research aims to bridge the divide between research and practice to improve youth outcomes and promote racial justice. She serves as the corresponding author of this study.
BO-KYUNG ELIZABETH KIM is an associate professor at the University of California, Los Angeles. Her research focuses on informing service systems and evidence-based practice strategies as alternatives to youth incarceration to reduce mental, emotional, and behavioral health inequity experienced by youth of color. Her work has been supported by the City of Los Angeles, the National Institute of Justice, the National Institutes of Health, and the Greater Los Angeles Foundation.
CHARLES B. FLEMING is a research scientist in the Department of Psychiatry and Behavioral Sciences at the University of Washington. His areas of research include substance use and related problems in adolescents and young adults, trials of preventive and treatment intervention, and policy and program implementation. Most of his research has involved the analysis of quantitative data, including multilevel modeling and a variety of approaches to examining longitudinal data.
JIE DENG is a PhD student and graduate research assistant at Colorado State University. Her research interests focus on improving mental health outcomes in racially and culturally diverse communities through evidence-based practices.
ADDISON DUANE is an assistant professor at the California State University, Sacramento. As a former elementary school teacher, her community-based and participatory scholarship centers and amplifies the brilliance of children—and communities—of color to support equitable school transformation.
AMELIA R. GAVIN is a professor at the School of Social Work at the University of Washington. She has over 20 years of experience conducting epidemiological and clinical research studies. The primary objective of her research is to eliminate and/or reduce health and mental health disparities to improve population health and mental health outcomes.
VALERIE B. SHAPIRO is an associate professor at UC Berkeley. She is the scientific director and special project advisor for CalHOPE Student Support. She is a William T. Grant Foundation Scholar and the national chair of the Coalition for the Promotion of Behavioral Health (CPBH). In these roles, she builds and studies the infrastructure required for effective prevention practice in schools to promote youth well-being and equity.
References
