Abstract
Inequitable school discipline practices, such as school suspensions, pose a serious threat to children’s development. Prior research has found robust links between children’s externalizing behaviors and school suspensions as well as disproportionalities in suspensions by race/ethnicity. This study extends these literatures using the National Center for Education Statistics’ Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 to examine whether links between children’s externalizing ratings from kindergarten through fifth grade and school suspensions measured in eighth grade differ for children based on race/ethnicity. The analytic sample includes about 9,500 American Indigenous/Alaska Native (2%), Asian (6%), Black (11%), Latinx (18%), Native Hawaiian/Pacific Islander (1%), and white (62%) children. Results showed that ratings of children’s externalizing behaviors are more strongly associated with suspensions for Black children than for Asian, Latinx, and white children, which confirms and extends our knowledge of inequitable discipline practices, thereby reinforcing a need for antiracist solutions.
Introduction
Punitive discipline policies such as school suspensions are a predominant response to an array of student behaviors, including both serious infractions, such as physical fights, and normative developmental behaviors, such as defiance or not following class rules (Lacoe & Steinberg, 2019). There is evidence that suspensions increase children’s likelihood of having worse grades and test scores, dropping out of school, and becoming involved in the juvenile carceral system (Del Toro & Wang, 2022; Pearman et al., 2019; Rosenbaum, 2020; Skiba et al., 2014). In the 2017–18 school year, >2.6 million U.S. public school students received one or more in-school suspensions, and >2.5 million received one or more out-of-school suspensions (Civil Rights Data Collection, 2021). There is robust evidence highlighting the disproportionate effects of school suspensions on Black students in particular, including overwhelming evidence that Black children are more likely to be suspended than white children for the same behavioral infractions (Amemiya et al., 2020; Anderson & Ritter, 2017; Barnes & Motz, 2018; Gilliam et al., 2016; Girvan et al., 2017; McCarthy & Hoge, 1987; Okonofua & Eberhardt, 2015; Owens & McLanahan, 2020; Skiba, 2015; Skiba et al., 2002; Welsh & Little, 2018). In the 2017–18 school year, Black students made up ~15% of the public school population but reflected >38% of all out-of-school suspensions, whereas white students made up >47% of the population and reflected <33% of all out-of-school suspensions (Civil Rights Data Collection, 2021). Similarly, in the 2017–18 school year, Asian students made up ~5% and Latinx students ~27% of the public school population but reflected ~1% and 22% of all out-of-school suspensions, respectively (Civil Rights Data Collection, 2021).
Externalizing problems are defined as patterns of disruptive behaviors such as being argumentative, destructive, disobedient, physically aggressive, and demanding (Miner & Clarke-Stewart, 2008). Children with high externalizing ratings in elementary school (roughly ages 5–11 years) are more likely to receive in- and out-of-school suspensions in elementary and middle school (Lane, Oakes, Cantwell, Common, et al., 2019; Lane, Oakes, Cantwell, Royer, et al., 2019). For example, Lane, Oakes, Cantwell, Common, et al. (2019) assessed >4,000 elementary school students and found that those rated at high risk for externalizing problems received 3.27 more in-school suspensions than students rated at low risk for externalizing problems, and those assessed at moderate risk received 2.18 more suspensions than low-risk students. Proponents of school suspensions typically argue that children’s behaviors warrant exclusionary discipline practices and, in turn, argue that racial disproportionalities in suspensions reflect racial disproportionalities in behavior (e.g., Wright et al., 2014). Yet, there is robust evidence that Black children are disproportionately suspended for the same behavioral infractions as white children (e.g., Okonofua & Eberhardt, 2015; Owens & McLanahan, 2020; Skiba, 2015; Welsh & Little, 2018). Thus, robust associations between externalizing ratings and school suspensions warrant interrogating whether Black children with the same externalizing ratings as white children are also disproportionately suspended.
This study builds on prior work to examine whether race/ethnicity moderates associations between teachers’ ratings of externalizing behaviors in kindergarten through fifth grade and caregiver-reported number of school suspensions measured in eighth grade using the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K). The more that is understood about the factors that relate to disproportionalities in school suspensions, the better equipped scholars, policymakers, and school staff will be to develop equitable antiracist solutions to eliminating them.
Conceptual Framework
Critical Race Theory grounds decades of work explicating how systemic racism perpetuates and is reinforced through inequitable discipline practices in schools (e.g., Anyon et al., 2018, 2021; Bonilla-Silva & Baiocchi, 2008; Holland, 2008; Simson, 2013). Systemic racism refers to the sociopolitical, economic, and historical oppression of racially/ethnically minoritized populations, particularly Black and Indigenous populations, concurrent with the supremacy of white populations embedded in the systems and institutions constituting the United States (Feagin, 2006). Research has suggested that racism and stereotypic beliefs stemming from systemic racism lead school teachers, administrators, resource officers, and other staff to punish Black children more severely for the same behaviors as white children (Barnes & Motz, 2018; McCarthy & Hoge, 1987; Owens & McLanahan, 2020; Skiba, 2015; Skiba et al., 2002; Welsh & Little, 2018). For example, teachers reported that they tended to expect challenging behaviors more from Black children (Gilliam et al., 2016) and were more likely to evaluate infractions by Black students as indicative of a pattern of behaviors, whereas they evaluated infractions by white students as one-time offenses (Okonofua & Eberhardt, 2015). Additionally, racial disproportionalities in discipline are often more salient in examinations of minor infractions and behaviors that are more subject to teachers’ interpretations than major infractions. For example, Black students are more likely than white students to be suspended for subjective acts such as defiance (Amemiya et al., 2020; Girvan et al., 2017; Holt et al., 2022). Relatedly, there is evidence that teachers use different linguistic indicators when describing student behavior in office referrals for discipline depending on the race/ethnicity of the student receiving the referral (Markowitz et al., 2023).
Racial biases also likely influence the evaluation and appraisal of students’ externalizing problems. Indeed, although limited, there is evidence of racial bias when comparing teachers’ ratings of externalizing problems against other standard measures of actual behaviors or infractions (Mason et al., 2014; Talbott et al., 2018). Research often fails to acknowledge that the concept of externalizing, and what does and does not qualify as an externalizing problem, is grounded in normative, deficit-based thinking stemming from white supremacy and systemic racism (Fenwick, 2016; Toldson, 2019). Quantitative Critical Race Theory (QuantCrit) highlights the harms of framing inherently subjective quantitative measures as objective, particularly by failing to acknowledge the social construction of race and how systemic racism biases measures commonly treated as objective in research (Castillo & Gillborn, 2022; Fenwick, 2016; Holland, 2008; Zuberi, 2001; Zuberi & Bonilla-Silva, 2008). Hence, this article refers to externalizing ratings as opposed to externalizing problems to emphasize that the independent variable is not an objective diagnosis but a subjective characterization that is susceptible to the sociocultural experiences of the person determining the quantitative rating.
Racial bias in externalizing ratings and discipline practices is compounded by the fact that most U.S. teachers are white (Taie & Goldring, 2020). For example, in 1998 (the year the study sample entered kindergarten), 85% of public school teachers in the United States were white, and this percentage has not decreased much in the following decades (U.S. Department of Education, 2016, 2020). This is especially noteworthy when considering evidence that students of color with a larger share of teachers of color are less likely to be referred for discipline, expelled, or suspended (Lindsay & Hart, 2017; Liu et al., 2023).
Expanding on Prior Studies Linking Externalizing to Suspensions Using the ECLS-K
In the first widely disseminated study examining associations between externalizing ratings and school suspensions in the ECLS-K 1998–99 cohort (the same data used in this study), Wright et al. (2014) argued that after controlling for teacher reports of externalizing and other problem behaviors in kindergarten through third grade, there were no significant differences in white versus Black children’s suspensions in eighth grade. In other words, Wright et al. (2014) argued that variance in children’s problem behaviors explained disproportionate suspensions between Black and white children. During the first Trump administration, the U.S. Department of Education used the study by Wright et al. (2014) as a basis for removing guidelines aimed to limit school suspensions, thereby warranting increased scrutiny of the study’s methodologic flaws (Huang, 2020). Huang (2020) subsequently examined the same research questions with the same ECLS-K data but employed more rigorous model specifications such as (a) using multiple imputation to reduce selection bias, (b) including behavior ratings through fifth grade (as opposed to third grade) to improve proximal predictiveness to the eighth-grade outcomes, and (c) focusing on externalizing behavior ratings specifically (as opposed to all problem behavior ratings), which are hypothesized to be more strongly associated with disciplinary outcomes. Huang (2020) demonstrated that there were still significant differences between Black and white students’ rates of suspension, even when controlling for externalizing ratings, thereby raising fundamental questions about the claim by Wright et al. (2014) that problem behaviors explain the variance in disproportionate suspensions for Black and white children. 
Although the ECLS-K data have already been analyzed to advance our understanding of associations between externalizing ratings and school suspensions, this study extends those prior studies in two important ways by (a) examining race/ethnicity as a moderator and (b) expanding the sample beyond just Black and white students to also include American Indigenous/Alaska Native, Asian, Latinx, and Native Hawaiian/Pacific Islander children.
Examining Race/Ethnicity as a Moderator
Prior ECLS-K studies (Huang, 2020; Wright et al., 2014) demonstrated that children with higher externalizing ratings from kindergarten through fifth grade have been suspended more on average by eighth grade and that Black children are suspended more on average than white children. However, it is unclear whether higher externalizing ratings disproportionately relate to more suspensions for Black children. Thus, this current study uses the same data to provide novel insight into whether Black students with higher externalizing ratings are at heightened risk for suspensions by eighth grade compared with students with similarly high externalizing ratings from other racial/ethnic backgrounds. In other words, moderation analyses examine whether associations between externalizing ratings and suspensions differ for children depending on their racial/ethnic background.
One reason this question of moderation is important is that, as noted previously and as highlighted by QuantCrit, externalizing ratings are subject to racial bias (Fenwick, 2016; Mason et al., 2014; Talbott et al., 2018; Toldson, 2019). Therefore, prior studies’ findings of associations between higher externalizing ratings and more suspensions (Huang, 2020; Wright et al., 2014) may reflect the fact that Black children are more likely to be rated higher on externalizing measures. In particular, the initial argument by Wright et al. (2014) that racial/ethnic variability in suspensions was explained by variability in problem behaviors explicitly ignored the racism embedded in the subjective problem behavior ratings. Although Huang’s (2020) analyses found that racial disproportionalities in suspensions persisted even after controlling for externalizing ratings, the analyses do not indicate how racial differences in externalizing ratings may have mitigated or exacerbated those disproportionalities. Thus, it is critical to look beyond direct associations between externalizing ratings and suspensions to examine whether race/ethnicity moderates those associations.
Including a Racially/Ethnically Diverse Sample
Most studies examining disproportionalities in school suspensions, including those using the ECLS-K, focus exclusively on Black and white students (e.g., Amemiya et al., 2020; Barnes & Motz, 2018; Girvan et al., 2017; Huang, 2020; Morgan et al., 2019; Okonofua & Eberhardt, 2015; Owens & McLanahan, 2020; Wright et al., 2014). Yet, there are also disproportionalities in suspensions wherein American Indigenous/Alaska Native, Latinx, and Native Hawaiian/Pacific Islander children receive more suspensions than white and Asian children on average and Black children receive more suspensions on average than all other racial/ethnic groups (Civil Rights Data Collection, 2021; de Brey et al., 2019; Losen & Skiba, 2010; Nguyen et al., 2019; Sullivan et al., 2013; Wallace et al., 2008). Additionally, there is some evidence that American Indigenous, Latinx, and Pacific Islander students are disproportionately disciplined for the same behaviors as white students (Gregory et al., 2010; Nguyen et al., 2019; Skiba et al., 2011). There is also ample evidence that all racially/ethnically minoritized children experience racism and stereotyping in schools (e.g., Johnston-Goodstar & VeLure Roholt, 2017; Nguyen et al., 2019; Torres et al., 2022). Hence, the same processes leading to disproportionalities in school suspensions between Black and white children may be relevant for other minoritized groups as well. Consequently, this study extended the sample to a larger population of students who may be experiencing the harms of disproportionalities in school discipline, a population that more fully reflects the racial/ethnic composition of students in the United States.
This Study
The research aim of this study was to examine whether children’s race/ethnicity moderated the association between children’s teacher-reported externalizing problem behavior ratings from kindergarten through fifth grade and children’s caregiver-reported school suspensions measured in eighth grade. Based on prior studies, we hypothesized that higher externalizing ratings in kindergarten through fifth grade would relate to a greater number of school suspensions among Black students compared with their white peers. In line with trends of disproportionalities in suspensions, we also hypothesized that other minoritized students—namely American Indigenous/Alaska Native, Latinx, and Native Hawaiian/Pacific Islander students—with higher externalizing ratings may be at greater risk for suspensions compared with white students with higher externalizing ratings.
Methods
Participants and Analytic Sample
This study used the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), conducted by the U.S. Department of Education’s National Center for Education Statistics (NCES, 2009). The ECLS-K is a nationally representative sample of >21,200 children in the United States followed each year from kindergarten entry (1998) through the spring of eighth grade (2007). The ECLS-K collected data assessing educational, socioemotional, and sociodemographic information from children, their caregivers, and their teachers. This study’s analytic sample included ~9,500 children from the ECLS-K who were American Indigenous/Alaska Native, Asian, Black, Latinx, Native Hawaiian/Pacific Islander, or white and were retained in the study through eighth grade. About 200 children who met the retention criteria were excluded from analyses because they had no information about their race/ethnicity or were biracial or multiracial. All other children in the ECLS-K were excluded from analyses because they did not meet the retention criteria. Table 1 includes descriptive information on the analytic sample.
Table 1. Descriptive Information on the Analytic Sample.
Note. Descriptive statistics reflect imputed data.
Source: U.S. Department of Education, National Center for Education Statistics, Early Childhood Longitudinal Study Kindergarten Class of 1998–99.
Measures
Race/Ethnicity
Race/ethnicity was measured using a series of dichotomous variables including American Indigenous/Alaska Native, Asian, Black, Latinx, Native Hawaiian/Pacific Islander, and white based on caregiver reports of children’s race/ethnicity. It is important to note that our study refers to these distinct categories of race/ethnicity in terms of socially constructed differences and that these categories may or may not reflect children’s perceptions of their racial/ethnic identities (Buchanan et al., 2021; Castillo & Gillborn, 2022; Feagin, 2006; Lett et al., 2022; Noroña-Zhou & Bush, 2021; Smedley & Smedley, 2005).
Externalizing Behavior Ratings
Externalizing behavior ratings were measured in kindergarten and first, third, and fifth grades using the Teacher Social Rating Scale, which is based on the Social Skills Rating Scale (Gresham & Elliott, 1990). The items for the externalizing rating variable included whether the child argues, fights, gets angry, acts impulsively, disturbs ongoing activities, and talks during quiet study time (the latter is only included in third and fifth grades). These items were measured on a scale from 1, “Student never exhibits this behavior,” to 4, “Student exhibits this behavior most of the time.” Split half reliability values ranged from .86 to .90 (Pollack et al., 2005). Each child has a mean externalizing score reflecting teachers’ responses in kindergarten through fifth grade. By using a mean composite rating from kindergarten through fifth grade, analyses placed less weight on a specific teacher’s assessment in a given year. Thus, these composites are a more robust indicator of teachers’ ratings for an individual child over the course of schooling. Figure 1 shows statistical differences, as measured by t tests, in kindergarten through fifth grade composite externalizing ratings by race/ethnicity.
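The composite described above can be sketched as follows. This is a minimal illustration and not the ECLS-K scoring procedure: the function names, the choice to average over whichever waves are available, and the example ratings are all assumptions made for the example.

```python
from typing import Optional, Sequence


def wave_score(item_ratings: Sequence[int]) -> float:
    """Average one teacher's item ratings (each on the 1-4 scale) for a single wave."""
    if not item_ratings or not all(1 <= r <= 4 for r in item_ratings):
        raise ValueError("item ratings must be non-empty and on the 1-4 scale")
    return sum(item_ratings) / len(item_ratings)


def composite_rating(wave_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average the available wave scores (kindergarten, 1st, 3rd, 5th) for one child.

    ASSUMPTION: missing waves are skipped; the study itself used imputed data.
    """
    observed = [s for s in wave_scores if s is not None]
    return sum(observed) / len(observed) if observed else None


# Hypothetical child: five items in kindergarten and first grade,
# six items in third and fifth grade (the extra item is talking during quiet study time).
waves = [
    wave_score([1, 2, 1, 2, 2]),
    wave_score([2, 2, 1, 2, 3]),
    wave_score([2, 2, 2, 3, 3, 2]),
    wave_score([1, 2, 2, 2, 3, 2]),
]
composite = composite_rating(waves)
```

Because each wave contributes one score to the average, a single unusually high or low rating from one teacher moves the composite by only a quarter of its size, which is the sense in which the composite places less weight on any one teacher's assessment.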

Figure 1. Mean externalizing behavior ratings by race/ethnicity.
School Suspensions
The dependent measure of in- and out-of-school suspensions came from caregiver responses to the following items asked in the spring of eighth grade in 2007:
Has [CHILD] ever had an in- or out-of-school suspension?
How many times was [CHILD] suspended?
This measured the number of in- and out-of-school suspensions the child experienced ranging from 0 (for those who answered no to the first question) to 5 (with 5 indicating five or more). We chose to use the continuous measure of in- and out-of-school suspensions (as opposed to just using the information from the first question for a yes/no dichotomous measure) based on evidence that racial disparities in suspensions are more than twice as large for students who have been suspended more than once compared to students who have been suspended once (Okonofua & Eberhardt, 2015). Thus, using a dichotomous measure cannot capture variability in the disproportionalities of being suspended more than once. Figure 2 shows statistical differences, as measured by t tests, in mean number of caregiver-reported school suspensions by race/ethnicity. Importantly, 82% of the total sample had no reported suspensions. This rate differed by race/ethnicity, with 74% of American Indigenous/Alaska Native, 87% of Asian, 62% of Black, 80% of Latinx, 74% of Native Hawaiian/Pacific Islander, and 85% of white children having never been suspended. The suspension variable was natural log transformed for analyses to account for the high concentration of zeroes (the means in Figure 2 do not reflect the log transformation).
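The coding of the outcome can be illustrated with a short sketch. The function names are hypothetical, and the exact form of the transformation is an assumption: the text says only that the variable was natural log transformed, and because the natural log of zero is undefined, the sketch uses ln(count + 1), a common choice for outcomes with many zeroes.

```python
import math
from typing import Optional


def suspension_count(ever_suspended: bool, times: Optional[int]) -> int:
    """Code the two caregiver items as a count from 0 to 5, where 5 means five or more."""
    if not ever_suspended:
        return 0
    if times is None or times < 1:
        raise ValueError("a 'yes' response needs a count of at least 1")
    return min(times, 5)


def log_transform(count: int) -> float:
    """ASSUMPTION: ln(count + 1), so that a count of 0 maps to 0."""
    return math.log1p(count)


never = log_transform(suspension_count(False, None))  # child never suspended
capped = suspension_count(True, 7)                     # seven suspensions are top-coded at 5
```

Under this assumed transformation, the 82% of children with no reported suspensions all sit at exactly zero on the analysis scale, and the distance between higher counts is compressed.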

Figure 2. Mean number of suspensions by race/ethnicity.
Covariates
The models controlled for several covariates that tended to be correlated with externalizing behavior ratings and/or suspensions. Socioeconomic status was measured using a composite created by the ECLS-K that combines standardized measures of caregivers’ highest level of education, income, and occupational prestige scores. Children’s sex assigned at birth was measured with a dichotomous male/female (reference) variable. The models also controlled for caregivers being married (not stably married is the reference), household size (number of children under 18 years of age and adults over 18 years of age in the household), whether the child has an individualized education program, and a standardized math and reading score composite. Similar to the externalizing behavior ratings variable, the control variables reflect the overall means from the spring of kindergarten and first, third, and fifth grades. The study used mean composite variables to address the fact that the control variables are not necessarily stable over time. Although mean composite measures do not capture the full range of variability in children’s contexts, they do provide a more representative depiction than a one-time measure.
Analytic Approach
Three multilevel regression analyses using the xtmixed command in Stata 17.0 were estimated to test the research aim examining how race/ethnicity moderated associations between externalizing ratings and school suspensions. Models included a random intercept for children’s baseline school to adjust for children being nested in schools. Moderation was tested through interactions between each child’s dichotomous race/ethnicity variable and their mean externalizing behavior rating. White children were the largest racial/ethnic subgroup in the sample and therefore served as the reference group for the regression models. Analyses relied on post hoc pairwise comparisons using Sidak adjusted p values to examine significant differences between other racial/ethnic subgroups. Model 1 examined associations between externalizing ratings and school suspensions controlling for children’s race/ethnicity. Model 2 replicated model 1 and added the interactions between children’s race/ethnicity and externalizing ratings. Finally, model 3 replicated model 2 and added the full set of covariates.
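One way to write the fully adjusted specification (model 3) and the multiple-comparison adjustment is sketched below. The notation is illustrative and not taken from the original analysis: $Y_{ij}$ is the number of suspensions for child $i$ in baseline school $j$, $E_{ij}$ is the child's mean externalizing rating, $R^{g}_{ij}$ is the indicator for racial/ethnic group $g$ (white is the omitted reference), $\mathbf{X}_{ij}$ is the covariate vector, and $u_j$ is the school random intercept. The $+1$ inside the logarithm is an assumption, because the text states only that the outcome was natural log transformed. The second line is the standard Šidák adjustment for $m$ comparisons; the value of $m$ is not reported in the text.

```latex
\begin{align}
\ln(1 + Y_{ij}) &= \beta_0 + \beta_1 E_{ij}
  + \sum_{g} \beta_{2g} R^{g}_{ij}
  + \sum_{g} \beta_{3g} \left( R^{g}_{ij} \times E_{ij} \right)
  + \boldsymbol{\gamma}^{\top} \mathbf{X}_{ij} + u_j + \varepsilon_{ij},
  \qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2) \\
p_{\text{adj}} &= 1 - (1 - p)^{m}
\end{align}
```

In this notation, moderation corresponds to the $\beta_{3g}$ terms: $\beta_1$ is the slope of externalizing ratings for white children, and $\beta_1 + \beta_{3g}$ is the slope for group $g$. Model 1 drops the interaction and covariate terms, and model 2 drops only the covariates.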
The analytic sample had complete data for 86.7% of every variable in the analyses. Analyses adjusted for missing data by imputing 20 datasets with chained equations (ICE) in Stata (Royston, 2005). To be consistent with prior studies (Huang, 2020; Wright et al., 2014), we present the unweighted results. However, as a sensitivity analysis, we ran all models with baseline child weights to confirm that our results were not sensitive to the inclusion of weights.
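The text does not state how estimates from the 20 imputed datasets were combined. The conventional approach is Rubin's rules, sketched here under that assumption; the function name and the three example coefficients are made up for illustration and are not results from the study.

```python
import math
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average sampling variance within imputations
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficients and squared standard errors from three imputations.
est, se = pool_rubin([0.17, 0.18, 0.19], [0.0004, 0.0004, 0.0004])
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values, which is why pooled results are generally more conservative than those from any single completed dataset.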
Results
Results of the multilevel regression analyses are presented in Table 2 and Figure 3. Model 1 (Table 2), which only controls for race/ethnicity, suggested that a one-unit change in externalizing ratings is associated with a 0.21 log unit change in suspensions (p < .001), and the fully adjusted model 3 (Table 2), which controls for all interactions and covariates, suggested that a one-unit change in externalizing ratings is associated with a 0.18 log unit change in suspensions (p < .001). In other words, a mean externalizing rating (1.62 on a rating scale of 1 to 4) is associated with 0.12 suspensions, and an externalizing rating of one standard deviation (SD) above the mean (a rating of 2.10) is associated with 0.21 suspensions on average. The fully adjusted results (model 3 in Table 2) also confirmed that race/ethnicity moderates the association between teachers’ externalizing ratings from kindergarten through fifth grade and caregiver-reported school suspensions as of eighth grade. This is illustrated in Figure 3, which shows the mean number of suspensions for each rating on the externalizing scale by race/ethnicity. On average, higher externalizing ratings were related to significantly more suspensions for Black children compared with Asian (p < .001), Latinx (p < .001), and white (p < .001) children. For example, an externalizing rating one SD above the mean related to an average of 0.55 suspensions for Black children compared with an average of 0.21, 0.11, and 0.04 suspensions for white, Latinx, and Asian children, respectively (Figure 3). In addition, being in the top 10% of externalizing ratings (a rating of 2.64) was related to 0.81 suspensions for Black children on average compared with 0.33, 0.20, and 0.10 suspensions on average for white, Latinx, and Asian children, respectively (Figure 3). For the effect sizes above, externalizing ratings were less related to suspensions for Asian (p = .068) and Latinx (p = .041) children compared with white children.
There were no significant differences between the associations of externalizing and suspensions for American Indigenous/Alaska Native and Native Hawaiian/Pacific Islander children compared with any other racial/ethnic subgroup.
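The step from log-unit coefficients to suspension counts reported above can be illustrated with a rough worked check. It rests on two assumptions that the text does not confirm: that the outcome was transformed as ln(1 + y), and that the model 3 slope of 0.18 applies to the predictions of 0.12 and 0.21 suspensions. The function name is hypothetical.

```python
import math


def predicted_suspensions(base_count: float, slope: float, delta_rating: float) -> float:
    """Shift a prediction on the assumed ln(1 + y) scale and convert it back to a count."""
    log_scale = math.log1p(base_count) + slope * delta_rating
    return math.expm1(log_scale)


# Reported values: 0.12 suspensions at the mean rating of 1.62, a slope of 0.18,
# and a rating of 2.10 at one SD above the mean.
at_plus_one_sd = predicted_suspensions(0.12, 0.18, 2.10 - 1.62)
```

Under these assumptions the back-transformed prediction is roughly 0.22 suspensions, close to the reported 0.21; the small gap may reflect rounding in the published coefficients. The check also shows why the same log-unit slope implies larger differences in counts at higher ratings, since the back-transformation is exponential.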
Table 2. Multilevel Regression Results of the Moderating Role of Race/Ethnicity on Associations Between Externalizing Behavior Ratings and School Suspensions.
Note. Letters indicate that the interaction coefficient is significantly (p < .05) different from the interaction coefficient for the (a) American Indigenous/Alaska Native, (b) Asian, (c) Black, (d) Latinx, and (e) Native Hawaiian/Pacific Islander samples. Source: U.S. Department of Education, National Center for Education Statistics, Early Childhood Longitudinal Study Kindergarten Class of 1998–99.
p < .1; *p < .05; **p < .01; ***p < .001.

Figure 3. Multilevel regression results of mean suspensions (untransformed) per mean externalizing behavior ratings by race/ethnicity.
Discussion
This study examined whether associations between teacher ratings of externalizing behaviors from kindergarten through fifth grade and caregiver reports of school suspensions as of eighth grade differed for children of different racial/ethnic backgrounds. We found support for our general hypothesis that race/ethnicity is a moderator of associations between externalizing ratings and suspensions. Our study confirmed our hypothesis that externalizing ratings related to more suspensions for Black children than for white children. This extension of Huang’s (2020) findings is critical because it highlights that not only does variability in externalizing ratings fail to fully explain racial disproportionalities in school suspensions, but there is also evidence to suggest that Black children were disproportionately suspended even when compared with white children who had the same externalizing ratings. In other words, it is not just the case that teachers are perceiving Black children as exhibiting more externalizing behaviors and therefore children with higher perceived externalizing behaviors receive more suspensions; there is evidence that even when teachers perceive white children to be exhibiting higher externalizing behaviors, white children do not receive the same number of suspensions as Black children. The strength of the association between externalizing ratings and suspensions also was significantly greater for Black children than for Asian and Latinx children. In other words, our study supports that externalizing ratings were more related to suspensions for Black children even when compared with other racially/ethnically minoritized children. This finding supports the robustness and distinctness of anti-Black racism (Dumas & ross, 2016) and illustrates the importance of expanding investigations of inequitable discipline beyond exclusively Black and white samples.
Contrary to one of our more exploratory hypotheses, we found that externalizing ratings were less associated with suspensions for Latinx children than for white children. Although results did find that Latinx children received significantly more suspensions on average than white children (Figure 2), including when controlling for externalizing ratings (Table 2, model 1), the fact that the interaction was significant in the opposite direction from what we expected certainly introduces a question for future research. This unexpected finding is aligned with the generally mixed findings on discipline trends within Latinx populations, wherein Latinx students are sometimes suspended at higher, lower, or the same rates as white children for the same behaviors (Gopalan & Nelson, 2019; Gregory et al., 2010; Skiba et al., 2011). One reason for mixed findings across the literature may relate to the heterogeneity of the Latinx population. In other words, general trends for Latinx children may obscure countervailing trends for different Latinx subpopulations, such as Afro-Latinx children (Aceves et al., 2022). Additional research exploring school discipline trends with Latinx students is especially important given that as of 2021, 28% of all public school-aged children in the United States were Latinx (U.S. Government Accountability Office [GAO], 2022).
Furthermore, it is important to emphasize that our study examined teacher ratings of externalizing behaviors rather than measures of specific behaviors and infractions (e.g., rating a child from 1 to 4 on how often they fight rather than a specific incident of a child being in a fight). Hence, the independent variable likely already incorporates teachers’ racial biases because there is evidence to suggest that teachers disproportionately rate Black children as exhibiting externalizing behaviors (Miner & Clarke-Stewart, 2008). This also aligns with the aforementioned QuantCrit theory wherein research fails to highlight the subjectivity of quantitative measures, such as externalizing ratings, that are inherently biased and often deficit based (Castillo & Gillborn, 2022; Fenwick, 2016; Holland, 2008; Zuberi, 2001). Therefore, our results of disproportionalities in associations between externalizing ratings and school suspensions may underestimate the degree to which racism underpins inequitable discipline practices.
This assumption also may depend on teachers’ race/ethnicity because there is evidence that Black and Latinx students in particular may be less likely to face exclusionary discipline when they have teachers of the same race/ethnicity (Gershenson et al., 2021; Lindsay & Hart, 2017; Ouazad, 2014; Redding, 2019). In this study’s sample, teachers were about 87% white, 7% Black, 3% Latinx, 1% American Indigenous/Alaska Native, 1% Asian, and <1% Native Hawaiian/Pacific Islander, so we did not have sufficient variability to explore differences in moderation along dimensions of teacher race/ethnicity. As mentioned previously, in 1998 (the year the ECLS-K cohort entered kindergarten), 85% of public school teachers in the United States were white, and as of 2018, 79% of teachers in the United States were white (U.S. Department of Education, 2016, 2020). Hence, as the teaching workforce becomes slightly more racially/ethnically diverse, there are increasing opportunities to examine how teachers’ race/ethnicity, teachers’ ratings of students’ externalizing behaviors, and students’ race/ethnicity interact in relation to school discipline outcomes.
Importantly, even if teachers’ externalizing ratings were unbiased, developmental science has provided ample evidence that behaviors typically classified as externalizing problems often stem from contextual factors, such as family stressors and income volatility (Campbell et al., 2000; López-Romero et al., 2015; Marcynyszyn et al., 2008; Miller et al., 2021). Thus, our study’s findings reinforce a responsibility for schools to ensure that children’s behaviors, which are often a developmentally appropriate reaction to stressors, are not disproportionately punished in a manner that leads to an increased likelihood for negative developmental consequences, such as worse academic outcomes and involvement with the carceral system (Del Toro & Wang, 2022; Pearman et al., 2019; Rosenbaum, 2020; Skiba et al., 2014).
Alternatives to school suspensions, including schoolwide positive behavioral interventions and supports, ecological classroom management, restorative justice, and empathic mindset interventions for teachers, are increasingly common (Nese & McIntosh, 2016; Okonofua et al., 2022; Osher et al., 2010; Welsh & Little, 2018). These alternatives generally employ a more holistic approach that emphasizes reinforcing positive behaviors/strengths and prioritizes empathy for children’s perspectives and the circumstances that may underlie children’s behaviors. However, alternative discipline practices are often implemented as race neutral despite evidence of better implementation fidelity when racial and cultural consciousness and specific school contexts are considered (Borman et al., 2021; Gregory et al., 2018, 2021; Skiba, 2015; Welsh & Little, 2018). In fact, there are still persistent disproportionalities in discipline concurrent with the rise in school discipline policies and practices aimed at reducing school suspensions (Ritter, 2018; Welsh & Little, 2018). Thus, as alternative solutions to punitive and exclusionary discipline grow in popularity, grounding these solutions in antiracist pedagogy likely will help avoid inadvertently reinforcing racial oppression, something that is likely to occur when systemic racism underpins larger societal disproportionalities (Anyon et al., 2021; Curenton et al., 2022; Escayg, 2020; Feagin, 2006; Freire, 2000; Gregory et al., 2021; Kishimoto, 2018; Riddle & Sinclair, 2019; Ward, 2012; Zuberi, 2001; Zuberi & Bonilla-Silva, 2008).
Generalizability
The full ECLS-K is a nationally representative sample of children in the United States from 1998 to 2007. However, this study’s analytic sample only included children who remained in the study through eighth grade, so our study is not nationally representative. Prior studies examining the ECLS-K 1998–99 cohort confirmed that Black children with the highest problem behavior ratings in earlier grades were significantly less likely to remain in the study through eighth grade (Wright et al., 2014). Thus, because the analytic sample did not account for attrition, our results may be more conservative than they would be with a fully nationally representative sample.
Furthermore, because these data were collected between 1998 and 2007, they do not necessarily generalize to children in later years. Sociopolitical, cultural, and demographic shifts that occurred in the United States since 2007 may yield different findings today. For example, the demographic composition of elementary and middle school students in the United States today is significantly more racially/ethnically heterogeneous (U.S. GAO, 2022). Further, policies related to school discipline in public education have evolved in myriad ways, including a decline in exclusionary discipline and zero-tolerance policies, which are both linked to school suspensions (Ritter, 2018). Nevertheless, as noted earlier, disproportionalities in school suspensions have persisted despite these changes (Ritter, 2018; Welsh & Little, 2018). It also should be noted that there is a more recent cohort of the ECLS-K with data from 2010 through 2016; however, that study was only carried through fifth grade and did not include any explicit items about school suspensions, so it cannot be used to answer our study’s particular questions.
Finally, this study omitted biracial, multiracial, and other racial/ethnic subgroups of students due to small sample sizes. Additionally, as with any large-scale research, the racial/ethnic categories that were included were oversimplified and did not capture important heterogeneity within racial/ethnic subgroups (Buchanan et al., 2021; Rowley & Camacho, 2015). However, we were able to disaggregate Asian and Native Hawaiian/Pacific Islander children. The disparate findings between these two groups are consistent with prior research and emphasize the importance of more refined research within Asian American and Pacific Islander populations, an approach that also should be extended to other racial/ethnic groups (Nguyen et al., 2019).
Limitations and Future Directions
Examining differences by race/ethnicity, especially oversimplified categories of race/ethnicity, falls short of examining the specific systemic processes giving rise to racial/ethnic differences in suspensions (Buchanan et al., 2021). Thus, future studies examining questions about disproportionalities should look beyond simply analyzing students’ race/ethnicity and include specific questions about racism (Rowley & Camacho, 2015). Another limitation of this study is its use of kindergarten through fifth grade composite externalizing rating measures. Evidence is clear that behaviors typically defined as externalizing problems are not always stable from kindergarten through fifth grade but rather that there are distinct trajectories such as stably low externalizing ratings or average transitioning to high externalizing ratings (Campbell et al., 2006; López-Romero et al., 2015; Miller & Votruba-Drzal, 2017; Miner & Clarke-Stewart, 2008; Shaw et al., 2003). Due to sample size and variability constraints, this study could not operationalize externalizing ratings as discrete trajectories. Although mean composite ratings of longitudinal data are certainly superior to measures at just one time point, analyzing this study’s questions with measures of externalizing ratings trajectories is an important future direction.
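To illustrate why a mean composite can mask trajectories, the following minimal sketch compares two hypothetical children whose ratings on the 1-to-4 scale average to the same value but follow different paths. The ratings, the number of time points, and the helper names `composite` and `change` are assumptions made for illustration only; they are not ECLS-K data or the procedure used in this study.

```python
from statistics import mean

# Hypothetical teacher ratings (1-4 scale) at four illustrative time points.
# These values are invented for illustration and are not ECLS-K data.
ratings = {
    "stable_average": [2.0, 2.0, 2.0, 2.0],
    "average_to_high": [1.0, 1.5, 2.5, 3.0],
}


def composite(waves):
    """Mean composite rating across all time points."""
    return mean(waves)


def change(waves):
    """Last rating minus first rating: a crude indicator of trajectory."""
    return waves[-1] - waves[0]


for label, waves in ratings.items():
    # Both children share the same composite but differ in change over time.
    print(label, composite(waves), change(waves))
```

In this sketch both children have the same composite, yet only one shows rising ratings over time, which is the kind of distinction a trajectory-based measure would be expected to preserve.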
Additionally, this study would benefit from a more rigorous and longitudinal measure of school suspensions. Measuring caregivers’ reports of their children’s suspensions is a less reliable approach than receiving actual suspension data from schools. There may be bias in caregivers’ reports of suspensions due to a plethora of factors, including hesitancy to report sensitive information about their children. Relatedly, future studies could extend beyond teacher ratings of externalizing to include more reporters with different degrees of bias. For example, unlike teachers, caregivers are more likely to report higher externalizing behaviors in white children than in Black children (Miner & Clarke-Stewart, 2008).
Furthermore, an important future direction of this work is to employ an intersectionality framework (Crenshaw, 1989) to consider how salient factors of children’s identities beyond race/ethnicity are associated with the study’s research questions (Santos & Toomey, 2018). For example, there is a plethora of evidence that Black girls are at the most heightened risk for negative outcomes following school suspensions (Carter Andrews et al., 2019; Cooper et al., 2022; Harris, 2021; Hines-Datiri & Carter Andrews, 2020; Skiba, 2015). We did not test three-way interactions with sex assigned at birth in our study because this would require parsing variance further than the sample sizes of most of the racial/ethnic subgroups were statistically powered for. Thus, future work should more closely examine the intersectional role of sex/gender in particular. Finally, future research should examine the roles of more contextual factors on associations between externalizing ratings and suspensions. For example, these trends likely differ depending on larger contexts stemming from systemic racism, such as school segregation (e.g., Chin, 2021; Curenton et al., 2022).
Implications
Although this study relies on data from 1998 through 2007, it builds on two prior studies (Huang, 2020; Wright et al., 2014) that have been collectively cited more than 240 times. As long as these studies continue to be read and referenced, our study provides an important contribution for how to critically contextualize these prior articles in addition to expanding their scope with further analyses and findings. Our study confirms prior findings that Black children are disproportionately suspended and extends prior findings by demonstrating that higher teacher ratings of externalizing are related to more suspensions for Black children than for Asian, Latinx, and white children. In other words, our study suggests that Black children whom teachers identify in elementary school as frequently engaging in behaviors such as arguing, getting angry, acting impulsively, and disturbing ongoing activities are more likely to be tracked into trajectories that lead to school suspensions by eighth grade than their peers of other racial/ethnic backgrounds who are rated similarly by elementary school teachers on externalizing.
As noted previously, disproportionalities in school discipline have a cascading impact on children’s development, such as leading to worse academic outcomes and increased involvement with the carceral system (Del Toro & Wang, 2022; Pearman et al., 2019; Rosenbaum, 2020; Skiba et al., 2014). In particular, the disproportionate representation of Black populations in the carceral system that stems from the disproportionate representation of Black children in school suspensions and expulsions is referred to as the school-to-prison pipeline (Skiba et al., 2014; Wald & Losen, 2003). Scholars have argued that this pipeline begins as early as preschool, and research has found that eliminating the racial discipline gap in middle and high school would reduce racial disparities in the carceral system by ~16% (Barnes & Motz, 2018; Rashid, 2009; Rosenbaum, 2020). Thus, addressing inequities in school discipline practices to ensure that children with higher externalizing ratings are not at increased risk for suspension because of their race/ethnicity is imperative for positive and equitable child development.
Acknowledgements
The authors thank Daniel Shaw for his feedback on early versions of this work.
Authorship Contribution Statement
Lorraine Blatt: conceptualization, formal analysis, methodology, project administration, visualization, writing—original draft, writing—review and editing. Daniesha Hunter-Rue: writing—review and editing. Elizabeth Votruba-Drzal: funding acquisition, methodology, resources, supervision, writing—review and editing.
Declaration of Conflicting Interests
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Open Practices
This study analyzed the restricted version of the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), which is available via a restricted-use data license from the National Center for Education Statistics. More information on the data, including a public-use version of the dataset, is available from the National Center for Education Statistics. All data-analysis code is available on request, conditional on permission from the National Center for Education Statistics per our restricted-use data license agreement.
