Abstract
Minority students are suspended at a disproportionately higher rate compared with others. To reduce racial suspension gaps, four California school districts banned schools from suspending students for willful defiance, a category consisting of relatively minor disruptive offenses. I evaluate the impact of these policies on high school student discipline outcomes using a difference-in-differences strategy that exploits the temporal variation in the enactment of these policies across school districts. The results suggest that while these policies decreased willful defiance out-of-school suspension rates by around 69%, they did not reduce overall out-of-school suspension rates. In fact, the policies significantly increased out-of-school suspension rates among Black students, particularly in schools with a small share of Black teachers. Taken together, the results suggest that the willful defiance suspension bans failed to address implicit and explicit biases in California schools.
Out-of-school suspension (OSS) is one of the most commonly used discipline actions in U.S. schools: over the past 20 years, about 5% of students received at least one OSS each year (de Brey et al., 2019). While there may be justifiable grounds for excluding disruptive students from the classroom, such as protecting other students’ safety and learning, a large proportion of OSSs are issued to students committing minor infractions that pose little to no direct threat to their classmates. This misapplication of OSS deprives students of educational opportunities and more generally harms the school learning environment by creating shared stress (Pena-Shaff et al., 2019).
Suspension is associated with poor academic performance, school dropout, crime, and delinquency (Anderson et al., 2019; Bacher-Hicks et al., 2019; Cuellar & Markowitz, 2015; Wald & Losen, 2003). In addition, the overuse of suspension may exacerbate disparities in learning outcomes among students from different demographic and socioeconomic groups (Gopalan & Nelson, 2019; Gregory et al., 2010; Pearman et al., 2019). Black students are at least twice as likely to be suspended as White students (Losen & Gillespie, 2012) and are suspended for longer periods, even when involved in the same type of incidents (Barrett et al., 2019). This suggests that levels of discretion and potential biases in the use of suspension for minor disorder infractions could disproportionately harm Black students, thus widening existing racial achievement gaps.
Several theories examine the impact of suspension on student outcomes both inside and outside of the classroom. Positive intervention and restorative justice (RJ) theories argue that suspension could lead to unintended long-term consequences, such as deterioration of the school atmosphere, entrenched antisocial behavior, and increases in instances of misconduct (Gonzalez, 2012; Mowen et al., 2020; Pesta, 2018; Sugai & Horner, 2002; Way, 2011). Empirical studies on the incapacitation effect of school suggest that excluding misbehaving students from schools increases their risk of engaging in criminal activities (Cuellar & Markowitz, 2015; Jacob & Lefgren, 2003; Luallen, 2006). Although deterrence theory posits that stricter discipline can induce students to comply with rules and authorities (Kinsler, 2013; Nagin, 2013), considerable evidence has shown that the use of a less punitive discipline approach increases students’ respect for teachers, reduces students’ infractions, and improves school climate (Bradshaw et al., 2009; Bradshaw et al., 2010; Okonofua & Eberhardt, 2015).
Over the past two decades, concerns about the negative consequences of punishment and existing racial discipline gaps, especially suspension gaps between Black and White students, have led to many discipline reforms (Education Commission of the States, 2020), most of which focused on reducing suspensions for minor infractions. For example, New York City Middle Schools (Craig & Martin, 2019), the School District of Philadelphia (Lacoe & Steinberg, 2018), and Chicago Public Schools (Stevens et al., 2015) have either replaced OSS with in-school interventions for insubordination and disruptive behavior or reduced the length of OSS for all infractions. However, existing research suggests that these suspension reforms have little impact on student academic and discipline outcomes (e.g., Craig & Martin, 2019; Lacoe & Steinberg, 2018; Steinberg & Lacoe, 2018; Stevens et al., 2015).
This article studies a specific kind of suspension reform implemented by four California school districts, which banned schools from issuing OSS for willful defiance. Specifically, it asks whether schools in these districts complied with these willful defiance suspension bans (WDBs) and estimates their impact on student discipline outcomes.
The California Department of Education defines willful defiance as behavior that “disrupted school activities or otherwise willfully defied the valid authority of supervisors, teachers, administrators, school officials, or other school personnel engaged in the performance of their duties.” In practice, however, teachers interpret willful defiance differently, applying this generic label to suspend students for infractions ranging from eyerolling or backtalking to sleeping during class. Consequently, willful defiance accounted for more than 40% of OSSs before the implementation of WDBs. During the same period, OSS rates for Black students reached 15.4%, compared with 5.6% for White students (California Department of Education, n.d.).
Acknowledging that OSS was overused for willful defiance and that minority students were suspended at higher rates than students of other races, four school districts in California (San Francisco Unified School District [SFUSD], Pasadena Unified School District [PUSD], Azusa Unified School District [AUSD], and Oakland Unified School District [OUSD]) explicitly eliminated willful defiance as a reason for suspending K–12 students out of school after the 2014–2015 school year. 1 While their approach mimicked previous reforms in Philadelphia and New York, the potential impact was greater. The California WDBs affected about 40% of total OSSs, compared with 15% to 25% in the Philadelphia and New York reforms (Craig & Martin, 2019; Lacoe & Steinberg, 2018; Steinberg & Lacoe, 2018).
Although high schools are most likely to utilize OSS, most previous studies have focused on elementary and/or middle schools (Anderson et al., 2019; Gopalan & Nelson, 2019; Lacoe & Steinberg, 2018; Steinberg & Lacoe, 2018; Stevens et al., 2015). Therefore, this study extends this body of literature by estimating the impacts of school discipline reform in high schools. Using publicly available school-level data from the California Department of Education, I start by examining how WDBs affect student OSS rates and how this effect varies by student race to establish the extent of racial barriers and inequality in the California school system. 2 Then, I study the impact of WDBs on the prevalence of suspension by distinguishing between “willful defiance OSS” and “nonwillful defiance OSS.” To account for the influence of unobserved confounding factors, I exploit the temporal variation in the implementation of WDBs across school districts, using a difference-in-differences (DD) estimation strategy.
Several studies have considered the impact of suspension reforms on student discipline outcomes. Most studies find that these reforms were not correlated with decreases in OSS rates (Baker-Smith, 2018; Lacoe & Steinberg, 2018; Steinberg & Lacoe, 2018; Stevens et al., 2015), although a study on a suspension reform in Los Angeles showed evidence of reduced OSS rates (Hashim et al., 2018). However, these findings should be interpreted with caution for several reasons. First, most of these studies used research designs that could not account for the potentially nonrandom adoption of suspension reforms. Second, some relied only on data from the reformed school district, raising concerns over external validity (Baker-Smith, 2018; Hashim et al., 2018; Stevens et al., 2015). A related strand of research has focused on the impact of suspension reforms on student performance, with mixed findings (e.g., Craig & Martin, 2019; Steinberg & Lacoe, 2018). Compared with those studies, the current analysis of WDBs draws from a far greater number of reformed and unreformed schools, resulting in a much larger sample size. Furthermore, this study examines three specific explanations that may contribute to the negative effect of WDBs on student discipline outcomes: substitution of suspension, lack of same-race teachers, and negative changes in student behaviors.
My findings suggest that WDBs reduced willful defiance OSS rates by around 69%. However, these WDBs did not reduce overall OSS rates: schools simply changed the reasons given when suspending students. More important, Black students were disproportionately affected by this shift, and hence by WDBs. Supplemental analysis using data from the Youth Risk Behavior Surveillance System (YRBSS) shows that the increases in OSS cannot be attributed to more student infractions. Taken together, the findings suggest that WDBs failed to address biases against Black students in California schools.
The remainder of the article is organized as follows. The Policy Background section provides background on WDBs in California. Then, the Method section introduces the data and empirical strategy. The following sections present the main results and results from the supplemental analysis. The final section concludes with a summary of the findings and policy implications.
Policy Background
In 2011, a voluntary agreement between the Los Angeles Unified School District (LAUSD) and the U.S. Department of Education banned willful defiance as a reason for suspension district-wide (Aron, 2013; Blume, 2012; Hashim et al., 2018). Three years later, the California State Legislature introduced a statewide WDB for Grades K–3 (A.B. 420) in response to the prevalent and disproportionate use of willful defiance OSS among Black and Hispanic students. 3 More recently, California extended the WDB to cover students through Grade 8 by July 2025, but high school students will remain subject to willful defiance OSS. 4
In 2014, SFUSD became the first to implement a full WDB, applied to students in all grades. Prior to this move, SFUSD primarily relied on RJ and school-wide positive behavior intervention and support programs to combat high suspension rates. Yet, these programs failed to address the district’s concerns over the disproportionate suspensions of Black students. As a result, starting from the 2014–2015 school year, SFUSD banned willful defiance OSS to support previous efforts (SFUSD, 2014).
Later, PUSD and OUSD enacted full WDBs to support their existing suspension reduction programs. In the spring of 2015, seeing that positive behavior intervention and support programs (PBIS) failed to reduce willful defiance OSSs, PUSD extended the state’s WDB to all K–12 students. OUSD, in response to a 2012 resolution with the Department of Education to promote equitable discipline practices via specific goals, also implemented a WDB in the 2016–2017 school year (U.S. Department of Education, 2012). Instead of revising discipline regulations, AUSD informally adopted a WDB in its 2015 annual accountability plan, with the goal of eliminating willful defiance suspensions by SY 2016–2017 (AUSD, n.d.).
Notably, the WDBs in all four school districts were followed by alternative programs such as RJ and PBIS, designed to be implemented simultaneously and complementarily (Riestenberg, 2015). According to the California PBIS Coalition, the number of schools adopting PBIS increased from around 500 to more than 3,000 (about 33% of all traditional K–12 schools) from SY 2011–2012 to SY 2018–2019 (California PBIS, n.d.), indicating that some unreformed schools also implement RJ and PBIS programs. While the expansion of RJ and PBIS programs usually requires additional funding, only SFUSD and OUSD among the reformed districts explicitly agreed to provide it (Frey, 2013, 2015).
WDBs prohibited schools from suspending students out of school for willful defiance, requiring that offenders receive class suspension or in-school suspension instead (Frey, 2014). Nevertheless, WDBs did not eliminate the use of OSS for willful defiance for several reasons. First, teachers could suspend students in this category for other reasons, such as exhibiting violent behaviors or using profane language (Lasnover, 2015). Second, while academic outcomes draw intense scrutiny from the public and other authorities, OSS use attracts less attention. This lack of monitoring, training, and accountability systems might reduce schools’ willingness to comply with WDB requirements. Third, three out of the four districts (PUSD, SFUSD, and OUSD) implemented WDBs immediately after passage by their education boards. Schools did not have sufficient time to change their faculties’ perception that willful defiance was a necessary and sufficient reason to suspend students. Empirical evidence documented that around 55% of LAUSD teachers still viewed OSS as a legitimate consequence for willful defiance, even 2 years after implementing the ban (Lasnover, 2015).
This study estimates the impact of WDBs on student discipline outcomes using detailed data on OSS by race and suspension reason across four treated and 265 untreated school districts. I test (1) whether suspension reforms were met with full compliance and (2) whether schools substituted nonwillful defiance OSS for willful defiance OSS after WDBs. This work also examines the role of same-race teachers to explore whether student–teacher race match may affect the implementation of WDBs. Because California is a diverse state with large variations in student characteristics across schools, these findings can only be generalized to other California school districts with characteristics similar to those of the reformed school districts.
Method
Data
I used publicly available school-level data from the California Department of Education from SY 2011–2012 to SY 2018–2019. The number of OSSs issued was reported for each student racial group across six mutually exclusive categories: violent incidents leading to injury (violent injury), violent incidents that did not lead to injury (violent no injury), weapon possession, incidents related to illicit drugs (drug-related), defiance-only (willful defiance), and others (miscellanies). Supplemental Table A1 (available in the online version of this article) includes detailed definitions of infractions for each suspension category. After dropping schools with incomplete data, schools without data in the pretreatment period, nontraditional high schools, and schools with grades other than Grades 9 to 12, the remaining sample contained 4,730 school-year observations from 638 high schools in 269 school districts (online Supplemental Table A2 shows that the schools excluded due to incomplete data are similar to those schools with complete data). 5
Table 1 presents summary statistics on OSS rates, measured by the number of OSSs per 1,000 students, by treatment status. On average, around 100 OSSs were issued per 1,000 students, including 40 for willful defiance and 60 for nonwillful defiance reasons. Each year, Black students received 211 OSSs per 1,000 students on average, twice as many as White or Hispanic students. Table 1 also shows slightly lower OSS and willful defiance OSS rates in treated districts before the implementation of WDBs. The nonwillful defiance OSS rates of treated and untreated schools were similar before suspension reforms. Columns 4 and 5 indicate that total suspension and willful defiance OSS rates decreased for all students after WDB implementation. However, the overall OSS rate for Black students only dropped by around 24 per 1,000 students, despite a reduction of 50 per 1,000 students in the willful defiance OSS rate. Such a moderate decline in the total suspension rate was not unexpected: among Black students, nonwillful defiance OSS increased from 145 to 179 per 1,000 students.
Descriptive Statistics on Suspension Rates by Treatment Status
Note. Suspension rates are the number of out-of-school suspensions per 1,000 students. Means are reported. Standard deviations are in parentheses. NWD = nonwillful defiance; WD = willful defiance.
Figure 1 plots the trends of overall, willful defiance, and nonwillful defiance OSS rates between SY 2011–2012 and SY 2018–2019. The vertical lines indicate the time of WDB implementation in each treated school district. The figure reveals a few interesting patterns. First, OSS rates vary dramatically across school districts, with PUSD and OUSD maintaining consistently higher levels than SFUSD across all three OSS rates. Furthermore, the average willful defiance OSS rates in the untreated districts were higher than those in the treated districts. Second, willful defiance OSS rates decreased over time in both the treated and untreated districts, while nonwillful defiance OSS rates remained unchanged, perhaps due to rising public concerns about the overuse of willful defiance OSS and the spillover effect of the statewide WDB. Third, Black students were suspended more than White and Hispanic students across all suspension categories. Figure 1 also confirms that schools did not perfectly comply with WDBs, either because of compliance issues or reporting errors (Frey, 2014; Lasnover, 2015). For example, White and Black students together only make up around 4% of the AUSD sample, with an average range of 0 to 4 willful defiance OSSs per school year. Therefore, the reported increases in this category may be attributed to noise and measurement errors. In addition, these increases indicate that WDBs may be less effective in schools with low willful defiance suspension incident rates due to the high costs of monitoring.

Changes in suspension rates over time.
Table 2 presents descriptive statistics on school characteristics by treatment status. In both treated and untreated school districts, Hispanic students made up around 40% of the total population. However, the proportions of Hispanic and White students were smaller in the treated schools than in the untreated schools. In addition, while most teachers in the sampled schools were White, the treated schools employed more Black teachers and fewer White teachers than the untreated schools. There were also some small variations in student–teacher ratio, with a mean of 23.1, a standard deviation of 3.3, and an overall decrease in the four treated districts after the implementation of WDBs. Last, Table 2 shows that the treated school districts spent more per student and were more likely to have PBIS programs than the untreated schools.
Descriptive Statistics on School Characteristics by Treatment Status
Note. Means are reported. Standard deviations are in parentheses. FRPL = free or reduced-price lunch; PBIS = positive behavior intervention and support programs.
Empirical Strategy
I used a DD strategy to evaluate the causal impact of WDBs on discipline outcomes by comparing the gaps in OSS rates between the treatment and control districts before and after the implementation of WDBs. The DD strategy relies on the assumption that suspension rates in the treatment and control groups trended in a parallel fashion before the policy change, and that such trends would have continued if there were no policy change. Therefore, given that my outcome data were reported as count data, I specified a Poisson regression model (Equation 1):
where
In addition, the DD strategy assumes no other contemporaneous policy changes during the treatment period, which may be violated due to the adoption of PBIS and RJ in some school districts. To reduce these concerns, Equation (1) includes a series of time-variant controls (
For Equation (1), I estimated both an event study (nonparametric) version and a two-way fixed effect (parametric) version of the DD model because the nonparametric model allows treatment effects to vary over time. For example, the slow transition of suspension practices may mean that the first year’s treatment effects could be smaller than those in later years.
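As a rough illustration of the comparison underlying a DD design, the sketch below computes a 2×2 difference-in-differences on log suspension rates, which mirrors the multiplicative structure of a Poisson model. It is not the article's estimator: there are no covariates or fixed effects, the function name `log_dd` is illustrative, and all four rates are hypothetical rather than taken from the article's tables.

```python
import math

def log_dd(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences computed on the log scale."""
    treated_change = math.log(treated_post) - math.log(treated_pre)
    control_change = math.log(control_post) - math.log(control_pre)
    return treated_change - control_change

# Hypothetical OSS rates per 1,000 students (not from the article).
beta = log_dd(treated_pre=40.0, treated_post=10.0,
              control_pre=50.0, control_post=25.0)

# Convert the log-scale estimate into a proportional change.
pct = math.exp(beta) - 1.0
print(f"log DD = {beta:.3f}, implied change = {pct:.1%}")
```

In this toy case the treated rate falls to a quarter of its pre-period level while the control rate halves, so the treated rate ends at half of what the control trend alone would predict.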
Equation (1) was also extended into Equation (2) to formally test the validity of the parallel trends assumption (Freyaldenhoven et al., 2019; Lafortune et al., 2018; Lindo & Packham, 2017):
Poisson models are preferred when data are measured in counts, and they do not require the transformation of data to accommodate zeros (online Supplemental Table A3 shows that around 5% of schools reported zero total and nonwillful defiance suspension rates, and a relatively larger proportion of schools reported zero willful defiance suspension rates; Lindo & Packham, 2017; McClellan & Tekin, 2017; Osgood, 2000; Wooldridge, 2010). Because school enrollment is included as a control with its coefficient restricted to one, the Poisson estimates can be interpreted as changes in suspension rates.
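Because a Poisson model is linear in the log of the expected rate, a coefficient β corresponds to a proportional change of exp(β) − 1. The short sketch below shows that conversion and its inverse; the helper names are illustrative, and the value −1.123 is the first-year willful defiance coefficient quoted in the Results section.

```python
import math

def pct_change(beta):
    """Proportional change in the expected rate implied by a Poisson coefficient."""
    return math.exp(beta) - 1.0

def coef_for(pct):
    """Inverse mapping: the coefficient implied by a proportional change."""
    return math.log(1.0 + pct)

# First-year coefficient quoted in the Results section.
first_year = pct_change(-1.123)
print(round(first_year, 2))
```

The same mapping explains why coefficients and percentage changes diverge for large effects: a coefficient of −1.123 is a 67% decrease, not a 112% decrease.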
In addition, recent research suggests that DD analysis with staggered treatment timing can lead to biased estimates when treatment effects vary by cohorts (Goodman-Bacon, 2021; Sun & Abraham, 2020). While Goodman-Bacon (2021) proposed using decomposition to check the source of bias in DD coefficients, decomposition can only be applied to a linear model. Therefore, I calculated the interaction-weighted (IW) estimators, following Sun and Abraham (2020), to check for biases due to staggered timing in policy adoption.
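The final aggregation step of an interaction-weighted estimator can be sketched as a weighted average of cohort-specific effects, with weights given by each cohort's share of treated units. The sketch below assumes the cohort-specific estimates have already been obtained from an interacted regression; the cohort labels, effect sizes, and cohort sizes are all hypothetical, and the function is an illustration rather than the procedure used in the article.

```python
def iw_estimate(cohort_effects, cohort_sizes):
    """Weight cohort-specific effects by each cohort's share of treated units."""
    total = sum(cohort_sizes[c] for c in cohort_effects)
    return sum(cohort_effects[c] * cohort_sizes[c] / total
               for c in cohort_effects)

# Hypothetical cohort-specific estimates for one relative period.
effects = {"cohort_2014": -1.2, "cohort_2015": -0.9, "cohort_2016": -1.5}
# Hypothetical number of treated schools in each adoption cohort.
sizes = {"cohort_2014": 10, "cohort_2015": 5, "cohort_2016": 5}

print(round(iw_estimate(effects, sizes), 3))
```

Because the weights are nonnegative sample shares that sum to one, the aggregate always lies within the range of the cohort-specific effects, which is not guaranteed for the implicit weights in a two-way fixed effect regression under staggered adoption.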
Last, standard errors were clustered at the district-year level to allow within-cluster correlation of error terms (results using cluster-robust standard errors at the district level are presented in the online Supplemental Appendix; Abadie et al., 2017). However, because only four school districts were treated, the cluster-robust standard errors might overreject the null hypothesis. Recent research suggests that p values from subcluster wild bootstrap (WR) and randomization inference (RI) procedures perform better when the number of treated groups is substantially smaller than that of untreated groups in DD designs (Conley & Taber, 2011; MacKinnon & Webb, 2018, 2020; Roodman et al., 2019; Young, 2019). Therefore, I also calculated p values using both the RI method and the subcluster restricted WR method at the district-year level. Specifically, I calculated randomization inference p values based on both coefficients (RI-c) and t statistics (RI-t), in line with Young (2019) and MacKinnon and Webb (2020), who found RIs based on t statistics to be more robust. For each p value, I followed the procedure mentioned in Heß (2017) and Pfeifer et al. (2020) with 1,000 permutations. Even though RI p values are superior to the cluster-robust standard errors, MacKinnon and Webb (2020) showed that RI p values underreject the null when the size of the treated groups is larger than that of the untreated groups. In this analysis, the treated districts average 35.25 school-year observations, compared with 17.31 in untreated districts. Therefore, it should be noted that the RI p values in this study may lead to the underrejection of the null hypotheses.
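The logic of a randomization inference p value can be sketched as follows: reassign treatment to placebo districts many times, recompute the test statistic each time, and report the share of placebo statistics at least as extreme as the observed one. The sketch below makes several simplifying assumptions that differ from the article: the statistic is a plain difference in means rather than a Poisson DD coefficient or t statistic, the data are hypothetical, and the function name `ri_p_value` is illustrative.

```python
import random

def ri_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Two-sided randomization-inference p value for a difference in means.

    outcomes: dict mapping district id -> change in the outcome
    treated:  set of district ids that actually adopted the policy
    """
    rng = random.Random(seed)
    ids = sorted(outcomes)

    def diff_in_means(treated_ids):
        t = [outcomes[i] for i in ids if i in treated_ids]
        c = [outcomes[i] for i in ids if i not in treated_ids]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = diff_in_means(treated)
    extreme = 0
    for _ in range(n_perm):
        # Draw a placebo treated group of the same size as the real one.
        placebo = set(rng.sample(ids, len(treated)))
        if abs(diff_in_means(placebo)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical data: 4 treated districts with large drops, 16 untreated
# districts with small changes (none of these values come from the article).
outcomes = {d: (-30.0 if d < 4 else float(d % 5 - 2)) for d in range(20)}
observed, p = ri_p_value(outcomes, treated={0, 1, 2, 3})
print(observed, p)
```

Permuting at the district level, rather than the school level, respects the fact that treatment was assigned to whole districts, which is the reason few-treated-cluster designs call for this kind of inference in the first place.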
Results
WDBs and Suspension
Critically, the DD design assumes that districts with WDBs and those without should have similar trends in suspension rates. Before presenting the main results, I examine this assumption: Figure 2 plots the event study coefficients and their 95% confidence intervals from Equation (2), and Figure 3 presents the event study coefficients based on the IW estimators proposed by Sun and Abraham (2020). Figures 2 and 3 show no pretrends for any group except White students. Notably, the treated school districts implemented WDBs because existing suspension reduction programs failed to reduce suspension rates among Black and Hispanic students. The pretrend in outcomes for White students could be attributed to those early efforts in reducing suspension rates. In estimating Equation (1), I accounted for this differential trend in White students’ outcomes across treated and untreated districts by extrapolating a linear trend from the two periods immediately preceding the WDBs, following the practice of Dobkin et al. (2018) and Freyaldenhoven et al. (2019; see online Supplemental Table A5 for a reestimation of the main results using this alternative specification, which yields consistent results). 7
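One simple way to picture the pretrend adjustment is to fit a line through the event-study coefficients for the two periods before adoption, extend that line into the post-adoption periods, and subtract it. The sketch below applies this idea after estimation to hypothetical coefficients; the article may instead include the trend inside the regression, so this should be read as an illustration of the extrapolation logic only.

```python
def detrend(coefs):
    """Remove a linear pretrend fitted through relative periods -2 and -1.

    coefs: dict mapping relative period (e.g., -2, -1, 0, 1) -> coefficient
    """
    slope = coefs[-1] - coefs[-2]
    # Trend value at period k, anchored at the period -1 coefficient.
    return {k: b - (coefs[-1] + slope * (k + 1)) for k, b in coefs.items()}

# Hypothetical coefficients with a pre-existing downward drift of 0.1 per period.
raw = {-2: 0.1, -1: 0.0, 0: -0.4, 1: -0.5}
adjusted = detrend(raw)
print({k: round(v, 3) for k, v in sorted(adjusted.items())})
```

After the adjustment, the two pre-period coefficients are zero by construction, and the post-period values measure only the departure from the extrapolated trend.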

Event study figures.

Event study figures using interaction-weighted (IW) estimates.
Panel A of Table 3 displays the nonparametric DD estimates of Equation (2) as well as WR p values by race and reason. Estimates in Columns 1 to 3 indicate that while WDBs significantly reduced willful defiance OSS rates, they had little effect on overall and nonwillful defiance OSS rates, as suggested by the large p values. Particularly, the nonparametric estimates reflect the dynamic changes in willful defiance OSS rates after the implementation of WDBs; willful defiance OSS rates saw a relatively small decrease in the first year (67% decrease, e^(−1.123) − 1 ≈ −0.67
The Impact of Suspension Bans on Suspension Rates
Note. Panel A reports results from the nonparametric DD model using Equation (1); Panel B reports parametric DD results. Results for White students are estimated using models with a pretrend included. All models include covariates, time fixed effects, and school fixed effects. Covariates include the percentage of White, Black, Hispanic, and female students, percentage of students receiving free or reduced-price lunch, percentage of teachers who are White, Black, Hispanic, female, and hold a master’s degree, per-student expenditure, PBIS implementation, and student–teacher ratio. Two dummy variables are included to control for the missing in-school expenditure and the student–teacher ratio. Subcluster restricted wild bootstrap (WR) p values clustered at the district-year level and randomized p values (RI-c and RI-t) are shown in square brackets. Cells with missing WR p value indicate the WR process fails to generate p values. Robust standard errors and p values clustered at the district-year level are shown in parentheses and as asterisks. RI-c = randomization inference p values based on coefficients; RI-t = randomization inference p values based on t statistics; DD = difference-in-differences; PBIS = positive behavior intervention and support programs.
*p < .1. **p < .05. ***p < .01.
The remaining columns in Panels A and B explore the heterogeneous impact of WDBs across student racial groups. According to the nonparametric estimates in columns 4 and 6, the changes in overall OSS rates for White and Hispanic students are not significant at the conventional level (p > .05). However, overall OSS rates for Black students increased by around 22% to 41% (
According to results in columns 7 to 9 of both Panels A and B, willful defiance OSS rates fell sharply compared with the contemporary changes in other districts for students of all races. Specifically, according to the parametric DD estimates, willful defiance OSS rates decreased by 78% for White students, 73% for Black students, and 67% for Hispanic students (
Estimates for nonwillful defiance OSS rates by race are displayed in columns 10 to 12 of Panels A and B. The nonparametric estimates imply that nonwillful defiance OSS rates for White students decreased by around 32% (
The Impact of Suspension Bans on Suspension Rates Using IW Estimators
Note. Results are estimated using IW methods; results for White students are estimated using models with a pretrend included. All models include covariates, time fixed effects, and school fixed effects. Covariates include the percentage of White, Black, Hispanic, and female students, percentage of students receiving free or reduced-price lunch, percentage of teachers who are White, Black, Hispanic, female, and hold a master’s degree, per-student expenditure, PBIS implementation, and student–teacher ratio. Two dummy variables are included to control for the missing in-school expenditure and the student–teacher ratio. Randomized p values (RI-c and RI-t) are shown in square brackets. Cells with missing WR p value indicate the WR process fails to generate p values. Robust standard errors and p values clustered at the district-year level are shown in parentheses and as asterisks. IW = interaction-weighted. RI-c = randomization inference p values based on coefficients; RI-t = randomization inference p values based on t statistics; PBIS = positive behavior intervention and support programs.
*p < .1. **p < .05. ***p < .01.
Table 5 presents differences in the parametric DD coefficients across each racial group. 6
For all three OSS rates, I tested differences in the treatment effects between Black and White students, Black and Hispanic students, and Hispanic and White students. The effects of WDBs on nonwillful defiance OSS rates are significantly different between Black and White students and Black and Hispanic students (p < .05 for both analytical and WR p values). The treatment effects on overall OSS rates were also different between Black and White students (p < .05 for analytical p value, p < .06 for WR p value). However, Table 5 shows no evidence that the effects of WDBs on willful defiance OSS rates vary by race. Specifically, after WDBs were implemented, the nonwillful defiance OSS rates for Black students significantly increased, by 64% (
Differential Impacts of Suspension Bans Across Student Subgroups
Note. This table compares the differences in treatment effects between Black and White, Black and Hispanic, and Hispanic and White students. The coefficient in each cell is estimated from a separate regression. The coefficients are the interaction terms between treatment and race dummies from models in which all controls are fully interacted with race dummies, estimated on a sample combining two race groups. All models include covariates, time fixed effects, and school fixed effects. A pretrend is included when White students are included in the comparison. Covariates include the percentage of White, Black, Hispanic, and female students, percentage of students receiving free or reduced-price lunch, percentage of teachers who are White, Black, Hispanic, female, and hold a master’s degree, per-student expenditure, PBIS implementation, and student–teacher ratio. Two dummy variables are included to control for the missing in-school expenditure and the student–teacher ratio. Subcluster restricted wild bootstrap (WR) p values clustered at the district-year level are shown in brackets. Cells with missing WR p value indicate the WR process fails to generate p values. Robust standard errors and p values clustered at the district-year level are shown in parentheses and as asterisks.
*p < .1. **p < .05. ***p < .01.
The Presence of Same-Race Teachers
If the substitution of suspension reasons was used to cope with WDBs, the prevalence of the strategy might depend on the characteristics of referring teachers. Research on student–teacher racial match has shown that the presence of same-race teachers could improve students’ academic and behavioral outcomes (Dee, 2004, 2005; Gershenson & Papageorge, 2018; Holt & Gershenson, 2019; Papageorge et al., 2020). Redding (2019) summarized three pathways for this phenomenon. First, same-race teachers may hold higher expectations of same-race students than do teachers of other races (Gershenson et al., 2016; Grissom et al., 2015; Ouazad, 2014), adjust instructions to meet the needs of same-race students (Aronson & Laughter, 2016), and build stronger relationships with students and their parents (Saft & Pianta, 2001). Second, students may respond more favorably to same-race teachers and learn from them to overcome negative racial stereotypes (Dee & Penner, 2019; Steele, 1997). Third, same-race teachers may advocate for changes in school policies and teacher behavior, benefiting students in their racial groups (Grissom et al., 2009).
Therefore, a shared cultural understanding may motivate teachers to act or think in ways that benefit students of their own race when making disciplinary decisions. For example, teachers are less likely to escalate a negative response to the behavior of students of their own race (Okonofua & Eberhardt, 2015). However, the extent to which teachers could actively benefit same-race students depends on their ability and desire. Empirical research in other fields has shown that such desire and ability are limited by factors such as organizational culture, socialization, and policy environment (Keiser et al., 2002; Wilkins & Williams, 2008). Before the implementation of WDBs, suspending disruptive students was the default and a consensus among school employees (Lasnover, 2015). Under such norms, teachers who preferred to avoid suspending disruptive students might have been pressured by their colleagues and/or risked violating school discipline policies. The implementation of WDBs, thus, served as a nudge and reduced the social and mental costs that such teachers faced before the reforms. Although WDBs may also change the use of OSS among teachers of a different race, a lack of cultural understanding, together with racial stereotypes, may encourage coping behavior during implementation, in which students are suspended under other suspension reasons (Lasnover, 2015). Thus, I expect same-race teachers to use willful defiance suspensions, and to cite other suspension reasons as substitutes, less frequently than do teachers of other races. These favorable behaviors among teachers would trigger student behavioral improvements and further reduce overall OSS rates among same-race students. Consequently, I hypothesize that same-race teachers mitigate the negative impacts of WDBs on students.
Table 6 examines how the impact of WDBs on student discipline outcomes varies by the presence of same-race teachers.
8
Each cell reports a coefficient on an interaction term between the WDB dummy and the percentage of same-race teachers in each school. The results show that increases in the percentage of White and Hispanic teachers did not lead to additional decreases in OSS rates for White and Hispanic students. This may follow from the fact that White and Hispanic students represent majorities and are thus subject to fewer perceptual biases in California (Riddle & Sinclair, 2019; Stewart et al., 2009). Yet, for Black students, a 1 percentage point increase in Black teachers was associated with a 0.8% (
The Presence of Same-Race Teacher and the Impact of WDBs
Note. Coefficients in this table are the interaction terms between the treatment dummies and the percentage of White, Black, and Hispanic teachers, each estimated from a separate regression. All models include covariates, time fixed effects, and school fixed effects. A pretrend is included when White students are included in the comparison. Covariates include the percentage of White, Black, Hispanic, and female students, percentage of students receiving free or reduced-price lunch, percentage of teachers who are White, Black, Hispanic, female, and hold a master’s degree, per-student expenditure, PBIS implementation, and student–teacher ratio. Two dummy variables are included to control for the missing in-school expenditure and the student–teacher ratio. Subcluster restricted wild bootstrap (WR) p values clustered at the district-year level are shown in brackets. Cells with a missing WR p value indicate that the WR process failed to generate p values. Robust standard errors and p values clustered at the district-year level are shown in parentheses and as asterisks. WDBs = willful defiance suspension bans; PBIS = positive behavior intervention and support programs.
*p < .1. **p < .05. ***p < .01.
Robustness Checks
I checked the robustness of the main results by reestimating the parametric models in Panel B of Table 3 using ordinary least squares (OLS) and weighted least squares (WLS), as both are considered equivalent to the Poisson model. However, the natural log of suspension rates is undefined for some outcomes, because some observations contain zero OSS rates. I addressed this issue by transforming the OSS rates per 1,000 students using the inverse hyperbolic sine function. WLS estimates were weighted by student enrollments in each racial group. I also followed Duflo (2001) and Bhuller et al. (2013) in interacting the districts’ average baseline outcomes and covariates in SY 2011–2012 with a linear time trend for each school district. These specifications allow the implementation of WDBs to be related to the underlying time trends depending on the outcomes or the characteristics of the districts before the implementation of WDBs. The specification for the baseline outcomes is
and the specification for the baseline covariates is
Last, I added LAUSD and dated its WDB to 2013, the year of its formal announcement, to test for sensitivity to the inclusion of this earliest adopter.
Online Supplemental Tables A10 to A12 present results from these robustness checks, along with the main results for overall, willful defiance, and nonwillful defiance OSS rates. While the OLS and WLS estimates in online Supplemental Tables A10 and A12 are less precise, both report coefficients similar to the main results (OLS and WLS results were omitted for willful defiance OSS rates, since an excessive number of schools reported zero willful defiance suspensions after implementing WDBs). Columns 4 and 5 in online Supplemental Tables A10 to A12 present the estimates based on Equations (3) and (4). Again, these estimates are similar, indicating that the main results are robust to including time trends interacted with baseline outcomes and covariates. Last, the results in column 6 show that the main results are not sensitive to adding LAUSD to the analysis.
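As a brief illustration of the transformation used in these robustness checks, the inverse hyperbolic sine, asinh(y) = ln(y + sqrt(y^2 + 1)), is defined at zero and approaches ln(2y) as y grows, which is why it can stand in for the natural log when some schools report zero suspensions. The sketch below is not the author's code; the function name `ihs` and the rates are hypothetical values chosen only for illustration.

```python
import math


def ihs(rate):
    """Inverse hyperbolic sine: ln(rate + sqrt(rate**2 + 1))."""
    return math.asinh(rate)


# Hypothetical OSS rates per 1,000 students (illustrative values, not study data).
rates = [0.0, 5.0, 50.0, 200.0]

for r in rates:
    transformed = ihs(r)
    # For positive rates, compare with ln(2 * rate), the large-value approximation.
    # The natural log itself is undefined at zero, so that case is marked "n/a".
    approx = f"{math.log(2 * r):.4f}" if r > 0 else "n/a"
    print(f"rate={r:6.1f}  ihs={transformed:.4f}  ln(2*rate)={approx}")
```

The zero-rate observation is kept in the sample with a transformed value of zero, whereas a log transformation would drop it.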
Student-Level Analysis
Data and Empirical Strategy
The results in the previous section suggest a potential explanation for the unintended increase in nonwillful defiance OSS rates for Black students after the implementation of WDBs: school employees (e.g., teachers) continued suspending students for these infractions using different labels. While the increased presence of same-race teachers could mitigate such undesirable consequences, deterrence theory implies that increases in nonwillful defiance suspension rates after WDBs may be attributed to student behavioral changes (Nagin, 2013; Pesta, 2018).
To empirically test whether changes in OSS rates were due to behavioral changes, the main analysis was supplemented with analysis of individual-level student data from the biennial High School Youth Risk Behavior Surveys (YRBS), in LAUSD, SFUSD, and San Diego Unified School District (SDUSD), between 2001 and 2017. These YRBSs are part of the Youth Risk Behavior Surveillance System (YRBSS), developed in 1990 to monitor health behaviors among youth and young adults, including those that contribute to unintentional injuries and violence. YRBSs include national surveys administered by the Centers for Disease Control and Prevention, state surveys conducted by state governments, and local surveys administered by local governments (e.g., local school districts). The district-level YRBS data in this study are from three of the 28 local YRBSs across the United States (LAUSD, SFUSD, and SDUSD). All three school districts have YRBS data before the implementation of WDBs, with a survey response rate above 60%. In the three included school districts, YRBSs are implemented among representative district samples of 9th- through 12th-grade students, during the spring semester, every 2 years between 2001 and 2017. This article uses data from the 2001, 2003, 2005, 2007, 2009, 2011, 2013, and 2017 surveys.
I measured the level of disruptiveness by creating a series of dummy variables based on survey questions, indicating whether students (1) were involved in a fight on school property, (2) brought weapons to school, (3) were offered illicit drugs on school property, and (4) used marijuana (see online Supplemental Appendix B for the original questions in the YRBS survey). Overall, across 44,577 observations, Hispanic students made up about 43% of respondents, and Black and White students accounted for only about 7% and 12%, respectively. The remaining respondents represented other minority groups (see online Supplemental Table A13 for the characteristics of surveyed students).
I used a parametric DD strategy and estimated a linear probability model to measure changes in student behavior after the implementation of WDBs:
where
The Results From the YRBS Data
Table 7 presents the sample averages and the treatment effects estimated by Equation (5) on self-reported behavior by race. As shown in Panel A, while 12% of all students were involved in a fight on school property, Black students were around nine percentage points more likely to be involved in a fight than White students. The DD estimates consistently show that the implementation of WDBs led to decreases in the likelihood of being involved in a fight for White, Black, and Hispanic students, although the estimates are not statistically significant. Online Supplemental Table A14 further shows that the DD estimates for Black students are also not statistically different from those for White and Hispanic students. Panel B of Table 7 displays the change in the likelihood of carrying weapons in school. Even though Black students were around two percentage points more likely to carry a weapon than White students, the WDBs affected neither group. Panels C and D present the likelihood of being offered illicit drugs and using marijuana. Although the sample average indicates that Black students behaved similarly to White and Hispanic students, only Hispanic students experienced a decrease in the likelihood of being offered illicit drugs and using marijuana after WDBs were implemented.
The Impact of WDBs on Students’ Behavior by Race
Note. Means are based on weighted data from the LAUSD, SFUSD, and SDUSD YRBS for available years. Some questions were not asked in certain years. Each coefficient is estimated using a separate linear probability model that controls for students’ age, grade, sex, race, year fixed effects, and district fixed effects. Robust standard errors clustered at the district-year level are shown in parentheses. Subcluster restricted WR p values clustered at the district-year level are shown in brackets. Asterisks are based on analytical p values clustered at the district-year level. LAUSD = Los Angeles Unified School District; SFUSD = San Francisco Unified School District; SDUSD = San Diego Unified School District; YRBS = Youth Risk Behavior Surveys; WR = wild bootstrap.
a In the past 30 days. b Within the last year or the past 12 months.
*p < .1. **p < .05. ***p < .01.
These findings reveal no evidence that students of a certain race became more disruptive after WDB enactments. Although one could argue that the WDBs generated positive behavior changes among Black students (because disruptors were suspended for nonwillful defiance reasons), this cannot explain why only Black students faced more nonwillful defiance OSSs and White students experienced similar behavior improvements without experiencing increases in suspensions. In addition, student-level analysis suggests that White and Hispanic students experienced some behavior improvements after WDBs were implemented. The purpose of WDBs is to reduce OSS rates when student behavior remains the same. In other words, OSS rates should decrease or remain the same even if student behavior becomes worse. However, the findings show that student behavior improved while OSS rates remained constant, or even increased, indicating that local WDBs failed to achieve their purpose.
To summarize, evidence in Table 7 suggests that, following the implementation of WDBs, student behavior across all racial groups improved instead of deteriorated. Therefore, the increase in nonwillful defiance OSS rates for Black students is at odds with improvements in their behavior.
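The logic of the linear probability DD model used in this section can be sketched in its simplest form. With two groups, two periods, and no covariates, the OLS coefficient on the interaction between a treated-district dummy and a post-period dummy equals the double difference in outcome proportions computed below. This is a simplified illustration only: the models reported in Table 7 also control for student characteristics and for year and district fixed effects, and the records, the `cell_mean` helper, and the `dd_estimate` helper are hypothetical.

```python
from statistics import mean

# Hypothetical student records: (treated_district, post_period, fought_at_school).
# These values are made up for illustration and are not YRBS data.
records = [
    (1, 0, 1), (1, 0, 0), (1, 0, 1), (1, 0, 0),  # treated district, before
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 0),  # treated district, after
    (0, 0, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),  # comparison district, before
    (0, 1, 0), (0, 1, 1), (0, 1, 0), (0, 1, 0),  # comparison district, after
]


def cell_mean(treated, post):
    """Share of students reporting the behavior in one group-period cell."""
    return mean(y for t, p, y in records if t == treated and p == post)


def dd_estimate():
    """Change in the treated group minus change in the comparison group."""
    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_comparison = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_comparison


print(f"DD estimate (change in probability): {dd_estimate():+.2f}")
```

A negative estimate in this toy example would be read as a decline in the probability of the behavior in treated districts relative to comparison districts, which is the direction of the (statistically insignificant) fighting estimates discussed above.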
Conclusion
This study examines the impact of WDBs on the use of OSSs in four California school districts. The results indicate that WDBs effectively reduced willful defiance OSS rates by around 67%. However, the WDBs did not affect overall OSS rates: an increase in nonwillful defiance OSS rates offset the decreases in willful defiance OSS rates.
Furthermore, the findings suggest that the impact of WDBs varied by student race. Despite receiving OSS at much higher rates before adoption, Black students benefited less from WDBs than White and Hispanic students. In particular, WDBs increased nonwillful defiance OSS rates for Black students by around 26%, which contributed to increases in overall OSS rates. There were no significant changes in overall OSS rates for White and Hispanic students following the implementation of WDBs. Students’ behavior changes could not explain such heterogeneity in treatment effects by race. Rather, it is possible that some broadly defined nonwillful defiance categories, particularly “violent no injury” or “miscellaneous,” replaced willful defiance as the reason for suspension. In addition, the substitution of suspension reasons among Black students was more salient in schools with fewer Black teachers, who can be assumed to hold fewer perceptual biases against Black students (see results in Tables 6 and A8). These findings are consistent with previous qualitative research by Lasnover (2015), who found that some teachers in LAUSD did suspend students under nonwillful defiance reasons when they no longer had willful defiance as a legitimate reason.
It is important to acknowledge that the analyses in this article suffer from several limitations. First, one should be cautious about applying this article’s findings to schools outside California because of the state’s unique culture and demographic composition. Second, due to data limitations, my analyses only focus on the use of OSS. If WDBs also led to more in-school suspensions, my estimates could be biased downward, and my results would be the lower bounds of the true effects of WDBs on student discipline outcomes. Future studies with data on student–teacher race match at the individual level could complement this study. Last, I could not test the impact of WDBs on test scores because the “end of semester tests” after the 2014–2015 school year were not comparable to the previously offered “end of course tests.” I tested the impact of WDBs on students’ graduation and dropout rates but found no significant changes in either outcome (see online Supplemental Table A15 for these results).
Steinberg and Lacoe (2017) categorized discipline reforms into program-based interventions and policy-based interventions. Program-based interventions aim to improve school climates, encourage positive behavior, and reduce violence among students through nonpunitive approaches such as mentoring, group therapy, and teacher training, while policy-based reforms, such as WDBs, revise discipline policies. My findings, along with previous research on Chicago and Philadelphia suspension reforms, suggest that policy-based reforms are ineffective in changing the use of suspension in schools (Lacoe & Steinberg, 2018; Sartain et al., 2015; Steinberg & Lacoe, 2017). This study shows that overall OSS rates did not change after WDB reform, and, therefore, suggests that policy-based discipline reforms might make only limited contributions to improvements in academic performance.
This study also sheds light on the dangers of designing policy without accounting for unintended consequences and the critical role of frontline workers in policy implementation. The WDBs produced such unexpected consequences across three specific aspects. First, the policies did not consider that OSSs in the nonwillful defiance category could serve as substitutes for willful defiance OSSs. Second, the abrupt changes in discipline policy did not eliminate biases against Black students. Third, school districts failed to provide enough support for policy implementation. Policy makers looking to extend the current California statewide WDB in the next 5 to 10 years should consider restricting the use of all broadly or vaguely defined suspension reasons and add action items to support implementation, such as increasing the number of minority teachers, providing teacher training on effective discipline, and systematically adopting program-based discipline reforms like PBIS or RJ.
Supplemental Material
Supplemental material, sj-docx-1-ero-10.1177_23328584211068067 for The Impact of Suspension Reforms on Discipline Outcomes: Evidence From California High Schools by Rui Wang in AERA Open
Footnotes
Acknowledgements
The author thanks Erdal Tekin, Seth Gershenson, Anna Amirkhanyan, Robert Shand, and three anonymous reviewers and editors for their detailed and constructive feedback on earlier drafts of this article. The author is grateful for the help from Michael Lombardo and Luke Anderson of the California PBIS Coalition, who generously shared their data on PBIS implementation. The author is also grateful for the codes and insights shared by James MacKinnon, Manudeep Bhuller, Liyang Sun, Katharine Strunk, Ayesha Hashim, and Tasminda Dhaliwal. Youth Risk Behavior Surveillance System data were kindly provided by the San Francisco Unified School District, Los Angeles Unified School District, and Centers for Disease Control and Prevention and were used with permission. Opinions and errors are the sole responsibility of the author.
1
I consulted multiple sources to identify school districts that implemented WDBs, including district policy manuals, district board meeting documents, local newspapers, and direct contacts with local school districts. I omitted Los Angeles Unified School District (LAUSD), which banned suspension for willful defiance in SY 2011–2012, before which I have no data. The identified reformed school districts are consistent with the list of reformed school districts provided by California State Senator Nancy Skinner, who proposed Senate Bill 419, which eliminated willful defiance suspensions for K–8 students statewide and was signed by California Governor Newsom.
2
This study focuses on White, Black, and Hispanic students for two reasons. First, privacy concerns prevent the California Department of Education from releasing school-level data for student groups with fewer than ten enrollments. As a result, expanding the current analysis to Asian Americans would lead to a substantive reduction in sample size. Second, previous research has shown that Asian American students usually experience fewer suspensions than students of other races (Morgan & Wright, 2018). Due to data limitations and the rarity of suspension incidents, I decided to exclude Asian American students from this study.
5
I restricted the analytical sample to high schools that only offer classes from Grades 9 to 12 for two reasons. First, this ensures that the suspension data across schools represent students from the same grade ranges. Second, it would prevent any spillover effects due to discipline policy changes in lower grades (e.g., A.B. 420 banned willful defiance suspension for all students in kindergarten to Grade 3). In addition, alternative, juvenile justice, virtual teaching, and magnet schools were dropped from the analytical sample because they operate under different goals and mainly serve special groups of students. Due to the requirements of the Family Privacy Act, the California Department of Education only publishes the data if the enrollment of a specific group is larger than 10. Therefore, I excluded schools from the analysis if suspension data for any racial group in a school was missing.
6
I compared the treatment effects between two racial groups by testing the coefficient of the interaction term between the treatment dummy and the race dummy from a model in which all controls, including school and year fixed effects, are fully interacted with the race dummy. Although it is possible to construct a model to estimate the differences in treatment effects across three racial groups, the number of interactions between the school fixed effects and race dummies prevents Poisson regressions from converging when calculating WR p values.
7
Following Dobkin et al. (2018) and
, this work includes the linear trend from the two periods immediately preceding the WDBs for White students using the following equation:
where
8
Data were reported at the school level, and teacher identifiers were redacted for confidentiality considerations, prohibiting the identification of teachers who assigned OSS. I operationalized the presence of same-race teachers by calculating the percentage of same-race full-time teachers in each school.
Author
RUI WANG is an assistant professor at Shanghai University of Finance and Economics. His research focuses on educational equity and students’ noncognitive outcomes.
References
