Abstract
Racial disparities in policing are well documented, but the reasons for such disparities are often debated. In the current research, we weighed in on this debate using a regional-level bias framework: We investigated the link between racial disparities in police traffic stops and regional-level racial bias, employing data from more than 130 million police traffic stops in 1,413 U.S. counties and county-level measures of racial bias from more than 2 million online respondents. Compared with their population share in county demographics, Black drivers were stopped at disproportionate rates in the majority of counties. Crucially, disproportionate stopping of Black drivers was higher in counties with higher levels of racial prejudice by White residents (
Racial disparities in policing, ranging from higher rates of Black Americans being stopped, questioned, or frisked (e.g., Eberhardt, 2016) to incidents of police brutality and shootings (e.g., Hester & Gray, 2018; Voigt et al., 2017), are well documented. But there is an ongoing debate about the reasons for such disparities. Some writers have argued that racial disparities in policing are rooted in widespread racial bias (e.g., Lowery, 2020), whereas others have argued that “a few bad apples” are responsible for police misconduct (e.g., Benner, 2020). Such disparate views indicate that merely documenting racial disparities in policing offers neither sufficient explanation nor ways to overcome them (Hetey & Eberhardt, 2018). In the present research, we contributed to this debate by investigating whether the contexts in which police officers operate relate to racial disparities in policing outcomes.
Our analyses focused on police traffic stops, which are the most common situation in which Americans encounter police (Davis & Whyde, 2018). Broadly, two approaches have been used to investigate racial disparities in policing: field and lab studies. Field research demonstrates that Black drivers are stopped at higher rates than White drivers (e.g., Gelman et al., 2007; Langton & Durose, 2013; Pierson et al., 2020; Warren et al., 2006). Furthermore, traffic stops are often starting points for further racial disparities in policing: When stopped, Black Americans are treated less respectfully (Voigt et al., 2017) and are more likely to be searched (e.g., Higgins et al., 2011; Pierson et al., 2020), arrested (e.g., Kochel et al., 2011), and exposed to excessive force than White Americans (e.g., Edwards et al., 2019; Fryer, 2019). Thus, police traffic stops can set in motion a cascade of events that may threaten people’s lives and livelihoods, ultimately undermining trust in law enforcement (Camp et al., 2021).
Although such field studies document the extent of racial disparities, they often allow only indirect inferences of why these disparities occur. For example, Pierson and colleagues (2020) conducted a “veil-of-darkness test” to examine racial profiling in police traffic stops. The rationale behind this test is that if disproportionate stopping of Black drivers were indeed caused by biased decision-making, racial disparities should be larger during daylight, when drivers’ racial-group membership is more easily categorized, compared with during darkness (Grogger & Ridgeway, 2006). Consistent with this reasoning, results showed that racial disparities in police traffic stops were indeed smaller in darkness than during daylight, suggesting that racial-group membership might play a role in officers’ stop decisions (but see Worden et al., 2012). Yet such indirect tests of racial bias in field research are limited because they cannot investigate police officers’ attitudes, motives, and behavioral decisions while on duty. Alternatively, lab studies allow more fine-grained analyses of psychological processes in police officers’ behavior. For example, some studies have linked racial disparities in use-of-force decisions to threat-related stereotypes (Correll et al., 2015) or prejudice (Eberhardt et al., 2004). This research provides insights into cognitive and affective processes in split-second decisions. However, it is debatable to what extent findings in the lab generalize to real-world police encounters because lab studies potentially underestimate (Jasperse et al., 2022) or overestimate (Cesario, 2021) effects of bias in real-world decision-making. In the current research, we combined both approaches to examine racial disparities in policing by connecting psychological studies of intergroup biases and field data of policing outcomes within a regional-level bias framework (see Hehman et al., 2019).
Statement of Relevance
In the United States, Black drivers are stopped with disproportionate frequency, and these traffic stops are often starting points for multiple forms of racial disparities in policing. Current societal debates center on whether this is the result of widespread racial bias or whether it should be attributed to the actions of a few rogue police officers. Analyzing massive data sets of more than 130 million traffic stops from 1,413 U.S. counties, we found that White residents’ racial bias aggregated at the county level relates to racial disparities in police traffic stops. These findings suggest that research on racial disparities in policing should not only focus on individual officers but also examine the social contexts in which police operate.
For additional thoughts on some of the psychological issues of societal importance considered in this research, see the invited Further Reflections piece authored by Payne and Rucker (2022), available online at https://doi.org/10.1177/09567976211056641 and on pages 666–668 of this issue.
Applying this approach, a growing number of studies used aggregated measures of prejudice and stereotypes at the regional level (i.e., counties, states) to investigate relationships between racial bias and racial disparities in societal outcomes. Such studies have demonstrated that racial disparities in policing, health care, and education are more likely in regions where residents demonstrate higher levels of racial bias than in regions with lower levels of racial bias (e.g., Hehman et al., 2018; Orchard & Price, 2017; Riddle & Sinclair, 2019). Recently, Payne et al. (2017) proposed a theoretical framework for such findings, conceptualizing racial stereotypes and prejudice as relatively stable characteristics of contexts (rather than people), making the activation of biased thoughts and feelings in some contexts more likely than in others. Building on this idea, we propose that racial disparities in policing may thus not entirely depend on individual attitudes of police officers but may also depend on the contexts in which they act. There is evidence that policing outcomes vary by region (e.g., Police Scorecard, n.d.), and these differences may relate to characteristics of the local environment that affect both local residents’ racial biases and policing behavior. For example, region-specific crime levels, racial segregation, quality and quantity of intergroup contact, local politics, and media content may affect whether and how strongly Black people are linked to stereotypes of criminality and threat (e.g., Sim et al., 2013). Consequently, in regions with higher levels of crime- or threat-related stereotyping, police officers might be cued to interpret Black drivers’ behavior as suspicious. Another possibility is that regional levels of anti-Black and/or pro-White prejudice might affect racial disparities in police traffic stops. 
Higher regional levels of hostility toward Black people might increase police officers’ inclination to stop Black drivers as a means of nuisance or harassment. Alternatively, higher regional levels of liking of White people might increase police officers’ inclination to spare White drivers the nuisance of a traffic stop. Importantly, given that stereotypes and prejudice can be related or distinct components of intergroup attitudes (e.g., Amodio & Devine, 2006; Phills et al., 2020), regional-level stereotypes and prejudice may independently or interactively relate to racial disparities in police traffic stops.
On the basis of these considerations, we investigated regional levels of racial stereotypes and prejudice and their relationships with racial disparities in police traffic stops using large, publicly accessible data sets. We linked county-level aggregates of several measures of racial stereotypes and prejudice to county levels of racial disparities in police traffic stops. Societal racial bias and discrimination are described as relying on White power structures (e.g., Berard, 2008), and classic intergroup theories in social psychology suggest that societal institutions (e.g., law enforcement) are largely shaped by dominant groups (e.g., Sidanius & Pratto, 1999). Consequently, we focused our analyses on White residents’ aggregates of stereotyping and prejudice as proxies of regional-level bias.
Method
Data sources
Police traffic stops
We retrieved data on police traffic stops from the Stanford Open Policing Project (Pierson et al., 2020), which collects and standardizes data on vehicle and pedestrian stops from law enforcement departments across the United States. Our analyses included data on 134,016,874 state patrol traffic stops located in 1,413 counties across 24 U.S. states documented between the years 2000 and 2018. We included all traffic stops for which drivers’ race and county information was reported. For three states (Illinois, New Jersey, and Vermont), we derived county information on the basis of zip codes and/or municipalities. We analyzed vehicle stops only and excluded data on pedestrian stops. For each county, we calculated the percentage of Black drivers among all stopped drivers. To examine whether Black drivers were stopped at disproportionate rates relative to their population share at the county level, we subtracted the percentage of Black residents in each county (as reported by the 2017 U.S. Census; U.S. Census Bureau, 2017) from the percentage of Black drivers stopped in each county. This resulted in a score of disproportionate stopping of Black drivers; zero indicates no disproportion, and a positive score indicates stopping rates higher than expected on the basis of county population.
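The disproportion score described above is a simple difference of two percentages. The following is a minimal sketch of that arithmetic, not the authors' code; the function name, argument names, and example numbers are hypothetical.

```python
def disproportion_score(black_stops: int, total_stops: int,
                        pct_black_residents: float) -> float:
    """Percentage of Black drivers among stopped drivers minus the
    percentage of Black residents in the county.

    Zero means no disproportion; a positive value means Black drivers
    are stopped more often than their population share would suggest.
    All names here are illustrative assumptions.
    """
    if total_stops <= 0:
        raise ValueError("total_stops must be positive")
    pct_black_stopped = 100.0 * black_stops / total_stops
    return pct_black_stopped - pct_black_residents


# Made-up county: 150 of 1,000 stops involve Black drivers (15%),
# and Black residents make up 12% of the county population.
example_score = disproportion_score(150, 1000, 12.0)
```

Because the score is a difference in percentage points rather than a ratio, it stays defined for counties with very small Black populations, where a ratio would become unstable.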
Racial stereotypes and prejudice
Measures of racial stereotypes and prejudice were retrieved from Project Implicit (Xu et al., 2022). Threat-related stereotypes were measured with a weapons Implicit Association Test (IAT) as well as with items asking respondents how much more they associate Black people, relative to White people, with weapons and harmless objects. Prejudice was measured with an evaluative race IAT, an item asking respondents for their preference for White people relative to Black people, and feeling thermometers. We included data collected between 2002 and 2018 from U.S. respondents who self-identified as White and for whom geographic information was reported. On the basis of respondents’ geographic information, we aggregated measures of prejudice and stereotypes at the county level.
Weapons IAT
In the weapons IAT, respondents used two response keys to classify Black and White faces according to race and images of objects as weapons or harmless objects. In trials during which White faces and harmless objects share one response key and Black faces and weapons share another response key (compared with the pairing of White faces and weapons and Black faces and harmless objects), faster responses and fewer errors are interpreted as an indirect indication of threat-related stereotype representations of Black people relative to White people. Responses were converted into IAT D scores (Greenwald et al., 2003). Higher IAT D scores indicate a stronger stereotype effect.
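To make the direction of the D score concrete, the sketch below computes a simplified version: the difference between mean response latencies in the two pairing conditions, divided by the standard deviation of all latencies pooled across both conditions. This is an illustration only. The full scoring procedure of Greenwald et al. (2003) includes further steps (e.g., trial exclusions and error handling) that are omitted here, and the function and variable names are hypothetical.

```python
from statistics import mean, stdev


def simplified_d_score(compatible_ms, incompatible_ms):
    """Simplified IAT D score (illustrative, not the full algorithm).

    compatible_ms:   latencies (ms) when White faces + harmless objects
                     and Black faces + weapons share response keys.
    incompatible_ms: latencies (ms) for the reversed pairing.

    A positive value means faster responding in the stereotype-consistent
    pairing, i.e., a stronger stereotype effect.
    """
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance")
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# Made-up latencies for one respondent.
d = simplified_d_score([600, 650, 700], [800, 850, 900])
```

Dividing by the respondent's own latency variability is what makes scores comparable across respondents who differ in overall response speed.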
Self-reported racial stereotypes
To measure self-reported endorsement of threat-related racial stereotypes, we analyzed respondents’ ratings on two items assessing how much they associate weapons and harmless objects with Black and White people on scales ranging from 1 (
Evaluative race IAT
In the race IAT, respondents use two response keys to classify Black and White faces according to race and positive and negative words (e.g., “lovely,” “terrible”) according to valence. In trials during which White faces and positive words share one response key and Black faces and negative words share another response key (compared with the pairing of White faces and negative words and Black faces and positive words), faster responses and fewer errors are interpreted as an indirect indication of pro-White/anti-Black evaluations (i.e., racial prejudice). Higher IAT D scores in the race IAT indicate a stronger preference for White relative to Black people.
Self-reported racial prejudice
To measure self-reported racial prejudice, we analyzed respondents’ relative preference for White people relative to Black people on a scale ranging from “
To measure self-reported racial prejudice in another way, we analyzed how much warmth respondents felt toward White people and Black people on feeling-thermometer scales ranging from 0 (
Analysis plan
To increase reliability for measures of stereotypes and prejudice, researchers have applied selection criteria of minimum respondents per county to be included in analyses (e.g., restricting analyses to 100 or 150 respondents per geographical unit; Hehman et al., 2018; Payne et al., 2019). Such fixed selection criteria are often arbitrary and lack theoretical justification. Also, restricting the number of respondents reduces the number of counties included in analyses and may thereby reduce statistical power. To circumvent these problems and to increase transparency, we employed a multiverse approach, reporting findings across different selection criteria (i.e., including counties with more than 0, 25, 50, 75, 100, and 150 respondents, respectively).
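The multiverse approach can be pictured as repeating the same analysis once per respondent threshold. The sketch below does this for a zero-order correlation. It is an assumption-laden illustration rather than the authors' R code: the record keys (`n_respondents`, `bias`, `disproportion`) and the rule of returning `None` for fewer than three counties are hypothetical choices.

```python
from math import sqrt
from typing import Optional

# Thresholds named in the text: counties with more than
# 0, 25, 50, 75, 100, and 150 respondents.
THRESHOLDS = (0, 25, 50, 75, 100, 150)


def pearson_r(xs, ys) -> Optional[float]:
    """Pearson correlation; None if undefined or based on < 3 points."""
    n = len(xs)
    if n < 3 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def multiverse_correlations(counties, thresholds=THRESHOLDS):
    """Correlate county-level bias with disproportionate stopping,
    once per minimum-respondent threshold (strictly 'more than')."""
    results = {}
    for minimum in thresholds:
        subset = [c for c in counties if c["n_respondents"] > minimum]
        results[minimum] = pearson_r(
            [c["bias"] for c in subset],
            [c["disproportion"] for c in subset],
        )
    return results
```

Reporting the full set of results, rather than one threshold chosen in advance, shows whether a relationship depends on an arbitrary inclusion rule.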
We first computed zero-order correlations between disproportionate stopping of Black drivers and county-level stereotype and prejudice aggregates. To further examine the incremental value of different prejudice and stereotype measures and to control for potential covariates, we then tested several multiple-regression models. In a first series of models, we compared the contributions of different stereotype measures in predicting disproportionate stopping of Black drivers. Similarly, in a second series of models, we compared the contributions of different prejudice measures in predicting disproportionate stopping. In a third series of models, we compared the contributions of prejudice and stereotypes in predicting disproportionate stopping by adding all stereotype and prejudice measures as predictors. In a fourth and fifth series of models, we controlled for potential effects of county demographics, adding the percentage of Black and White residents as further predictors into the models. In a final set of analyses, we performed a veil-of-darkness test (Grogger & Ridgeway, 2006) and analyzed separate correlations between measures of racial bias and disproportionate stopping of Black drivers during daylight and darkness.
All analyses were conducted in the R programming environment (Version 4.1.0; R Core Team, 2019) using the R packages
Results
Preliminary analyses
Police traffic stops
Compared with their population share in the county demographics, Black drivers were stopped 2.75% (

Fig. 1. Maps indicating percentiles of county-level disproportionate stopping of Black drivers (a), county-level race Implicit Association Test (IAT) D scores (b), and county-level self-reported racial prejudice (c).
Racial stereotypes and prejudice
Table 1 reports descriptive statistics for stereotype and prejudice measures for different selection criteria, restricted to counties for which we obtained data on police traffic stops. Results from the weapons IAT and from self-reported stereotypes showed that, on average, White respondents associate Black people with weapons, whereas they associate White people with harmless objects. Furthermore, White respondents’ race IAT D scores indicate an overall preference for White people relative to Black people. Similarly, White respondents’ self-reported measures of prejudice indicate an overall preference for White people and higher perceptions of warmth toward White people compared with Black people. Self-reported measures of prejudice were highly correlated at the county level (
Table 1. County-Level Descriptive Statistics for White Residents’ Racial Stereotype and Prejudice Measures
Note: Means for the weapons and race Implicit Association Tests (IATs) are D scores. Stereotype scores are self-reported ratings of the extent to which harmless objects are associated with Black and White people (stereotype harmless) and the extent to which weapons are associated with Black and White people (stereotype weapons). Preference scores indicate preference for White people relative to Black people on a one-item measure, and values for the feeling thermometer are difference scores for White people relative to Black people. Positive scores on all measures indicate less favorable evaluations and more negative stereotypes of Black people relative to White people, whereas negative scores indicate more favorable evaluations and less negative stereotypes of Black people relative to White people.
Correlational analyses
Zero-order correlations
Figure 2 illustrates correlations between disproportionate stopping of Black drivers and county-level White residents’ weapons IAT D scores, self-reported racial stereotypes, race IAT D scores, and self-reported preference and felt warmth for White people relative to Black people (see also Table S2 in the Supplemental Material available online). There were no correlations between White residents’ weapons IAT D scores and disproportionate stopping of Black drivers (

Fig. 2. Zero-order correlations between county-level disproportionate stopping of Black drivers and county-level measures of White residents’ racial stereotypes and prejudice. Results are shown separately for analyses with full county-level data and analyses with only subsets of the data. Error bars represent 95% confidence intervals. IAT = Implicit Association Test.
Multiple-regression models
To test unique contributions of different measures of racial stereotypes and prejudice in predicting disproportionate stopping of Black drivers, we conducted several regression analyses (Fig. 3; see also Tables S3–S8 in the Supplemental Material). To account for the hierarchical data structure with counties nested within states, we fitted linear mixed-effects models. In a first model, we included stereotype measures (i.e., weapons IAT D scores and self-reported racial stereotypes) as predictors. In this model, none of the predictors were consistently related to disproportionate stopping of Black drivers (conditional

Fig. 3. Estimates from linear mixed-effects models predicting disproportionate stopping of Black drivers from county-level measures of White residents’ racial stereotypes and prejudice. Results are shown separately for models with full county-level data and models with only subsets of the data. Error bars represent 95% confidence intervals. IAT = Implicit Association Test.
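For readers less familiar with mixed-effects models, one common way to write a model with counties nested within states is shown below. The random-intercept structure is an assumption made for illustration; the text states only that linear mixed-effects models were fitted to account for the nesting. Here $y_{cs}$ denotes the disproportion score of county $c$ in state $s$, and $x_{k,cs}$ denotes the $k$-th county-level predictor (e.g., a stereotype or prejudice aggregate).

```latex
y_{cs} = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{k,cs} + u_s + \varepsilon_{cs},
\qquad u_s \sim \mathcal{N}(0, \tau^2),
\qquad \varepsilon_{cs} \sim \mathcal{N}(0, \sigma^2)
```

Under this assumed specification, the state-level term $u_s$ absorbs differences between states in average disproportion, so each $\beta_k$ reflects the relationship between a predictor and disproportionate stopping among counties within the same state.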
Previous research suggests that racial bias is related to regional demographics, such as the percentage of Black or White people living in a region (e.g., Rae et al., 2015). Consequently, we conducted analyses, adding county proportions of Black and White residents as control variables. Because these variables were highly correlated (
Veil-of-darkness test
To perform a veil-of-darkness test, we followed recommendations from previous research (e.g., Grogger & Ridgeway, 2006; Pierson et al., 2020; Worden et al., 2012), restricting data for police stops to those times of the day that can be either light or dark, depending on seasonal shifts of sunset. By restricting the data to this intertwilight period, we could rule out the possibility that effects were potentially confounded by differences in driving patterns (i.e., one group driving more often during earlier or later hours of the day). In addition, we removed stops during the ambiguous twilight period of approximately 30 min between sunset and dusk. Note that these restrictions considerably reduced the number of police stops included in our analyses to 9,506,820 stops during daylight and 9,140,738 stops during darkness, measured in 1,022 counties in 15 states. Our analyses demonstrate that proportions of stopped Black drivers in comparison with the Black county population were lower during daylight (

Fig. 4. Zero-order correlations between county-level disproportionate stopping of Black drivers and county-level measures of White residents’ racial stereotypes and prejudice during daylight (upper panel) and during darkness (lower panel). Results are shown separately for analyses with full county-level data and analyses with only subsets of the data. Error bars represent 95% confidence intervals. IAT = Implicit Association Test.
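The data restriction behind the veil-of-darkness test can be sketched as a small classification rule: keep only stops whose clock time falls in the window that is sometimes light and sometimes dark over the year, then drop stops in the ambiguous interval between sunset and dusk. The code below is a hypothetical illustration; the function name, the use of minutes since midnight, and the example times are assumptions, not details taken from the authors' pipeline.

```python
from typing import Optional


def classify_stop(stop_min: int, sunset_min: int, dusk_min: int,
                  window_start: int, window_end: int) -> Optional[str]:
    """Classify one stop for a veil-of-darkness comparison.

    All times are minutes since midnight (local clock time).
    window_start / window_end bound the intertwilight period, i.e. the
    clock times that can be light or dark depending on the season.
    sunset_min / dusk_min are the values for the date of the stop.

    Returns "daylight", "darkness", or None if the stop is excluded.
    """
    if not (window_start <= stop_min <= window_end):
        return None  # outside the intertwilight period
    if stop_min < sunset_min:
        return "daylight"
    if stop_min > dusk_min:
        return "darkness"
    return None  # ambiguous twilight between sunset and dusk


# Made-up example: the same clock time (18:30) on a summer and a winter day.
summer = classify_stop(1110, sunset_min=1230, dusk_min=1260,
                       window_start=1020, window_end=1260)
winter = classify_stop(1110, sunset_min=1020, dusk_min=1050,
                       window_start=1020, window_end=1260)
```

Comparing stops at the same clock time across seasons is what allows lighting to vary while driving patterns tied to the time of day are held roughly constant.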
General Discussion
Combining data on more than 130 million traffic stops with regional aggregates of racial prejudice and stereotypes, we observed that racial disparities in police traffic stops were related to White people’s local levels of racial bias. These relationships were observed across different model specifications, highlighting their robustness. Relationships between racial bias and police traffic stops were consistent across different measures of racial prejudice but less consistent across measures of threat-related racial stereotypes. Furthermore, these relationships were observed during both daylight and darkness and after controlling for county demographics. The present findings are consistent with results of previous field studies documenting racial disparities in policing (e.g., Pierson et al., 2020). Connecting policing data to psychological studies of intergroup biases, our approach adds another layer by demonstrating robust relationships between racial disparities in policing outcomes and the contexts in which police officers operate.
Our finding that White county residents’ racial bias was associated with county-level racial disparities in traffic stops is consistent with other findings, such as that Black people were disproportionately shot by police in regions with higher levels of racial prejudice and threat-related stereotypes (Hehman et al., 2018). Although consistent overall, the present results depart from this prior finding in two ways. First, we observed stronger relationships for self-report measures than for indirect measures of racial bias. Second, our findings suggest that in stop decisions, prejudice toward Black people might be more relevant than threat-related racial stereotypes. These discrepancies might be explained by situational differences between shoot and stop decisions. Shoot decisions may occur more often in situations involving perceived immediate threats of physical harm, eliciting high arousal and fear and potentially reducing deliberate behaviors while increasing influences of threat-related stereotypes on police officers’ behavior (e.g., Correll et al., 2014). Consequently, situations surrounding shoot decisions should relate more closely to measures of threat-related stereotypes (March et al., 2020). Conversely, relationships between stop decisions and measures of threat seem less clear.
The role of prejudice in stop decisions is worth highlighting. Some scholars have argued that disparities in policing do not necessarily reflect racial bias but instead reflect that Black people are more often involved in crime, which in turn shapes people’s racial stereotypes (e.g., Cesario, 2021). The present findings challenge such notions, suggesting that relative liking and preference for White people over Black people played a more important role in racial disparities in police traffic stops than stereotypes (see also Essien et al., 2022).
The observed relationships between regional-level bias and police traffic stops underscore the role of the context in which police officers operate. Our findings are consistent with theorizing by Payne et al. (2017), who argued that some contexts expose individuals more regularly to stereotypes and/or prejudice, increasing mental accessibility of biased thoughts and feelings, in turn influencing individual behavior. Consequently, behavioral expressions of prejudice and stereotypes often reflect properties of contexts rather than stable dispositions of people (but see Connor & Evers, 2020).
One possible explanation of how regional-level bias might affect policing is that it reflects regional norms that interact with institutionalized practices in law enforcement. For example, police traffic stops in the United States are regularly used as investigative tools that encourage officers to maximize the number of traffic stops based on “suspicion of criminal activity” (Epp et al., 2017, p. 172). Although racially neutral on the surface, such practices may connect regional-level bias with police officers’ behavior: Incentives to maximize the number of stops in conjunction with local norms that favor White people might encourage police officers to disproportionately stop Black drivers. Moreover, when instructed to base initial stop decisions on “gut feelings,” officers may rely on contextually available prejudice or stereotypes (De Houwer & Tucker Smith, 2013; Epp et al., 2017; Swencionis et al., 2021). Additionally, racial bias of supervisors may affect decisions to direct officers to regions where they more likely encounter Black drivers (Beckett et al., 2006), consistent with documentations of overpatrolling of minority areas (Vomfell & Stewart, 2021). Thus, racial bias may affect investigatory stops even in the absence of prejudice or stereotyping on the part of the individual police officers involved. Future research needs to identify the psychological mechanisms that mediate between community racial bias and officer behavior, for example, by including direct investigation of institutionalized practices in law enforcement.
The veil-of-darkness test did not demonstrate greater racial disparities in police traffic stops during daylight compared with darkness, thus departing from previous research relying on similar data (Pierson et al., 2020). Although the hypothesis that darkness eliminates social categorization of drivers is intuitively convincing, it may not necessarily be true, especially because racial categorization has been shown to be robust to suboptimal viewing conditions (e.g., Kawakami et al., 2017). Given that the validity of the veil-of-darkness test hinges on a number of assumptions (e.g., that police officers use person features and neglect other information, such as vehicle type) and given other studies in which relationships between lighting conditions and racial disparities in police traffic stops were not observed (e.g., Worden et al., 2012), future research is needed to understand the boundary conditions of the veil-of-darkness test. Importantly, correlations between racial-bias measures and disproportionate stopping rates were unaffected by lighting conditions, underlining the robustness of these relationships.
The present research has limitations. First, the correlational approach does not allow causal interpretations. On the basis of previous theorizing (Payne et al., 2019), we suggest that regional-level bias provides a context that potentially affects police officers’ behavior. Alternatively, racial disparities in police traffic stops might affect regional-level bias by triggering negative evaluations linking Black people with crime (e.g., Hetey & Eberhardt, 2018). It is also possible that third variables affect both racial bias and policing behavior. For example, regional differences in socioeconomic disparities between Black and White residents might affect both racial bias and policing. This nonexhaustive list of alternative causal relationships illustrates the need for future research. Another limitation originates from the use of Project Implicit data (Xu et al., 2022), which are not representative of the U.S. population. Participants visit the demonstration website voluntarily and skew liberal (e.g., Essien et al., 2021). Similarly, because of the available data sets, our analyses included only one third of U.S. counties, excluding data for many midwestern and southern states. Consequently, we do not know whether or how including these regions might alter conclusions of this research.
Despite these limitations, it is important to highlight that the observed relationships were consistent across different measures of racial bias and relied on millions of online respondents and tens of millions of traffic stops. Our findings suggest that researchers analyzing racial disparities in policing should not only focus on individual officers but also consider the contexts in which police operate.
Supplemental Material
Supplemental material, sj-pdf-1-pss-10.1177_09567976211051272 for Racial Bias in Police Traffic Stops: White Residents’ County-Level Prejudice and Stereotypes Are Related to Disproportionate Stopping of Black Drivers by Marleen Stelter, Iniobong Essien, Carsten Sander and Juliane Degner in Psychological Science
Footnotes
Transparency
M. Stelter and I. Essien are joint first authors of this article; they developed the study concept and wrote the manuscript. C. Sander retrieved and prepared the data sets. M. Stelter prepared and analyzed the data and created the tables and figures. All the authors interpreted the results, revised the manuscript, and approved the final manuscript for submission.
