Abstract
This study reports results from a new analysis of 17 survey experiment studies that permitted assessment of racial discrimination, drawn from the archives of the Time-sharing Experiments for the Social Sciences. For White participants (n=10 435), pooled results did not detect a net discrimination for or against White targets, but, for Black participants (n=2781), pooled results indicated the presence of a small-to-moderate net discrimination in favor of Black targets; inferences were the same for the subset of studies that had a political candidate target and the subset of studies that had a worker or job applicant target. These results have implications for understanding racial discrimination in the United States, and, given that some of the studies have never been fully reported on in a journal or academic book, the results also suggest the need for preregistration to reduce or eliminate publication bias in racial discrimination studies.
Black Americans and White Americans disagree about the extent of racial discrimination in the United States today (Norton and Sommers, 2011). Social science can help reduce such disagreements and alert the public and policymakers to racial discrimination that needs to be eliminated or compensated for. Survey experiments that manipulate the race of experimental targets such as hypothetical employees or political candidates are particularly useful because such experiments provide stronger causal inference than correlational studies, which rely on statistical control for correlates that are often unavailable. However, estimates and inferences from reports of survey experiments in the academic literature can be misleading because some published studies do not report all outcome variables (Franco et al., 2015; Franco et al., 2016) and some survey experiments, particularly studies with null results, never enter the literature at all (Franco et al., 2014), with the bias disfavoring null results plausibly producing exaggerated estimates of racial discrimination. The present study addresses these potential publication biases by analyzing all relevant outcome variables from all available racial discrimination studies that met inclusion criteria in the archives of TESS, the Time-sharing Experiments for the Social Sciences program (cf. the research design of Franco et al., 2014).
Overview of the included studies
TESS selects survey experiment proposals through a peer review process, with data collection for selected proposals conducted by professional third parties with no apparent interest in the outcome of the study. Proposal authors receive exclusive access to the data for one year, after which the data are placed online. My analysis included the 17 survey experiment studies posted in the TESS archives prior to the end of 2015 that met four conditions: at least one experimental manipulation that involved a White target or prime and a Black target or prime but did not involve a well-known public figure; a sample with both White participants and Black participants, to permit comparison of relative levels of discrimination; at least one post-manipulation outcome variable that could be used to measure racial discrimination; and sufficient information to permit calculating estimates of racial discrimination. Brief descriptions of the included survey experiments are in the remainder of this section, with the supplemental material including additional information such as the questionnaire items and outcome variable scale Cronbach’s alphas (Cronbach, 1951).
Pager and Freese (2003)
This survey experiment manipulated characteristics including the race of a target named Michael. The outcome variable was how much government help Michael should be eligible to receive while looking for a job.
Oliver and Lee (2004)
This survey experiment assessed perceptions of overweightness and obesity for one female target and one male target manipulated to be White or Black. I combined the overweightness item and the obesity item for targets by sex into a scale, and then combined results for these scales using Borenstein et al. (2009) equations 8.2, 8.3, 8.4, 24.1, and 24.2.
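As a rough illustration of this combining step (a sketch only, not the author's exact computation: the mapping of these operations to Borenstein et al.'s numbered equations is my reading, and the effect values below are hypothetical), inverse-variance weighting pools two within-study scale estimates as follows:

```python
# Hedged sketch: inverse-variance weighted combination of two
# within-study effect estimates, one reading of Borenstein et al. (2009)
# eqs. 8.2-8.4 (weights, weighted mean, pooled variance).
def combine_fixed(effects, variances):
    """Inverse-variance weighted mean of effect estimates and its variance."""
    weights = [1.0 / v for v in variances]                            # eq. 8.2
    m = sum(w * y for w, y in zip(weights, effects)) / sum(weights)   # eq. 8.3
    v_m = 1.0 / sum(weights)                                          # eq. 8.4
    return m, v_m

# Hypothetical estimates for the female-target and male-target scales
m, v = combine_fixed([0.10, 0.20], [0.04, 0.04])
```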
Cottrell and Neuberg (2004)
This survey experiment included seven items measuring emotional responses to a target racial group and seven items measuring perceived threat of the racial group, with manipulations of the target race and the order in which survey sections were received. I combined all 14 items into a scale.
Benard (2005)
This survey experiment measured perceptions about a lawyer such as the lawyer’s competence, with the lawyer’s name—Kareem, Brad, Tamika, or Kristen—manipulated to signal sex and race. I combined the 11 items into a scale.
Pager (2006)
The first module of this survey experiment was similar to Pager and Freese (2003), but the outcome variables included support for job training assistance and support for cash assistance and had more response options. I combined these measures into a scale. The second and third modules were not used in the analysis.
Van Boven et al. (2006)
The first part of this survey experiment asked participants to indicate an emotional response to Hurricane Katrina, after receiving a prime to consider either sadness or anger. The second part of the survey experiment measured opinions about a target man in a photograph holding groceries, with the key manipulation being the man’s race. I created a scale from three items about the target such as the extent to which the participant agreed that the target should be criminally prosecuted for looting.
Ben-Porath and Shaker (2006)
This survey experiment displayed a photograph and a news article about Hurricane Katrina: the manipulations were that the photograph contained either an individual or a group, and that the person or persons in the photograph were either White or Black. I combined into a scale 14 items that asked participants to agree or disagree with statements such as “The people who remained in New Orleans after the evacuation order acted irresponsibly,” plus a final item that asked participants to partition responsibility for the individual consequences of Hurricane Katrina among the federal government in Washington, D.C., local authorities in New Orleans, and New Orleans residents.
McIlwain and Caliendo (2008, 2010)
These survey experiments showed participants ads for two candidates: a Black candidate who made no racial appeal, and a second candidate who was either White with no racial message, White with an anti-Black appeal, Black with no racial message, or Black with a racial authenticity appeal. I combined into a scale items about the candidates, such as which candidate the participant would be most likely to vote for. The 2008 experiment included only Black participants, and the 2010 experiment included only White participants. The analysis was limited to conditions in which the second candidate made no racial appeal because the anti-Black appeal and the racial authenticity appeal were not equivalent.
Rattan et al. (2010)
This survey experiment used a text discussing a juvenile convicted of raping an elderly woman and sentenced to life in prison without parole; the manipulation was that the juvenile was described as Black or White. I created a scale of four items such as support for life sentences with no possibility of parole for juveniles convicted of serious but non-lethal crimes.
Davenport and McDermott (2011)
This survey experiment concerned opinions about a clash between police and protestors, with manipulations of the race of the protestors and the race of the police: mostly Black or mostly White. The reported analysis did not use conditions in which the protestors and the police were mostly of the same race. I combined into a scale an item about whether the police or the protestors were most responsible for escalating the conflict and an item about whether the police took the proper action in trying to stop the protestors.
Stephens (2011)
This survey experiment measured opinions about a political candidate, with manipulations for candidate race (a photograph of a Black or White candidate) and the appeal that the candidate made. I combined into a scale items for the likelihood of voting for the candidate and ratings of the candidate on characteristics such as intelligence and experience.
Pedulla (2011)
This survey experiment asked participants to review a resume for an assistant manager position at a large retail store, with manipulations that included applicant race signaled by name: Brad, Allison, Darnell, or Ebony. I combined into a scale participant responses to items about the applicant, such as recommended annual salary and ratings of whether the applicant responds well to supervision. Per Pedulla (2014: 84), salary recommendations over $80 000 were trimmed to $80 000.
Trawalter (2011)
This survey experiment measured perceptions of the pain the participant would feel and the pain that the participant expected a target person to feel in instances such as disinfecting a cut. I combined these items into a scale. Manipulations included the race of the target displayed in a photograph. Participants’ ratings of their own pain were used as a control in regressions predicting pain ratings for the target.
Denny (2012)
This survey experiment asked participants to evaluate a potential new employee, with manipulations that included the candidate’s race or ethnicity signaled by name: Greg Baker, Jamal Washington, Victor Rodriguez, or Samuel Wong. I created a scale from items about the candidate, such as the number of times per month that the applicant would be expected to arrive late or leave early.
Denny (2013)
This survey experiment was similar to Denny (2012) but with female names added to the manipulations: Allison or Greg Baker, Keisha or Jamal Washington, Victoria or Victor Rodriguez, and Susan or Samuel Wong.
Powroznik (2014)
This survey experiment used a vignette about a target patient that included manipulations of the target’s race. Participants were asked items such as how responsible the patient was for his or her illness. These 12 items were combined into a scale.
Hopkins (2014)
This survey experiment provided participants information about seven pairs of candidates, with manipulations that included the candidate’s race and ethnicity. Participant candidate preference and rating of support for a candidate were used as the outcome variables, with Borenstein et al. (2009) equations 8.2, 8.3, 8.4, 24.1, and 24.2 used to combine results.
Research design
I conducted my own analysis of each study to standardize the analyses as much as possible. Non-dichotomous outcome variables were placed on a scale with a standard deviation of 1 among participants of a given racial group, with positive values indicating more positive outcome variable responses in the group exposed to the Black target or prime. Non-dichotomous outcome variables were modeled with linear regressions, and dichotomous outcome variables were modeled with logit regressions, with logit estimates converted to a standardized scale using equations 7.1 and 7.2 in Borenstein et al. (2009). Control variables were used only for non-racial experimental manipulations, such as the sex of a target. Reported results for the main analyses included only non-Hispanic White participants and non-Hispanic Black participants; excluded participants in control conditions and, except for Hopkins (2014), participants in experimental conditions with a target that was not White or Black; and excluded participants who did not provide a substantive response to more than half of the items for outcome variable scales measured with multiple items. Data were not weighted, and participants who failed available attention checks were not removed from the analysis. Reported p-values are for two-tailed tests. Each study had a national sample, except Hopkins (2014), for which the sample was limited to participants in California, Florida, New York, and Texas. The Illinois State University Institutional Review Board does not require IRB review of analysis of de-identified data because such analysis does not constitute human subjects research.
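The logit-to-standardized-scale conversion cited above rescales a log odds ratio and its variance into the Cohen's-d metric by a factor of √3/π (Borenstein et al., 2009). A minimal sketch, with hypothetical input values rather than estimates from any included study:

```python
import math

def log_odds_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to a standardized
    mean difference (Borenstein et al., 2009, eqs. 7.1 and 7.2)."""
    d = log_or * math.sqrt(3) / math.pi       # eq. 7.1
    var_d = var_log_or * 3 / math.pi ** 2     # eq. 7.2
    return d, var_d

# Hypothetical logit coefficient and variance for a Black-target indicator
d, v = log_odds_to_d(0.5, 0.09)
```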
Results
Figure 1 reports results for White participants (left panel) and Black participants (right panel), with the bottom five lines indicating the pooled effect across studies and subsets of studies, based on a random-effects meta-analysis because the studies were not estimating the same effect (Borenstein et al., 2009: 98). The point estimate, p-value, and 95% confidence interval for the pooled estimate among White participants (n=10 435) were 0.042, p=0.254, and [-0.030, 0.115], which is consistent with a small net discrimination favoring White targets or a small net discrimination favoring Black targets, although the lack of statistical significance cannot eliminate the possibility of zero net discrimination. The point estimate, p-value, and 95% confidence interval for the pooled estimate among Black participants (n=2781) were 0.313, p=0.002, and [0.117, 0.508], which is consistent with a small-to-moderate net discrimination favoring Black targets. Because of the heterogeneity among the studies, pooled estimates might be biased by unrepresentativeness in subject matter or outcome variable content; however, the bottom section of Figure 1 indicates that similar patterns appeared across the three studies in which the target was a political candidate and across the four studies in which the target was a worker or job applicant. Funnel plots and tests for small study effects reported in the supplemental material did not indicate evidence of publication bias.
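For readers who want the mechanics of the pooling step, the sketch below computes a random-effects pooled estimate using the DerSimonian-Laird between-study variance estimator; the article does not state which tau-squared estimator was used, and the study-level inputs here are hypothetical:

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sw = sum(w)
    m_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Q statistic measures heterogeneity around the fixed-effect mean
    q = sum(wi * (yi - m_fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)             # between-study variance, floored at 0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    m = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return m, se, tau2

# Hypothetical standardized study estimates and their variances
m, se, tau2 = dersimonian_laird([0.1, 0.3, 0.5], [0.02, 0.03, 0.04])
```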

Figure 1. Results for racial discrimination in TESS survey experiment studies.
Black dots in Figure 1 indicate combinations of studies and participant samples that have been reported on as a primary finding in a journal article or academic book prior to 1 September 2017, based on information known to the author of the present study. The black dots indicating publication do not necessarily indicate a full reporting of all outcome variables and do not include dissertations, conference papers, submitted-but-unpublished manuscripts, or reports on TESS studies that appeared in secondary sources that used the studies for illustrative purposes, such as the Mutz (2011) book on population-based survey experiments that described results from several TESS studies or the Zigerell (2017) article that used multiple TESS studies to discuss selective reporting. The bottom two meta-analysis estimates in Figure 1 are for the set of published study/sample combinations and for the set of unpublished study/sample combinations.
Data for Hispanic participants were included in 12 of the 17 studies; the point estimate, p-value, and 95% confidence interval for the pooled estimate among Hispanic participants (n=1261) were 0.007, p=0.881, and [-0.084, 0.097], which is consistent with a small net discrimination favoring White targets or a small net discrimination favoring Black targets, although the lack of statistical significance cannot eliminate the possibility of zero net discrimination. For comparison, for the 12 studies in which Hispanic participants were available, the respective point estimate, p-value, and 95% confidence interval for the pooled estimate were 0.047, p=0.135, and [-0.015, 0.109] among White participants and 0.159, p=0.014, and [0.032, 0.285] among Black participants.
Limitations of the study
The survey experiment methodology of the TESS studies has the advantage of high internal validity because the randomization of participants to treatments eliminates all alternative explanations for a difference in post-treatment measurements between groups, except for the treatment and random assignment error. However, the survey experiment method has lower external validity than field studies or correlational studies of observational data, given that participants responding to items over the phone or online might reveal more or less discrimination than unobtrusive observation of participants in their day-to-day lives would detect. Moreover, survey experiments might understate discrimination if a large percentage of participants fail to pay attention to the treatments, and survey experiments might overstate discrimination because participants receive only limited information about a target, so that the target’s race might play a larger role in evaluations than it would in a day-to-day situation in which the participant received more individuating information about the target.
Discussion
Social science has long detected evidence of pro-ingroup discrimination (see Tajfel et al., 1971), with the Balliet et al. (2014) meta-analysis reporting a standardized mean difference in ingroup cooperation compared to outgroup cooperation of 0.30 for natural groups and of 0.31 for groups in the United States, excluding studies with unilateral knowledge (p. 1571). Such discrimination in favor of the ingroup maps onto racial interactions, with members of a particular racial group expected to favor their own racial group over other racial groups. This pattern appeared in a literature review in Axt et al. (2016: 2): in 2013 and 2014, four leading psychology journals published 17 articles in which Whites favored Whites over Blacks but only two articles in which Whites favored Blacks over Whites.
However, pooled analyses across 17 survey experiment studies from the TESS archives produced a small-to-null estimate of anti-Black discrimination among White participants. This result is consistent with the Saucier et al. (2005) meta-analysis, which estimated racial discrimination in Whites’ helping behavior to have an effect size of a 0.03 Cohen’s d (p=0.103), and with the Mitchell et al. (2005: 627–628) meta-analysis estimates of a small same-race bias among White mock jurors for verdict decisions and sentencing decisions (respective Cohen’s ds of 0.028 and 0.118). Even if the lack of detected pooled anti-Black discrimination among Whites is attributable only to social desirability, the detected pooled discrimination in favor of Black targets among Black participants remains an important finding for understanding racial attitudes in the United States: the effect size estimate for Black participants of a moderate pro-ingroup discrimination [0.117, 0.508] is consistent with the Mitchell et al. (2005: 627–628) meta-analysis estimates of a moderate same-race bias for verdict decisions and sentencing decisions among Black mock jurors (respective Cohen’s ds of 0.428 and 0.394). It is worth noting, though, that the research design of these TESS studies makes it difficult or impossible to differentiate ingroup favoring from outgroup disfavoring.
The methodological implication of this study concerns the fact that results for many of the included survey experiment studies have not yet been reported in a journal or academic book. Estimating and understanding racial discrimination are important goals for social science, but accomplishing these goals requires publishing a representative set of high-quality studies such as the TESS survey experiments. To help ensure that the social science available to researchers conducting a meta-analysis is more representative of the social science that has been conducted, researchers should consider publicly preregistering plans to conduct a study along with the planned research design (see Monogan, 2015). Selective reporting of results, especially on politically sensitive topics such as racial discrimination, undercuts the ability of social science to provide true information about the world to the public and policymakers.
Acknowledgements
An earlier version of this manuscript was presented at the 2015 American Political Science Association conference in San Francisco. Thanks to Jeronimo Cortina, Eric S. Dickson, Gábor Simonovits, and the anonymous reviewers for comments on an earlier version of the manuscript; Stephen Benard, Daniel Hopkins, Eric Oliver, Devah Pager, David S. Pedulla, Aneeta Rattan, Sophie Trawalter, and Leaf Van Boven for information regarding their TESS studies or manuscripts describing their respective TESS studies; and Charlton McIlwain and Stephen M. Caliendo for information, data, and documentation for their TESS studies.
Correction (June 2025):
Declaration of conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: TESS data used in this manuscript were collected under National Science Foundation grants 0094964 (inception of TESS to 31 August 2008, Diana C. Mutz and Arthur Lupia, Principal Investigators) and 0818839 (1 September 2008 to 31 August 2012, Jeremy Freese and Penny Visser, Principal Investigators, and after 1 September 2012, Jeremy Freese and James Druckman, Principal Investigators).
Supplementary material
The supplementary files are available at http://journals.sagepub.com/doi/suppl/10.1177/2053168017753862. The replication files are available at:
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.