Abstract

Eskine, Kacinik, and Prinz’s (2011) influential experiment demonstrated that gustatory disgust triggers a heightened sense of moral wrongness. We report a large-scale multisite direct replication of this study conducted by labs in the Collaborative Replications and Education Project. Subjects in each sample were randomly assigned to one of three beverage conditions: bitter (disgusting), control (neutral), or sweet. Then, subjects made a series of judgments about the moral wrongness of the behavior depicted in six vignettes. In the original study (N = 57), drinking the bitter beverage led to higher ratings of moral wrongness than did drinking the control or sweet beverage; a contrast between the bitter condition and the other two conditions was significant among conservative (n = 19) but not liberal (n = 25) subjects. In the current project, random-effects meta-analyses across all subjects (N = 1,137, k = 11 studies), conservative subjects (n = 142, k = 5), and liberal subjects (n = 635, k = 9) revealed standardized overall effect sizes across replications that were smaller than reported in the original study. Some were in the opposite of the predicted direction; all had 95% confidence intervals containing zero, and all were smaller than the effect size the original authors could have meaningfully detected. Results of linear mixed-effects regressions revealed that drinking the bitter beverage led to higher ratings of moral wrongness than did drinking the control beverage but not the sweet beverage. Bayes factor tests revealed greater relative support for the null than for the replication hypothesis. The overall pattern provides little to no support for the theory that physical disgust via taste perception harshens judgments of moral wrongness.

Keywords

disgust, moral judgment, open data, open materials, preregistered

This article presents results from a multilab replication of an experiment suggesting that gustatory disgust (via taste perception) can make moral judgments harsher (Eskine, Kacinik, & Prinz, 2011). Previous studies had revealed a similar link between sensory disgust induced by other means, such as olfactory stimulation (Schnall, Haidt, Clore, & Jordan, 2008), and judgments of moral wrongness. Moreover, since Eskine and colleagues’ (2011) study, empirical and conceptual work has grown around the idea that inducing perceptions of gustatory disgust is an especially effective way of increasing the severity with which people make moral judgments (Hellmann, Thoben, & Echterhoff, 2013; Schnall, Haidt, Clore, & Jordan, 2015). However, a recent meta-analysis failed to replicate the association between disgust and moral judgment (Landy & Goodwin, 2015a). Moreover, to our knowledge, there have been no attempts to replicate the relation between physical disgust via taste perception and moral wrongness. The purpose of this project was therefore to precisely estimate the effect of gustatory disgust on moral judgment by replicating the methods used by Eskine et al.

In the 1990s, the body of research that would eventually coalesce under the banner of embodied cognition began taking shape (Wilson, 2002). This research is based on the proposal that real-world thinking and problem solving are deeply situated within sensoriperceptual processes and are not merely cognitive processes (Anderson, 2003; Ionescu & Vasc, 2014). For example, it has been theorized that although disgust likely evolved to steer organisms away from pathogens, it also evolved to guide humans in their decision making in the domains of mate selection and morality (Tybur, Lieberman, Kurzban, & DeScioli, 2013). Other researchers agree that the cognitive computation systems involved in pathogenic (core) disgust likely overlap with the computational systems involved in feelings of moral disgust (Curtis & Biran, 2001; Inbar & Pizarro, 2014). Indeed, the insula plays a role in feelings of both pathogenic and moral disgust (Vicario, Rafal, Martino, & Avenanti, 2017), and pathogen and moral disgust both cause activation in the levator labii muscle region, specifically, the raising of the upper lip and wrinkling of the nose (Cannon, Schnall, & White, 2011; Chapman, 2018). Moreover, some researchers argue that the two constructs are so closely related that inducing incidental disgust (e.g., via the smell of garbage or the taste of a bitter drink) can amplify moral disgust and so harshen moral judgment (Inbar & Pizarro, 2014).

The idea that similar parts of the brain are activated by physical and moral disgust has prompted other researchers to examine how inducing a sensoriperceptual experience of disgust can affect moral judgment more specifically (Cameron, Payne, & Doris, 2013; Case, Oaten, & Stevenson, 2012; Gill & Nichols, 2008; Greene, Sommerville, Nystrom, Darley, & Cohen, 2001; Landy & Goodwin, 2015a). Several studies have suggested that olfactory (e.g., Inbar, Pizarro, & Bloom, 2012) and gustatory (e.g., Chapman, 2018; Hellmann et al., 2013) input, for example, can affect moral judgment. In particular, these studies found that perceptions of sensory disgust, induced by several means, led to harsher moral judgment (Cameron et al., 2013; Eskine et al., 2011; Schnall, Haidt, et al., 2008; Wheatley & Haidt, 2005). However, the estimated effect of disgust on moral judgment was negligible in Landy and Goodwin’s (2015a) meta-analytic review (d = 0.11 for the studies included in the meta-analysis, d = −0.01 accounting for publication bias). These effect sizes were calculated across multiple means of inducing disgust. There is no available meta-analysis of studies in which gustatory disgust specifically was induced, and the effect sizes Eskine and colleagues (2011) found were much larger than the effect sizes found in other studies investigating similar effects. This difference—combined with the small sample size of the study—made it an important candidate for a replication.

The Original Study

Eskine and colleagues (2011) found that subjects who drank a bitter beverage before reading six moral vignettes judged the characters’ actions more harshly than did subjects who drank a sweet beverage or water. Eskine et al. selected a bitter drink to induce disgust because they hypothesized that a strong bitter taste would be reliably experienced as disgusting. A manipulation check confirmed that their sample experienced the bitter drink (Swedish Bitters) as disgusting. In addition to the main effect of gustatory disgust on moral judgment, the results indicated that politically conservative subjects, compared with liberal subjects, were more sensitive to the effect, which suggested that political conservatives are more likely to be influenced by incidental sensoriperceptual cues (e.g., gustatory disgust) when making moral judgments.

The original research (N = 57, each randomly assigned to one of three conditions; N = 54 after exclusions) yielded large, statistically significant effects (d = 1.09, p = .001, for the difference between the bitter and neutral control conditions and d = 1.22, p = .003, for the difference between the bitter and sweet conditions). Although the authors did not report the F test for the interaction between beverage condition and political orientation, they conducted contrast analyses within politically conservative and liberal subgroups. In the conservative subgroup, the contrast between the bitter condition and the other conditions was large and statistically significant, t(16) = 4.473, p < .001, d = 2.21. The same contrast in the liberal subgroup was moderately large but not statistically significant, t(22) = 1.703, p = .103, d = 0.74.¹ It is unclear whether the difference between these two contrasts is significant. These findings have informed subsequent theoretical and empirical work in the psychology of morality and politics (e.g., Chapman, 2018; Chapman & Anderson, 2012; Haidt, 2012; Vicario et al., 2017), yet no attempts have been made to more precisely estimate the size of the effect. Given the wide-reaching impact of this study, it is important to replicate it to gauge the generalizability and reliability of the effect.

The Current Study

In this article, we report a multilab effort to replicate the methods used by Eskine et al. (2011). We tested (a) whether a bitter beverage indeed prompts harsher moral judgments than both a sweet beverage and water and (b) whether these effects are stronger in politically conservative populations than in politically liberal ones. Our overarching goal was to provide high-powered estimates of the effect observed by Eskine et al. using a crowdsourcing approach (see Hagger et al., 2016; Moshontz et al., 2018). That is, instead of evaluating the “replication success” of the original study solely on the basis of the results of a single replication, we meta-analyzed many distinct replication attempts using frequentist and Bayesian statistical approaches.

In sum, our goal was both to estimate the main effect of beverage type on moral judgments and to test political orientation as a moderator of this effect. We examined two contrasts for the effect of beverage type: bitter versus control and bitter versus sweet. With respect to moderation by political orientation, we focused primarily on the comparison between conservative and liberal subjects.

Disclosures

Preregistration

Before data collection, each lab preregistered their materials, protocol, and analysis plans on the Open Science Framework (OSF). In addition, we registered our analysis plan for this manuscript on OSF prior to conducting the analyses reported here (see our project at https://osf.io/5ygsp). Our preregistration is a partial preregistration in that each author knew the results of at least one particular replication before the meta-analysis plan was preregistered.

Data, materials, and online resources

After labs completed their studies, they uploaded their raw data, their analyses (including syntax), and a clear explanation of their results to their own pages at OSF. The data and all other relevant documentation from each of the labs included in this meta-analysis can be found at https://osf.io/kuyn8/wiki/home/. We uploaded the data and analysis code for this report to our project page at https://osf.io/5ygsp. An appendix with additional information is also available at OSF, at https://osf.io/a4cs3/.

Reporting

This article is based on analysis of existing data rather than new data collection. We report how we determined the sample size for our analyses and all data exclusions, manipulations, and measures.

Ethical approval

Each individual site secured approval from an institutional review board or similar committee prior to data collection and carried out the study in accordance with the provisions of the World Medical Association Declaration of Helsinki.

Method

The replication studies we report were conducted as part of the Collaborative Replications and Education Project (CREP), an initiative in which students perform replications of psychological research under faculty supervision (see Box 1 for further information). All subjects provided informed consent prior to participating. The 11 replication studies took place at five universities in the United States and three universities in Europe (United Kingdom, Germany, and The Netherlands). Table A1 in the appendix at OSF lists the site and mentor for each study.

Box 1.

The Collaborative Replications and Education Project

Each replication in this study was conducted as a part of the Collaborative Replications and Education Project (CREP; https://osf.io/wfc6u/), an initiative in which undergraduate methods and capstone students perform replications of psychological research under faculty supervision. As are other pedagogical replication projects (e.g., Hawkins et al., 2018), the CREP is intended to train students on best practices while contributing high-quality replication data to the field. There are a number of quality checks built into the CREP to ensure high-fidelity replications; for example, procedures and materials are reviewed by faculty experts before students are approved to begin data collection.

The CREP process incorporates best replication and open-science practices recommended by scholars (Brandt et al., 2014) and open-science organizations (Center for Open Science). Students joining the CREP first request to participate in one of the available replication projects (selected by the CREP team for impact, feasibility, and potential student interest). Following approval, students build a study page on the Open Science Framework, uploading their materials, approval of their institutional review board, and a video of their procedure to demonstrate their fidelity to the original study’s procedures. Next, the CREP team (consisting of two or three Ph.D.-level researchers trained on a particular project) reviews the studies for that project.

Following a successful review, students preregister their studies and collect data from a sample that meets the required minimum size (usually equivalent to the original study’s sample size). Finally, students post a summary of their results on their study page, along with their raw data and codebook. At this point, the CREP team again reviews each study, deciding whether to award a CREP certificate of completion, which requires that the minimum sample size was attained and procedures were carried out as planned.

Additionally, students are encouraged to present their findings at conferences and to be involved with manuscripts on the meta-analytic findings of their specific CREP projects.

Design

The design was a 3 × 3 between-subjects factorial design. The first factor was the experimental manipulation, drinking a beverage intended to produce a neutral (control), sweet, or bitter taste sensation. The beverages were water, fruit punch, and Swedish Bitters (an herbal digestive aid). The second factor was political orientation. Subjects were categorized as conservative, liberal, or “other.”²

Target sample size

The minimum target for the overall sample for meta-analysis was 2.5 times the original sample of 57, or 143 subjects, which would be sufficient for a “small telescopes” analysis (Simonsohn, 2015). The individual labs were permitted to continue to submit qualified replication samples from January 2015 until December 2017. The CREP recommended that each lab collect data from at least 57 subjects (to match the original sample), but not all labs hit this target. We included labs that did not hit the target sample size in our analysis because we were primarily concerned with having adequate power to accurately estimate the overall effect size in the weighted meta-analysis across all samples (and we were not concerned about the power of an individual sample).
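The arithmetic behind this minimum target can be sketched in a few lines of Python. The function name is ours, and rounding up to the next whole subject is an assumption that is consistent with the figure of 143 given above.

```python
import math

ORIGINAL_N = 57    # sample size of the original study
MULTIPLIER = 2.5   # "small telescopes" multiplier (Simonsohn, 2015)


def small_telescopes_target(original_n: int, multiplier: float = MULTIPLIER) -> int:
    """Minimum replication sample size, rounded up to a whole subject."""
    return math.ceil(original_n * multiplier)


target = small_telescopes_target(ORIGINAL_N)
print(target)
```

The pooled sample of 1,137 subjects exceeds this minimum by a wide margin, which is why labs that fell short of 57 subjects could still be retained.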

Deviations from the original study

The CREP team contacted the original authors, who provided the original moral-dilemma vignettes, manipulation check, and informed-consent forms. We could not obtain materials for the original imageability distractor task or cover story, so we created our own on the basis of the description in Eskine et al.’s (2011) report. Our distractor task is available at OSF (https://osf.io/ju7nq/). Our cover story consisted of the following paragraph:

In this study you will be asked to read several vignettes and make judgments about the characters in them. Your job will be to judge the actions of the characters. During this task, you will be asked to drink a beverage. The purpose of this study is to determine whether motor movements involved with drinking influence your judgments while reading about others. In order to successfully attain this, please drink each dose in a single swift motion, as if you were drinking a shot.

Two problems arose with using an international sample to replicate the original study: First, the terms liberal and conservative as used in the United States do not necessarily translate to major political axes within other countries. The European labs adapted the political-orientation question to capture similar concepts in their respective countries. The two authors responsible for coding political orientation in the pooled sample communicated with faculty mentors overseeing the replications at the non-American labs to better understand what subjects meant in responding to this question. For example, in the British sample, “right wing” and “center-right” were coded as conservative, “center-left” and “left wing” were coded as liberal, and “center” and “none” were coded as “other.”
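The British coding rule can be written as a simple lookup, shown here only as an illustration. In the project itself the coding was done by two human coders; the function name, the lowercasing of responses, and the fallback to "other" for unlisted answers are assumptions of this sketch.

```python
from typing import Optional

# Mapping for the British sample as described in the text. Other sites
# adapted their own response options, which are not reproduced here.
BRITISH_CODING = {
    "right wing": "conservative",
    "center-right": "conservative",
    "center-left": "liberal",
    "left wing": "liberal",
    "center": "other",
    "none": "other",
}


def code_orientation(response: Optional[str]) -> str:
    """Return 'conservative', 'liberal', or 'other' for a raw response."""
    if response is None:
        # Subjects who declined to answer were coded as "other".
        return "other"
    key = response.strip().lower()
    # Assumption: any answer not listed above also falls into "other".
    return BRITISH_CODING.get(key, "other")


print(code_orientation("Center-Right"))
```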

Second, the drinks differed slightly from country to country. For example, Minute Maid Berry Punch and Swedish Bitters, the drinks used in the original study, are not widely available in some locations outside the United States. Where these beverages were not available in the original form, researchers were told to substitute a sweet drink with similar sugar content and a bitter but inert drink, respectively. Individual labs had discretion to substitute any brand of Swedish Bitters given that the brand was not specified in the original report. We used the manipulation check to ensure that subjects found the sweet drink sweet but not neutral or disgusting and the bitter drink disgusting but not sweet or neutral.

A third deviation from the original study was that in three of the studies (7, 9, and 10), the moral vignettes were presented via Qualtrics surveys rather than on paper. Subjects in these three studies indicated their moral judgments by using a slider scale that yielded a number rather than by making a mark on a line on paper.

A fourth (potential) deviation from the original study was that we computed a moral-wrongness composite score for all subjects who rated at least three of the six vignettes.³ We preregistered this approach to maximize statistical power and minimize selection biases. There were very few exclusions based on this preregistered criterion; only 3 of the subjects tested rated fewer than three vignettes. We provide further details in the Results section (see our summary of preliminary analyses of the moral-wrongness composite).
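A minimal sketch of this inclusion rule follows. It assumes that the composite is the mean of the available ratings and that missing ratings are stored as `None`; neither detail is specified above, and the function name is ours.

```python
from typing import Optional, Sequence

MIN_RATED = 3  # preregistered minimum number of rated vignettes


def wrongness_composite(ratings: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the available 0-100 ratings, or None if too few were given."""
    rated = [r for r in ratings if r is not None]
    if len(rated) < MIN_RATED:
        return None  # subject is excluded from the analyses
    return sum(rated) / len(rated)


# Hypothetical subjects: the first rated three vignettes, the second only two.
print(wrongness_composite([80, None, 60, 70, None, None]))
print(wrongness_composite([80, None, None, None, None, 50]))
```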

Protocol fidelity

Labs qualified to participate in this CREP project by submitting an application, preparing an OSF page with all materials and procedures, and submitting a video of a mock experimental procedure. Volunteer psychology professors working with the CREP reviewed this material for fidelity to the CREP protocol. The protocol required the use of exact stimuli, such as the materials for the moral-judgment task, though use of approximate stimuli was allowed where necessary (e.g., another drink could be substituted for Minute Maid Berry Punch where it was not easily available). The protocol also required appropriate data reporting. The CREP instructions and materials are available at OSF (https://osf.io/4hkjv/). Labs were required to revise their procedures in accordance with CREP reviewers’ feedback on both their written procedures and their video to increase precise compliance. Each lab included at least one student researcher and one supervising faculty mentor. Faculty mentors supervised students’ work and, in some cases, helped run subjects and analyze data. Labs that successfully complied with the CREP protocol, passed review, and submitted data for at least the minimum sample size (57) were eligible to receive a CREP certificate and a monetary reward (monetary rewards ended in July 2017, as funding ended).

Subjects

We analyzed data from a total of 1,137 subjects in 11 studies. The studies that could be included in the subgroup meta-analyses contributed 142 conservative subjects and 635 liberal subjects. The per-study N ranged from 24 to 439 (median = 65). These numbers do not include 4 subjects who did not consent to use of their data, 3 subjects who were under the age of 18 years, and 3 subjects who rated fewer than three vignettes. In addition, beverage condition was unknown for 2 subjects in one study.

In our preregistration, we stated that if linear mixed-effects analyses indicated that the effect of beverage condition or the interaction of beverage condition with political orientation varied with subjects’ knowledge of the hypothesis, we would exclude subjects who demonstrated knowledge of the hypothesis. However, in part because we could not obtain the open-ended responses necessary to code this variable for 3 of the 11 studies, we retained all available subjects in our analyses.

With regard to gender, 671 (59%) subjects identified as female, 392 (34%) as male, and 6 (1%) as nonbinary; there was no gender information available for 68 (6%) subjects.

Materials

All materials, including two versions of the moral-vignette packet (identical but with different orders), were provided on the OSF website.

As in the original study, the labs used six vignettes developed by Wheatley and Haidt (2005). The six vignettes focus on the following main characters and situations: Bob, who has a sexual relationship with his second cousin; Frank, who cooks and eats his dead dog; George, a lawyer who seeks clients at the hospital emergency room; Arnold, a politician who condemns corruption but accepts bribes himself; Robert, who shoplifts clothing; and Tim, who takes books and other items out of the library without checking them out.

Moral judgments were obtained by asking subjects to read each vignette and then answer the question “How morally wrong is this?” by making a mark on a line or selecting a number on a scale; in both cases, not at all wrong, moderately wrong, and extremely wrong were the anchors. Ratings were scaled to a range of 0 through 100, identical to the scale used by Eskine et al. (2011).

The imageability distractor task and cover story were intended to disguise the purpose of the study. The distractor task was based on the description in the original report (i.e., “[subjects] . . . rated sentences for their imageability”; Eskine et al., 2011, p. 296).

Subjects rated how much they enjoyed their beverage, and how sweet, bitter, neutral, or disgusting they found it, using a 7-point scale ranging from 1 (not at all), to 4 (neutral), to 7 (very much). These items served as a manipulation check to ensure that each type of beverage elicited the intended response in the subjects.

To run the experiment, labs needed drinking water; fruit punch; a brand of Swedish Bitters; cups; a private space where subjects could fill out the experimental forms; and a booklet containing the consent forms, the moral vignettes, the survey used for the manipulation check, the demographic (including political-orientation) questions, the distractor task, and the procedures. The specific questions used to collect demographic information differed somewhat among the labs. For example, some labs used an open-response box to collect political-orientation data, whereas others gave subjects specific options to choose from.

Procedure

Prior to the arrival of each subject, experimenters consulted a preprinted list of random numbers or an online tool used for random assignment to determine the appropriate drink preparation. Subjects were run individually in physically separate spaces.

After subjects provided informed consent, experimenters explained that the study was an examination of the influence of motor interference on cognitive processing. The ingredients for the appropriate beverage condition were listed on the informed-consent form so that subjects would not be exposed to allergens unwittingly. The experimenters called attention to the ingredient list in case subjects had not read the consent form closely. (Subjects at the Tufts University site verbally confirmed that they were not allergic to any of the ingredients during the consent process.) The experimenters provided the beverage and told subjects to drink it in a swift motion, “as if drinking a shot.” Subjects then completed the first half of the moral-judgment task. The experimenters administered a second serving of the beverage and then instructed subjects to complete the second half of the moral-judgment task.

Subjects then completed the distractor task, beverage ratings, and demographic survey. Finally, they responded to the prompt “What do you think this study is about? Please provide a few details to explain your answer.” Upon completion of this item, subjects were debriefed verbally and in writing. The complete protocol, including a script for researchers, is available on the OSF page for this study.

Two of the authors (E. Ghelfi and M. A. Fischer), who were blind to subjects’ assignment to condition, coded political-orientation responses using three categories: “conservative,” “liberal,” and “other.” The “other” category included subjects who declined to provide information. The two coders exhibited excellent agreement, Cohen’s κ = 1.00, 95% confidence interval (CI) = [.99, 1] (weighted κ = .99). We used the first coder’s responses in subsequent analyses. In total, 648 subjects (57%) were coded as liberal, 162 (14%) as conservative, and 327 (29%) as other.

The same two authors coded subjects’ explanations of what they thought the study was about using three categories capturing levels of knowledge about the hypothesis. The three categories were “naive” (i.e., no insight into the hypothesis; e.g., “how the mechanical act of drinking would affect an individual’s moral judgment”), “partially suspicious” (i.e., insight into part of the hypothesis; e.g., “This study may be about how much we liked or disliked the beverage and whether or not that influenced our answers”), and “fully suspicious” (i.e., clear and accurate description of the hypothesis; e.g., “This study is about how arm movement/taste can determine how you feel/think about something. I had a bad taste in my mouth so I saw everyone as bad in the short stories.”). The two coders exhibited very good agreement, Cohen’s κ = .76, 95% CI = [.71, .80] (weighted κ = .83). When the two coders disagreed, we used the code representing greater knowledge of the hypothesis. We could not obtain subjects’ responses to this question for Studies 1, 2, and 3 and thus could not code level of knowledge of the hypothesis for any of the subjects in these three studies.
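For readers unfamiliar with these two steps, the sketch below computes unweighted Cohen's κ and applies the rule for resolving disagreements. The example codes are hypothetical, the function names are ours, and the weighted κ and confidence intervals reported above are not reproduced.

```python
from collections import Counter
from typing import Sequence

# Ordered from least to most knowledge of the hypothesis.
LEVELS = ["naive", "partially suspicious", "fully suspicious"]


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two coders rating the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal category proportions.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    # Undefined (division by zero) if both coders use one identical category.
    return (observed - expected) / (1 - expected)


def resolve(code_a: str, code_b: str) -> str:
    """On disagreement, keep the code representing greater knowledge."""
    return max(code_a, code_b, key=LEVELS.index)


# Hypothetical codes for four subjects.
coder_a = ["naive", "naive", "partially suspicious", "fully suspicious"]
coder_b = ["naive", "partially suspicious", "partially suspicious", "fully suspicious"]
kappa = cohens_kappa(coder_a, coder_b)
final_codes = [resolve(x, y) for x, y in zip(coder_a, coder_b)]
print(round(kappa, 2), final_codes)
```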

Note that because the knowledge question was presented after the beverage-rating manipulation check, it is possible that thinking about the beverages during the manipulation check triggered subjects’ subsequent guesses about the purpose of the study. As a result, some subjects classified as partially or fully suspicious may have actually been naive when they made their moral judgments but “clued in” to the hypothesis by the time they were asked to guess the purpose of the study. We can be fairly certain, however, that subjects who were classified as naive even with the benefit of exposure to the beverage-rating task were in fact naive when they made their moral judgments.

Data analysis and inference criteria

We present the rationale for and details of all analyses where applicable in the Results section. Broadly speaking, in preliminary analyses, we computed descriptive statistics for moral-wrongness and beverage ratings, level of knowledge, and internal-consistency reliability for the moral-wrongness composite. In confirmatory analyses, we conducted random-effects meta-analyses for two effects of interest: bitter versus neutral condition and bitter versus sweet condition. We also conducted one-sided tests to determine whether observed effects were either smaller than the original study could have detected with 33% power or equivalent to zero. To complement these approaches, we conducted linear mixed-effects regression (LMER) modeling of individual subjects’ judgments of moral wrongness. Finally, for each replication study, we conducted a series of four Bayes factors (BF) tests described by Verhagen and Wagenmakers (2014).
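To make the random-effects step concrete, the following sketch pools the bitter-versus-control contrast across all subjects, using the summary statistics in Table 1. It is an illustration only and not our analysis code, which used the metafor package in R. The DerSimonian-Laird estimator, the use of uncorrected Cohen's d, and the large-sample variance formula are assumptions of this sketch, so its output should not be expected to match the reported estimates exactly.

```python
import math
from typing import Sequence, Tuple

# ((n, mean, SD) bitter, (n, mean, SD) control) for all subjects, from Table 1.
TABLE1_BITTER_VS_CONTROL = [
    ((20, 75.28, 10.92), (19, 65.37, 17.06)),
    ((22, 67.08, 11.12), (22, 67.52, 12.49)),
    ((19, 62.26, 17.09), (19, 64.25, 14.59)),
    ((40, 68.99, 11.93), (24, 70.70, 15.70)),
    ((17, 76.53, 13.97), (22, 76.65, 11.06)),
    ((9, 78.22, 11.96), (9, 64.05, 20.92)),
    ((24, 67.03, 14.27), (29, 65.94, 12.96)),
    ((23, 70.20, 11.27), (22, 72.55, 12.63)),
    ((51, 70.20, 12.36), (55, 71.70, 14.36)),
    ((147, 71.79, 14.65), (142, 68.31, 13.78)),
    ((9, 70.20, 17.26), (8, 64.76, 8.88)),
]


def cohens_d(g1: Tuple[int, float, float], g2: Tuple[int, float, float]) -> Tuple[float, float]:
    """Standardized mean difference (g1 minus g2) and its approximate variance."""
    (n1, m1, s1), (n2, m2, s2) = g1, g2
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau squared."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


pairs = [cohens_d(bitter, control) for bitter, control in TABLE1_BITTER_VS_CONTROL]
effects = [d for d, _ in pairs]
variances = [v for _, v in pairs]
pooled, se, tau2 = dersimonian_laird(effects, variances)
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"d = {pooled:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], tau^2 = {tau2:.3f}")
```

A positive pooled d in this sketch means harsher judgments in the bitter condition than in the control condition.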

For null-hypothesis significance testing, we set an alpha criterion of .05 (two-tailed). For BF tests, we considered BFs greater than 3 to provide nonanecdotal evidence for the replication hypothesis and BFs less than 1/3 to be nonanecdotal evidence against the replication hypothesis.
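The Bayes factor criteria amount to a three-way classification, sketched below. The function name and the label strings are ours; the thresholds of 3 and 1/3 are those stated above, and values exactly at a threshold are treated as anecdotal because the criteria use strict inequalities.

```python
def classify_bf(bf: float) -> str:
    """Classify a Bayes factor comparing the replication hypothesis with the null."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf > 3:
        return "nonanecdotal evidence for the replication hypothesis"
    if bf < 1 / 3:
        return "nonanecdotal evidence against the replication hypothesis"
    return "anecdotal evidence"


# Hypothetical Bayes factors, one per category.
for value in (5.0, 1.0, 0.2):
    print(value, classify_bf(value))
```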

We wrote this manuscript as an R Markdown document in RStudio 1.2.1335 (RStudio Team, 2015) and analyzed our data using R (Version 3.5.2; R Core Team, 2017) and the following R packages: BayesFactor (Version 0.9.12.4.2; Morey & Rouder, 2015), boot (Version 1.3.20; Davison & Hinkley, 1997), BSDA (Version 1.2.0; Arnholt & Evans, 2017), coda (Version 0.19.2; Plummer, Best, Cowles, & Vines, 2006), compute.es (Version 0.2.4; Re, 2013), dplyr (Version 0.8.1; Wickham, Francois, Henry, & Mueller, 2017), effsize (Version 0.7.4; Torchiano, 2017), emmeans (Version 1.3.4; Lenth, 2018), ggplot2 (Version 3.1.1; Wickham, 2009), gridExtra (Version 2.3; Auguie, 2017), lattice (Version 0.20.35; Sarkar, 2008), lme4 (Version 1.1.21; Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Version 3.1.0; Kuznetsova, Brockhoff, & Christensen, 2017), MASS (Version 7.3.49; Venables & Ripley, 2002), Matrix (Version 1.2.14; Bates & Maechler, 2017), MBESS (Version 4.5.1; Kelley, 2017), MCMCpack (Version 1.4.4; Martin, Quinn, & Park, 2011), metafor (Version 2.1.0; Viechtbauer, 2010), pacman (Version 0.5.1; Rinker & Kurkiewicz, 2017), papaja (Version 0.1.0.9842; Aust & Barth, 2017), polspline (Version 1.1.14; Kooperberg, 2015), psych (Version 1.8.12; Revelle, 2017), pwr (Version 1.2.2; Champely, 2017), R2WinBUGS (Version 2.1.21; Sturtz, Ligges, & Gelman, 2005), SDMTools (Version 1.1.221.1; VanDerWal, Falconi, Januchowski, Shoo, & Storlie, 2014), sjlabelled (Version 1.0.17; Lüdecke, 2018a), sjPlot (Version 2.6.3; Lüdecke, 2018b), sjstats (Version 0.17.4; Lüdecke, 2018c), stargazer (Version 5.2.2; Hlavac, 2018), stringr (Version 1.4.0; Wickham, 2019), tidyr (Version 0.8.3; Wickham & Henry, 2017), TOSTER (Version 0.3.4; Lakens, 2017), viridis (Version 0.5.1; Garnier, 2018a, 2018b), and viridisLite (Version 0.3.0; Garnier, 2018b).

When analyzing the data, we discovered that some studies contributed no or very few observations to one or more cells of the study design. We made the post hoc decision to include in random-effects meta-analyses only those studies for which there were at least 2 subjects in each cell, so that we could compute both a mean and a standard deviation. This left five studies for comparisons between the beverage conditions among conservative subjects and nine studies for comparisons among liberal subjects.⁴ We included all available observations in our LMER models.
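This post hoc inclusion rule can be expressed as a short check on cell counts. The sketch below is illustrative: the function name is ours, the first set of counts is taken from the conservative subgroup of Study 1 in Table 1, and the second set is hypothetical.

```python
from typing import Mapping

CONDITIONS = ("bitter", "control", "sweet")
MIN_PER_CELL = 2  # a standard deviation requires at least 2 observations


def eligible_for_meta_analysis(cell_counts: Mapping[str, int]) -> bool:
    """True if every beverage condition has at least MIN_PER_CELL subjects."""
    return all(cell_counts.get(c, 0) >= MIN_PER_CELL for c in CONDITIONS)


# Conservative subgroup of Study 1 (counts from Table 1).
study_1 = {"bitter": 2, "control": 4, "sweet": 2}
# Hypothetical study with a single subject in one cell.
sparse_study = {"bitter": 1, "control": 3, "sweet": 2}

print(eligible_for_meta_analysis(study_1))
print(eligible_for_meta_analysis(sparse_study))
```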

Results

Preliminary analyses

Moral-wrongness composite

Because we excluded the few subjects who rated fewer than three of the six vignettes, and because no subjects rated exactly three vignettes, all subjects included in analyses provided judgments for at least four vignettes; 99.12% had a value for five (7.04%) or six (92.08%) vignettes.

Table 1 shows descriptive statistics for the moral-wrongness composite in each beverage condition for each replication study. Results for all subjects and the conservative and liberal subgroups are presented separately. We provide descriptive statistics across studies in the Confirmatory Analyses subsection of the Results section.⁵

Table 1.

Descriptive Statistics and Internal-Consistency Reliability for the Moral-Wrongness Composite in Each Replication Study

Study	n: Bitter condition	n: Control condition	n: Sweet condition	n: Total	M (SD): Bitter condition	M (SD): Control condition	M (SD): Sweet condition	α [95% CI]	ω [95% CI]
All subjects
1	20	19	20	59	75.28 (10.92)	65.37 (17.06)	67.94 (15.64)	.52 [.30, .69]	.64 [0, .74]
2	22	22	21	65	67.08 (11.12)	67.52 (12.49)	69.80 (8.59)	.28 [0, .53]	.37 [0, .59]
3	19	19	18	56	62.26 (17.09)	64.25 (14.59)	66.96 (13.82)	.53 [.27, .70]	.46 [.09, .61]
4	40	24	36	100	68.99 (11.93)	70.70 (15.70)	71.31 (12.32)	.50 [.30, .64]	.38 [.02, .48]
5	17	22	20	59	76.53 (13.97)	76.65 (11.06)	73.43 (10.74)	.26 [0, .51]	.44 [.08, 1]
6	9	9	6	24	78.22 (11.96)	64.05 (20.92)	60.68 (22.09)	.69 [.31, .87]	.70 [.20, .85]
7	24	29	28	81	67.03 (14.27)	65.94 (12.96)	69.05 (15.24)	.45 [.19, .62]	.40 [.21, .53]
8	23	22	23	68	70.20 (11.27)	72.55 (12.63)	70.80 (15.90)	.53 [.33, .67]	.55 [.19, .71]
9	51	55	54	160	70.20 (12.36)	71.7 (14.36)	69.77 (12.49)	.57 [.44, .71]	.58 [.43, .70]
10	147	142	150	439	71.79 (14.65)	68.31 (13.78)	73.69 (14.24)	.53 [.45, .59]	.54 [.47, .60]
11	9	8	9	26	70.20 (17.26)	64.76 (8.88)	70.59 (19.38)	.62 [.28, .76]	.58 [.01, .71]
Conservatives
1	2	4	2	8	78.58 (2.57)	63.42 (8.64)	93.18 (9.64)	.70 [0, .89]	.58 [.44, .64]
2	16	13	17	46	66.14 (11.48)	61.9 (10.31)	71.05 (7.90)	.19 [0, .52]	.37 [0, 1]
4	8	6	9	23	63.97 (6.27)	72.74 (6.17)	74.65 (9.82)	—	.34 [.01, 1]
5	3	7	2	12	78.1 (17.51)	75.73 (13.99)	82.8 (0.76)	.48 [0, .72]	—
10	17	16	20	53	76.67 (12.78)	70.25 (10.81)	72.23 (13.9)	.40 [.08, .63]	.47 [.19, .63]
Liberals
1	7	5	9	21	75.32 (12.20)	50.88 (10.46)	65.28 (13.94)	.64 [.26, .84]	.75 [0, .85]
3	12	7	7	26	58.72 (14.30)	55.03 (13.64)	65.90 (16.01)	.48 [.03, .74]	—
4	21	11	13	45	70.38 (13.59)	69.05 (16.07)	70.39 (15.49)	.62 [.42, .77]	.67 [.42, 1]
5	7	8	7	22	71.74 (12.14)	74.48 (12.41)	76.29 (5.57)	—	.11 [0, 1]
6	3	3	2	8	78.42 (17.77)	73.95 (15.98)	61.36 (1.40)	.54 [0, .82]	—
7	20	20	23	63	65.89 (15.08)	65.80 (12.92)	68.19 (14.72)	.42 [.12, .64]	.41 [.13, .57]
8	10	10	8	28	65.53 (6.15)	78.80 (13.42)	70.04 (10.58)	.40 [0, .68]	—
9	30	31	29	90	69.91 (12.82)	72.33 (12.72)	67.25 (12.30)	.57 [.40, .70]	.59 [.37, .74]
10	108	111	113	332	70.43 (14.99)	68.3 (13.74)	73.72 (14.55)	.54 [.46, .61]	.55 [.45, .62]

Note: Values inside parentheses are standard deviations. Internal consistency was calculated across beverage conditions, and 95% confidence intervals (in brackets) were determined via 2,000 bias-corrected and accelerated bootstrap replications. Results for α are not reported when negative correlations between some items and the total score led to a negative value, and results for ω are not reported when the model failed to converge.

Table 1 also shows the internal-consistency reliability of the moral-wrongness composite across beverage conditions in each replication study, again separately for all subjects and the conservative and liberal subgroups. Cronbach’s alpha calculated across all political orientations within each study ranged from .26 to .69 (median = .53), and omega ranged from .37 to .70 (median = .54). Internal consistency therefore ranged from poor to acceptable, in most cases falling below .70. Across subjects in all 11 studies, Cronbach’s alpha was .50, 95% CI = [.45, .54], and omega was .49, 95% CI = [.43, .54].
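As a reminder of what the alpha values in Table 1 summarize, the following minimal Python sketch computes Cronbach's alpha from a subjects-by-items matrix. The function name and the toy ratings are illustrative assumptions rather than the project's code (the analyses were run in R), and the sketch assumes complete data on every item.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-subject rows of item scores (complete data)."""
    k = len(ratings[0])                      # number of items (vignettes)
    items = list(zip(*ratings))              # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings from four subjects on three items (0-100 scale).
toy = [
    [60, 70, 65],
    [80, 85, 90],
    [50, 40, 55],
    [70, 75, 60],
]
alpha = cronbach_alpha(toy)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why a heterogeneous set of six vignettes can yield values well below .70.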

Beverage ratings

As a manipulation check, we assessed the extent to which the three beverages had the intended effect on subjective ratings (bitter, disgusting, neutral, and sweet). For each study, we computed descriptive statistics and linear regression models that compared ratings in the bitter condition with ratings in each of the other two conditions across all subjects and within the conservative and liberal subgroups (see Table A3 in the appendix). In addition, to summarize across studies, we assessed the fixed effects of beverage type and political orientation on each rating in four LMER models with a random intercept for studies (see Table A4 in the appendix). The beverage contrasts were consistently significant in the expected directions for all three political orientations.

The estimated marginal means from the LMER models indicated that subjects in the bitter condition perceived their beverage to be quite bitter, M = 6.18, 95% CI = [5.98, 6.38], and disgusting, M = 5.84, 95% CI = [5.63, 6.05], and not very sweet, M = 1.44, 95% CI = [1.27, 1.61], or neutral, M = 1.48, 95% CI = [1.27, 1.68] (7-point scale). Subjects in the neutral control condition perceived their beverage to be quite neutral, M = 5.65, 95% CI = [5.45, 5.85], and not very bitter, M = 1.67, 95% CI = [1.48, 1.87]; disgusting, M = 1.36, 95% CI = [1.15, 1.57]; or sweet, M = 1.74, 95% CI = [1.57, 1.91]. Subjects in the sweet condition perceived their beverage to be quite sweet, M = 5.55, 95% CI = [5.39, 5.71], and not very bitter, M = 1.98, 95% CI = [1.79, 2.17]; disgusting, M = 1.95, 95% CI = [1.74, 2.15]; or neutral, M = 2.28, 95% CI = [2.08, 2.47].

Level of knowledge of the hypothesis

Of the 933 subjects for whom we were able to code knowledge of the hypothesis, 543 (58.20%) were naive, 336 (36.01%) were partially suspicious, and 54 (5.79%) were fully suspicious. The original authors did not report how many subjects were partially suspicious, but indicated that 3 out of 57 (5.26%) “correctly guessed our hypothesis” (Eskine et al., 2011, p. 296).

Predicted probabilities were obtained in a generalized linear mixed-effects logistic regression assessing the fixed effects of beverage type and political orientation on level of knowledge (0 = naive, 1 = partially or fully suspicious). We included a random intercept for studies in the model. More subjects in the bitter (60.65%) and sweet (52.37%) conditions than in the neutral control condition (38.85%) were partially or fully suspicious. The difference between the bitter and control conditions was statistically significant, b = 0.65, SE = 0.26, p = .014, but the difference between the sweet and control conditions was not, b = 0.37, SE = 0.26, p = .145 (see Fig. A1 in the appendix).

Confirmatory analyses

Random-effects meta-analyses and one-sided tests

In our random-effects meta-analyses, we used two contrasts to estimate the standardized effects of drinking the bitter beverage on moral-wrongness judgments within and across political orientations and within and across replication studies. These meta-analyses enabled us to estimate, in standardized units, the extent to which drinking a disgusting, bitter beverage harshens moral judgments relative to drinking water (control) or juice (sweet). We would conclude that there was support for the hypothesis if the magnitudes of the two overall effects (bitter vs. control and bitter vs. sweet) were significantly greater than zero in the positive direction, perhaps only among conservative subjects as in the original study. We used a random-effects approach to determine the extent to which effect sizes varied from one study to the next. We excluded the original study from these analyses, as is typical for Registered Replication Reports (e.g., Wagenmakers et al., 2016), so that the estimates would be based only on unpublished studies that were registered in advance and, thus, were unbiased.
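The two steps described above, computing a standardized effect per study and then pooling across studies, can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, only Studies 1 to 3 from Table 1 are used as input, and the DerSimonian-Laird estimator stands in as a simple closed-form option. The estimator used in the project's metafor models is not stated in this section, so the output should not be expected to match the reported results.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference and its approximate variance."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)  # pooled SD
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * df - 1)                                      # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its SE, and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

# Bitter vs. control, all subjects, Studies 1-3 from Table 1:
# (mean, SD, n) for bitter, then (mean, SD, n) for control.
rows = [
    (75.28, 10.92, 20, 65.37, 17.06, 19),
    (67.08, 11.12, 22, 67.52, 12.49, 22),
    (62.26, 17.09, 19, 64.25, 14.59, 19),
]
gs, vs = zip(*(hedges_g(*r) for r in rows))
pooled, se, tau2 = dersimonian_laird(gs, vs)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # normal-approximation 95% CI
```

A positive pooled estimate would indicate harsher judgments in the bitter condition; tau2 captures how much the true effect varies from study to study.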

In addition to random-effects meta-analyses, we conducted one-sided tests examining whether the standardized meta-analytic effect sizes for the two contrasts of interest were significantly smaller than the effect size the original authors had 33% power to detect (d_33%; Simonsohn’s, 2015, small-telescopes approach). The small-telescopes approach told us whether the effect size obtained in the replication study was large enough to have been detectable in the original study. If the replication study were well powered and the replication effect significantly smaller than d_33%, we would conclude that the original study was unable to draw meaningful conclusions about the studied effect. Across all subjects in the original study, d_33% was 0.53 for the bitter-versus-control contrast and 0.55 for the bitter-versus-sweet contrast. Among conservative subjects, the value of d_33% was 0.94 for both contrasts, and among liberal subjects, the value of d_33% was 0.80 for both contrasts.⁶

Finally, we also conducted one-sided tests examining whether the replication effect was effectively equivalent to zero. We preregistered two sets of standardized effect-size equivalence bounds around 0, specifically 0 ± d_33% and 0 ± 0.30 (Lakens, Scheel, & Isager, 2018). The equivalence bound of ±d_33% reflects the smallest effect size the original study could have detected. The equivalence bound of ±0.30 reflects the smallest effect size of interest to us given that the original study may not have had adequate power to detect whether effects smaller than that were equivalent to zero. To simplify the presentation of our results, we deviate from our preregistered plan and report only the results for the more stringent of the two equivalence regions, 0 ± 0.30. Meta-analytic effect sizes equivalent to 0 ± 0.30 would suggest that the effects are unlikely to be of greater magnitude than ±0.30.
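Both kinds of one-sided test reduce to comparing a meta-analytic estimate with a fixed bound in standard-error units. The Python sketch below illustrates the arithmetic under a normal approximation. The function names and the input values (a pooled estimate of 0.10 with a standard error of 0.08) are hypothetical and chosen only for illustration; the bound of 0.53 is the d_33% value given above for the bitter-versus-control contrast across all subjects.

```python
from statistics import NormalDist

def z_test_smaller_than(estimate, se, bound):
    """One-sided z test of H1: true effect < bound. Returns (z, p)."""
    z = (estimate - bound) / se
    return z, NormalDist().cdf(z)

def equivalence_test(estimate, se, margin=0.30):
    """Two one-sided tests against [-margin, +margin].

    Equivalence is claimed only if both one-sided tests reject,
    so the larger of the two p values is returned.
    """
    z_lower = (estimate + margin) / se          # H1: effect > -margin
    z_upper = (estimate - margin) / se          # H1: effect < +margin
    p_lower = 1 - NormalDist().cdf(z_lower)
    p_upper = NormalDist().cdf(z_upper)
    return z_lower, z_upper, max(p_lower, p_upper)

# Hypothetical inputs: pooled g = 0.10, SE = 0.08, tested against d_33% = 0.53.
z_small, p_small = z_test_smaller_than(0.10, 0.08, bound=0.53)
z_lo, z_hi, p_equiv = equivalence_test(0.10, 0.08)
```

With these illustrative inputs, the estimate is both significantly smaller than d_33% and statistically equivalent to zero within ±0.30, which is the pattern of conclusions the two tests are designed to support.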

Figures 1 and 2 display the standardized effect sizes and 95% confidence intervals for the two contrasts, within and across studies across all subjects and for the conservative and liberal subgroups. Figures A2 and A3 in the appendix display the parallel information for the raw mean differences instead of the standardized mean differences; the pattern is quite similar. In what follows, we summarize results for the two contrasts of interest.

Fig. 1.

Forest plot of the effect sizes for the bitter-versus-control contrast in Eskine, Kacinik, and Prinz’s (2011) original study and the replications in the current project. Results are shown for contrasts across all subjects, within the conservative subgroup, and within the liberal subgroup. Within each group, replication studies are presented in descending order of the Hedges’s g point estimate; the original study is first, regardless of magnitude. The size of the symbols is inversely proportional to the variance of the estimate; larger symbols indicate more precise estimation. Error bars represent 95% confidence intervals (CIs). “RE Model” refers to the random-effects models providing overall estimates across the replication studies; these models exclude the original effect size. Red dashed vertical lines indicate ±0.30 equivalence bounds. The green dashed vertical line indicates d_33%.

Fig. 2.

Forest plot of the effect sizes for the bitter-versus-sweet contrast in Eskine, Kacinik, and Prinz’s (2011) original study and the replications in the current project. Results are shown for contrasts across all subjects, within the conservative subgroup, and within the liberal subgroup. Within each group, replication studies are presented in descending order of the Hedges’s g point estimate; the original study is first, regardless of magnitude. The size of the symbols is inversely proportional to the variance of the estimate; larger symbols indicate more precise estimation. Error bars represent 95% confidence intervals (CIs). “RE Model” refers to the random-effects models providing overall estimates across the replication studies; these models exclude the original effect size. Red dashed vertical lines indicate ±0.30 equivalence bounds. The green dashed vertical line indicates d_33%.

All subjects

The mean moral-wrongness judgment across the 11 studies, weighted by the number of subjects in each group for each study, was 70.65 (SD = 3.52) in the bitter condition, 68.94 (SD = 3.41) in the neutral control condition, and 71.29 (SD = 2.86) in the sweet condition.
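The weighted mean for the bitter condition can be reproduced directly from the per-study values in Table 1, as the Python sketch below shows. The text does not state which formula was used for the weighted standard deviation; the bias-corrected form used here is an assumption, adopted because it reproduces the reported value for the bitter condition to two decimals.

```python
import math

def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def weighted_sd(values, weights):
    """Weighted SD with a bias correction (assumed formula, not stated in the text)."""
    m = weighted_mean(values, weights)
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    ss = sum(w * (x - m) ** 2 for x, w in zip(values, weights))
    return math.sqrt(ss * sw / (sw**2 - sw2))

# Bitter condition, all subjects, Studies 1-11 (Table 1).
n_bitter = [20, 22, 19, 40, 17, 9, 24, 23, 51, 147, 9]
m_bitter = [75.28, 67.08, 62.26, 68.99, 76.53, 78.22,
            67.03, 70.20, 70.20, 71.79, 70.20]

mean_bitter = weighted_mean(m_bitter, n_bitter)   # rounds to 70.65
sd_bitter = weighted_sd(m_bitter, n_bitter)       # rounds to 3.52
```

Because Study 10 contributes 147 of the 381 subjects in this condition, its mean (71.79) pulls the weighted average more than any other study does.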

The overall effect for the bitter-versus-control contrast was negligible and in the predicted direction. This effect was significantly smaller than d_33% for the original study (i.e., 0.53), z = −5.22, p < .001. It was also equivalent to 0 ± 0.30, z = −2.42, p = .008. Heterogeneity was low.

The overall effect for the bitter-versus-sweet contrast was negligible but in the opposite of the predicted direction. This effect was significantly smaller than d_33% for the original study (i.e., 0.55), z = −8.31, p < .001. It was also equivalent to 0 ± 0.30, z = 3.51, p < .001. Heterogeneity was low.

Conservative subjects

Among conservative subjects in 5 studies, the mean moral-wrongness judgment across studies, weighted by the number of subjects in each group for each study, was 70.98 (SD = 6.99) in the bitter condition, 68.45 (SD = 5.88) in the neutral control condition, and 73.53 (SD = 5.64) in the sweet condition.

The overall effect for the bitter-versus-control contrast was small and in the predicted direction. This effect was significantly smaller than d_33% for the original study (i.e., 0.94), z = −1.80, p = .036. It was not equivalent to 0 ± 0.30, z = −0.22, p = .412. Heterogeneity was substantial.

The overall effect for the bitter-versus-sweet contrast was medium and in the opposite of the predicted direction. This effect was significantly smaller than d_33% for the original study (i.e., 0.94), z = −4.46, p < .001. However, it was not equivalent to 0 ± 0.30, z = −0.52, p = .697. Heterogeneity was moderate.

Liberal subjects

Among liberal subjects in 9 studies, the mean moral-wrongness judgment across studies, weighted by the number of subjects in each group for each study, was 69.38 (SD = 3.97) in the bitter condition, 68.66 (SD = 5.94) in the neutral control condition, and 71.23 (SD = 4.01) in the sweet condition.

The overall effect for the bitter-versus-control contrast was near zero. This effect was significantly smaller than d_33% for the original study (i.e., 0.80), z = −4.95, p < .001, and equivalent to 0 ± 0.30, z = −1.78, p = .038. Heterogeneity was moderate.

The overall effect for the bitter-versus-sweet contrast was negligible and in the opposite of the predicted direction. This effect was significantly smaller than d_33% for the original study (i.e., 0.80), z = −8.97, p < .001. It was also equivalent to 0 ± 0.30, z = 1.91, p = .028. Heterogeneity was low.

Linear mixed-effects regression models

The random-effects meta-analyses and one-sided tests enabled us to examine the standardized effect of the beverage manipulation on moral-wrongness judgments within and across political-orientation groups. These analyses also addressed the extent to which beverage effects varied across the replication studies. They did not, however, reveal whether political orientation or knowledge of the hypothesis formally moderated the effects of beverage condition on moral-wrongness judgments. They also did not account for potential random variation in moral-wrongness judgments across subjects and vignettes. Thus, we conducted LMER models of the individual subjects’ data to address these gaps. Apportioning all relevant sources of variation simultaneously in LMER may increase sensitivity to detect predicted effects. For the same reasons as expressed for the random-effects meta-analyses, we excluded the original study from these analyses.

LMER model specification

We report three LMER models.⁷ All models included fixed effects reflecting beverage type with two contrasts: For the bitter-versus-control (BvC) contrast, the weights assigned were .5 for the bitter condition, −.5 for the control condition, and 0 for the sweet condition. For the bitter-versus-sweet (BvS) contrast, the weights assigned were .5 for the bitter condition, −.5 for the sweet condition, and 0 for the control condition. Effects of political orientation were also examined with two contrasts: For the conservative-versus-liberal (CvL) contrast, the weights assigned were .5 for the conservative subgroup, −.5 for the liberal subgroup, and 0 for the “other” subgroup. For the conservative-versus-other (CvO) contrast, the weights assigned were .5 for the conservative subgroup, −.5 for the “other” subgroup, and 0 for the liberal subgroup. All models allowed for interactions between these beverage-type and political-orientation predictors.
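The contrast weights described above can be written out as a small lookup, which makes it easier to see what each fixed-effect column contains for a given subject. This Python sketch is illustrative only: the dictionary layout and the function name are assumptions, not the project's R code.

```python
# Contrast weights as described in the text; the layout is an illustrative choice.
BEVERAGE_CONTRASTS = {
    "bitter":  {"BvC": 0.5,  "BvS": 0.5},
    "control": {"BvC": -0.5, "BvS": 0.0},
    "sweet":   {"BvC": 0.0,  "BvS": -0.5},
}
POLITICS_CONTRASTS = {
    "conservative": {"CvL": 0.5,  "CvO": 0.5},
    "liberal":      {"CvL": -0.5, "CvO": 0.0},
    "other":        {"CvL": 0.0,  "CvO": -0.5},
}

def design_row(beverage, politics):
    """Fixed-effect predictors for one subject: main effects plus two-way products."""
    row = {**BEVERAGE_CONTRASTS[beverage], **POLITICS_CONTRASTS[politics]}
    for b in ("BvC", "BvS"):
        for p in ("CvL", "CvO"):
            row[f"{b} x {p}"] = row[b] * row[p]
    return row

example = design_row("bitter", "liberal")
```

Each contrast column sums to zero across its three levels, so the weights are centered on zero in each factor.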

In Model 1, we analyzed moral-wrongness ratings as a composite averaged across vignettes, with a random intercept to allow variation in the average rating across studies. Because we used effect coding, the intercept reflects the mean moral-wrongness rating across all subjects. There were 1,137 observations (subjects) for this analysis.

In Model 2, we added level of knowledge of the hypothesis as a fixed effect that could interact with beverage type, political orientation, or both.⁸ We recoded the knowledge variable to 0, naive, or 1, partially or fully suspicious⁹; the intercept therefore reflects the mean moral-wrongness rating among naive subjects. For three studies, we could not retrieve the open-ended responses to the question about what the study was about. Therefore, we could not code level of knowledge of the hypothesis for any of the subjects in those studies. With these study-level exclusions, there were 933 observations (subjects) for this analysis.

In Model 3, we analyzed moral-wrongness ratings for each vignette with the fixed effects from Model 2 and with random intercepts that allowed variation in the mean rating across studies, subjects, and vignettes. The intercept reflects the mean moral-wrongness rating among naive subjects. There were 5,502 observations for 933 subjects, each with moral-wrongness ratings for up to six vignettes. A total of 96 ratings from three studies (1.71%) were missing. We did not impute missing values.

LMER model results

Table 2 summarizes the results of the three confirmatory LMER models. The appendix at OSF includes figures depicting the moral-wrongness ratings in all design cells after accounting for fixed and random sources of variation in Model 1 (Fig. A4) and Model 2 (Fig. A5), respectively.

Table 2.

Summary of the Linear Mixed-Effects Models Examining the Moral-Wrongness Judgments

Predictor	Model 1	Model 2	Model 3
Intercept	69.937*** (0.874)	70.826*** (0.956)	70.782*** (2.706)
Bitter – control (BvC)	3.276* (1.369)	5.849** (2.045)	5.794** (2.045)
Bitter – sweet (BvS)	−2.030 (1.339)	−2.657 (2.135)	−2.526 (2.136)
Conservative – liberal (CvL)	2.388 (1.233)	1.042 (1.760)	1.118 (1.762)
Conservative – other (CvO)	−1.864 (1.373)	1.467 (2.141)	1.370 (2.141)
Level of knowledge (0 = naive)		1.093 (1.255)	1.083 (1.257)
BvC × CvL	2.823 (3.271)	5.905 (4.720)	6.061 (4.721)
BvC × CvO	−0.702 (3.717)	−5.826 (5.601)	−5.636 (5.598)
BvS × CvL	1.484 (3.218)	5.981 (4.906)	6.199 (4.909)
BvS × CvO	−6.109 (3.666)	−7.857 (5.997)	−7.784 (5.990)
BvC × Knowledge		−10.050** (3.622)	−9.883** (3.631)
BvS × Knowledge		3.878 (3.377)	3.754 (3.378)
CvL × Knowledge		3.534 (2.837)	3.663 (2.840)
CvO × Knowledge		−3.718 (3.235)	−3.625 (3.235)
BvC × CvL × Knowledge		−10.074 (8.267)	−10.139 (8.284)
BvC × CvO × Knowledge		−0.757 (9.421)	−0.794 (9.429)
BvS × CvL × Knowledge		−8.651 (7.796)	−8.802 (7.797)
BvS × CvO × Knowledge		9.230 (8.904)	9.114 (8.898)
Observations	1,137	933	5,502
Log likelihood	−4,590.005	−3,735.196	−25,537.830
AIC	9,202.011	7,510.392	51,119.670
BIC	9,257.409	7,607.160	51,265.150

Note: Values inside parentheses are standard errors. AIC = Akaike’s information criterion; BIC = Bayesian information criterion.

*p < .05. **p < .01. ***p < .001.

Results were consistent with the hypothesis that physical disgust induces greater feelings of moral wrongness, in that subjects who drank a bitter beverage made significantly harsher moral judgments than did those who drank water (see the BvC terms in Table 2). However, contrary to the hypothesis, subjects who drank a bitter beverage made numerically milder moral judgments than did those who drank the sweet juice; this difference was not statistically significant (see BvS terms in Table 2).

The bitter-versus-control effect did not vary significantly between conservatives and liberals (see the BvC × CvL and BvC × CvL × Knowledge terms in Table 2). Similarly, the bitter-versus-sweet effect did not vary significantly between conservatives and liberals (see the BvS × CvL and BvS × CvL × Knowledge terms in Table 2).

The bitter-versus-control effect did, however, vary significantly by level of knowledge of the hypothesis (see the BvC × Knowledge terms in Table 2). The estimated marginal means from Model 2 indicate that among naive subjects, drinking the bitter beverage, M = 72.42 (SE = 1.56), resulted in harsher moral judgments than did drinking water, M = 67.90 (SE = 1.35); this simple effect was statistically significant, t(45.35) = 2.46, p = .018. Among suspicious subjects, drinking the bitter beverage, M = 70.43 (SE = 1.70), resulted in milder moral judgments than did drinking water, M = 74.02 (SE = 2.03); this simple effect was not statistically significant, t(87.03) = −1.41, p = .161.

The bitter-versus-sweet effect did not vary significantly by level of knowledge of the hypothesis (see the BvS × Knowledge terms in Table 2). The estimated marginal means from Model 2 indicated that among naive subjects, drinking the bitter beverage, M = 72.42 (SE = 1.56), and drinking the sweet juice, M = 72.15 (SE = 1.48), resulted in similar moral judgments. Among suspicious subjects, drinking the bitter beverage, M = 70.43 (SE = 1.70), and drinking the sweet juice, M = 71.31 (SE = 1.60), also resulted in similar moral judgments.

Intraclass correlation coefficients (ICCs) based on Models 1 and 2 indicated that the proportion of variance in moral-wrongness judgments due to studies was .03 and .01, respectively. The proportion of variance due to studies was lower in Model 3, ICC = .00, likely because of variance accounted for by subjects, ICC = .14, and items, ICC = .05.
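An ICC of this kind is the share of total variance attributed to one random effect. The Python sketch below shows the calculation; the variance components are hypothetical values chosen only to illustrate the arithmetic and are not estimates from Models 1 to 3.

```python
def icc(variance_components):
    """Share of total variance attributable to each random effect and the residual."""
    total = sum(variance_components.values())
    return {name: v / total for name, v in variance_components.items()}

# Hypothetical variance components, for illustration only.
shares = icc({"study": 1.0, "subject": 60.0, "vignette": 20.0, "residual": 340.0})
```

When subject and vignette intercepts are added to a model, variance that would otherwise be pooled into the residual is assigned to them, and the share attributed to each grouping factor is recomputed against the new total.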

The preceding confirmatory LMER analyses as a whole did not reveal support for the hypothesis that drinking a bitter, disgusting beverage promotes a heightened sense of moral wrongness relative to drinking water and relative to drinking sweet juice, either in all subjects or among conservative subjects. Therefore, we conducted some exploratory LMER analyses.

First, in our confirmatory analyses, we included all subjects who provided ratings of at least three of the six vignettes, but it was possible that hypothesized effects would emerge only among subjects who rated all six. Therefore, we conducted an exploratory analysis in which we excluded subjects who rated fewer than six vignettes (see Table A7 in the appendix for results). Estimates in these models were all rather similar in magnitude and direction to those from the models that included subjects who rated only three or more vignettes; none of the estimates in these exploratory models revealed the hypothesized elevation of moral-wrongness judgments in the bitter condition relative to the control and sweet conditions across all subjects or as a function of political orientation.

Second, it was also possible that the hypothesized effects would emerge in a subset of the vignettes. For example, researchers have argued that moral judgments of purity violations might be particularly susceptible to manipulations of physical disgust (Horberg, Oveis, Keltner, & Cohen, 2009). Therefore, we conducted exploratory analyses using six linear mixed-effects models structured the same as Model 1 but with the moral-wrongness rating for an individual vignette as the criterion variable in each. The results are presented in Table A8 in the appendix. We also conducted analyses using two linear mixed-effects models structured the same as Models 1 and 2 but with moral-wrongness judgment across the two purity-violation vignettes as the criterion variable. Those results are presented in Table A9 in the appendix. None of these models revealed elevations of moral-wrongness judgments in the bitter condition relative to the control and sweet conditions across all subjects or as a function of political orientation. These results fail to support a causal effect of physical disgust on moral-wrongness judgments.

Third, Eskine et al. (2011) reported a positive association between disgust ratings and moral-wrongness judgments across conditions, β = 0.53, t(52) = 4.45, p < .001. In the present replication studies, zero-order correlations between these two variables ranged from −.11 to .42. A bivariate LMER assessing the fixed effect of disgust ratings (standardized) on moral-wrongness judgments (standardized) revealed a positive association, β = 0.07, p = .014. In two additional LMERs, ratings of bitterness were not associated with moral-wrongness judgments, β = 0.01, p = .808, and ratings of bitterness and disgust were strongly positively associated with one another, β = 0.77, p < .001. An LMER assessing the fixed effects of all four beverage ratings yielded a positive association between ratings of disgust and moral-wrongness judgments controlling for the other ratings, β = 0.15, p = .003. Across studies, each 1-unit increase in disgust specifically (7-point scale) was associated with a 0.87-unit increase in moral-wrongness judgment (101-point scale). By contrast, there was a negative association between bitterness and moral-wrongness ratings controlling for the other ratings, β = −0.12, p = .019, and there were near-zero associations between neutral and moral-wrongness ratings, β = −0.04, p = .363, and between sweetness and moral-wrongness ratings, β = 0.02, p = .675. These results establish a small, correlational link between physical disgust and moral-wrongness judgments that likely does not reflect general unpleasantness because controlling for perceived bitterness did not mitigate the association.

Bayes factor tests

Finally, the random-effects meta-analyses and the LMER analyses all relied on a frequentist statistical perspective, involving standardized effect sizes, confidence intervals, and p values. In our last set of analyses, we conducted a set of BF tests. Unlike frequentist methods, BF tests quantify relative levels of evidence for an alternative hypothesis (in this case, that drinking a bitter beverage harshens moral judgments compared with drinking a control or sweet beverage) versus the null hypothesis (that there is no effect of beverage type). Like the tests Wagenmakers et al. (2016) used, these tests focused on each of the replication studies individually. Three out of the four tests explicitly incorporated the original study.

Bayes factor model specification

Following the logic and code given by Verhagen and Wagenmakers (2014), we report four complementary BF tests for each replication study. In each case, the BF represents a comparison between two models; it captures the extent of evidence for one model relative to the other. The first test used the Jeffreys-Zellner-Siow (JZS) BF, which is independent of the original finding. This test determined the relative evidence for the effect being present versus absent in the replication by setting a standard two-sided Cauchy(0, 1) distribution as the prior, ignoring the original study. The second test was the replication BF test, in which the prior was based on the posterior distribution from the original study. It determined the relative evidence for the original effect versus a null effect. The third test was the equality-of-effect-size BF test, which determined whether the effect size in the replication study was equal to the effect size in the original study by determining relative evidence for the variance τ² of the effect sizes being zero versus nonzero. The fourth test was the fixed-effect meta-analysis BF test, which pooled the original and replication studies’ data and used the two-sided Cauchy(0, 1) distribution as the prior (as in the JZS BF test).¹⁰
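To make the first of these tests concrete, the Python sketch below evaluates the standard JZS integral for a two-sample t statistic with a Cauchy(0, 1) prior on the standardized effect. It is a hedged illustration and not the Verhagen and Wagenmakers (2014) code: the function name, the change of variable, and the integration settings are assumptions, and the other three BF tests are not covered.

```python
import math

def jzs_bf10(t, n1, n2, steps=4000, lo=-20.0, hi=20.0):
    """JZS Bayes factor (alternative over null) for a two-sample t statistic.

    Assumes a Cauchy(0, 1) prior on the standardized effect and integrates
    over g = exp(u) with Simpson's rule (steps must be even).
    """
    n_eff = n1 * n2 / (n1 + n2)          # effective sample size
    nu = n1 + n2 - 2                     # degrees of freedom

    def integrand(u):
        g = math.exp(u)
        a = 1 + n_eff * g
        return (a ** -0.5
                * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * math.pi) ** -0.5
                * g ** -1.5 * math.exp(-1 / (2 * g))
                * g)                     # Jacobian of the substitution g = exp(u)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    marginal_h1 = total * h / 3          # marginal likelihood under the alternative
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0

# Hypothetical inputs: no observed difference vs. a large observed difference.
bf_null_like = jzs_bf10(0.0, 50, 50)
bf_effect_like = jzs_bf10(5.0, 50, 50)
```

Values below 1 favor the null hypothesis and values above 1 favor the alternative, so a t statistic near zero in a reasonably large sample yields evidence for the null rather than a mere failure to reject it.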

Bayes factor results

Figure 3 summarizes the BF results across all subjects in all studies. (See Table A10 and accompanying text in the appendix for BF test results within the conservative and liberal subgroups.)

Fig. 3.

Results of the four Bayes factor (BF) tests across all subjects in each replication study. Results for the bitter-versus-control contrast are shown on the left, and results for the bitter-versus-sweet contrast are shown on the right. The dashed horizontal lines mark BF = 3 and 0.33; BFs above and below these thresholds provide nonanecdotal evidence for and against the replication hypothesis, respectively. For the Jeffreys-Zellner-Siow (JZS) Bayes test, the replication Bayes test, and the fixed-effect meta-analysis Bayes test, we report BF₁₀, which indicates relative evidence for the alternative hypothesis over the null hypothesis. For the equality-of-effect-size Bayes test, however, we report BF₀₁, which indicates relative evidence for the null hypothesis over the alternative hypothesis, because replication success is supported by evidence favoring zero variance, the null hypothesis. In all four cases, a larger value constitutes greater relative support for the replication hypothesis.

The equality-of-effect-size, JZS, and replication BF tests yielded relatively consistent evidence against the replication hypothesis (bitter > control and bitter > sweet) across most of the studies. By contrast, about one third of the studies provided evidence for the replication hypothesis according to the fixed-effect meta-analysis BF test. That said, because the latter test pools the original and replication effects, the very large original effect size likely had a strong influence on the outcome. Note that of the four replication studies for which this test provided nonanecdotal support for the replication hypothesis in the case of the bitter-versus-control contrast, only one had a large enough sample size to inspire confidence in the precision of the estimation (Study 10: N = 439); the other three studies were relatively small (Study 1: N = 59; Study 6: N = 24; Study 11: N = 26).

Discussion

Several studies have shown that experimental manipulations of disgust can harshen moral judgments (e.g., Harlé & Sanfey, 2010; Moretti & di Pellegrino, 2010; Schnall, Benton, & Harvey, 2008; Schnall, Haidt, et al., 2008; Van Dillen, van der Wal, & van den Bos, 2012; Wheatley & Haidt, 2005). Although more recent research has cast some doubt on the existence of this relationship, some researchers have proposed that the effect might be more robust for a specific type of manipulation: induction of gustatory and olfactory disgust (see Landy & Goodwin, 2015a, 2015b). As this notion was largely based on one low-powered study with a particularly large effect size (i.e., Eskine et al., 2011), we were interested in obtaining an accurate estimate of the effect size by conducting a high-powered meta-analysis of preregistered direct-replication studies. Overall, in 11 studies, we found little to no support for the conclusion that gustatory disgust harshens moral judgments.

We adopted a multifaceted analysis strategy in an effort to fairly examine the research question (i.e., does gustatory disgust induced via a bitter drink amplify moral-wrongness judgments?) from slightly different angles. In part, we employed a frequentist approach, running random-effects meta-analyses, one-sided tests, and LMER models to make inferences across all the replication studies. We observed a very small overall random-effects meta-analytic effect in which the bitter drink induced harsher moral judgments than water across all participants. The parallel effect was small and statistically significant in the LMER models. However, in the random-effects meta-analyses comparing the bitter drink with water, the confidence intervals for the effect sizes encompassed zero in all but two replications, and the confidence interval for the overall effect included zero (see Fig. 1). In addition, standardized effects from these meta-analyses comparing the bitter drink with water were significantly smaller than the smallest effect the original authors could have found, and equivalence tests indicated that the effects were unlikely to be of greater magnitude than ± 0.30. Moreover, none of the frequentist tests indicated that the bitter drink resulted in harsher moral judgments than the sweet one did; if anything, the bitter drink led to more lenient moral judgments than the sweet drink. This latter finding is important in that it undermines the notion that disgust is responsible for any effects of the bitter drink on moral judgments; instead, any such effects might be explained by the act of drinking something with flavor. Finally, we found no evidence that political conservatives were especially harsh in their judgments after consuming a bitter drink. It should be noted, though, that this latter finding is based on a relatively small sample of conservative subjects (a limitation discussed in more detail later in this section).

We also used a Bayesian approach, which quantified the relative strength of the evidence in favor of versus against the replication hypothesis. We computed four different BF tests, of which three (i.e., JZS, replication, and equality-of-effect-size BF tests) showed evidence against the replication hypothesis in most of the studies. The fixed-effect meta-analytic BF test showed more support for the idea that disgust amplifies the harshness of moral judgments. However, this is not surprising, as this test pooled each of the replication studies with the original study, which had a particularly large effect. Even in this case, though, only 4 of the 11 replication studies (3 of which had small sample sizes) favored support for the replication hypothesis.
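
To make the idea of relative evidence concrete, the following sketch approximates a JZS Bayes factor for a two-sample t statistic by numerical integration. It assumes the standard JZS formulation (a Cauchy prior on the standardized effect, written as a normal prior whose variance g has an inverse-gamma prior). The prior scale `r`, the integration limits, and the function name are assumptions made for illustration, so this should be read as a rough sketch rather than a reproduction of the reported tests.

```python
from math import exp, log, pi, sqrt


def jzs_bf10(t, n1, n2, r=sqrt(2) / 2):
    """Approximate the JZS Bayes factor (alternative over null) for a two-sample t."""
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for two groups
    nu = n1 + n2 - 2             # degrees of freedom

    def integrand(u):
        # Integrate over u = log(g) so a plain trapezoid rule can cover (0, inf)
        g = exp(u)
        a = 1.0 + n_eff * g
        log_lik = -0.5 * log(a) - 0.5 * (nu + 1) * log(1.0 + t * t / (a * nu))
        # Inverse-gamma(1/2, r^2/2) prior on g, implying a Cauchy(0, r) prior on the effect
        log_prior = log(r) - 0.5 * log(2.0 * pi) - 1.5 * u - r * r / (2.0 * g)
        return exp(log_lik + log_prior + u)  # "+ u" is the Jacobian of g = exp(u)

    lo, hi, steps = -12.0, 40.0, 5200  # assumed limits; the integrand is negligible outside
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    marginal_alt = total * h
    marginal_null = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)
    return marginal_alt / marginal_null


# Hypothetical inputs: a t statistic near zero with 50 subjects per group
print(jzs_bf10(0.3, 50, 50))
```

Values below 1 favor the null hypothesis and values above 1 favor the alternative; the reciprocal gives the evidence for the null relative to the alternative.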

Given the results from these various tests, we conclude that there is little to no support for the notion that gustatory disgust can increase ratings of moral wrongness. Our study adds to the growing number of studies that have failed to find support for a relationship between incidental manipulations of disgust and moral judgments (e.g., Johnson, Cheung, & Donnellan, 2014; Johnson et al., 2016; Landy & Goodwin, 2015a). Our results not only cast doubt on Eskine and colleagues’ (2011) conclusion that a disgusting taste affects moral judgments, but also fail to support Schnall and colleagues’ (2015) proposal that gustatory inductions of disgust have a special potency to influence moral condemnation.

It is possible that the effect of incidental disgust on moral judgments depends heavily on moderator variables, such as individual difference measures. For example, it has been suggested that the relationship is stronger for individuals who are generally more sensitive to bodily sensations (i.e., who score high on measures of private body consciousness; Schnall, Haidt, et al., 2008; also see Schnall et al., 2015). We did not measure this variable, or any other individual difference measure, consistently across our studies, so we cannot directly test this hypothesis. However, an earlier replication study by Johnson and colleagues (2016) tested the importance of private body consciousness to the link between disgust and moral judgment but did not find evidence indicating that it has a moderating effect (also see Johnson et al., 2014).

One moderator variable that we identified is knowledge of the hypothesis. A total of 6% of our subjects demonstrated full knowledge of the hypothesis that drinking a bitter beverage would harshen moral judgments; an additional 36% partially guessed that hypothesis. We could not separate out effects of partial versus full suspicion because of the absence of observations in one cell of the factorial design. However, when we combined the fully and partially suspicious groups in our LMER models, we found that the bitter-versus-control effect was larger in the predicted direction among naive subjects than among fully or partially suspicious subjects. This result is in line with the idea that induced disgust has a stronger effect on individuals who are unaware of a link between the taste of the drink and the moral judgments than on individuals who are aware of such a link (Schnall et al., 2015). Contrary to this idea, however, knowledge of the hypothesis did not moderate the bitter-versus-sweet effect.

It is unclear how closely the coding scheme we used to assess knowledge of the hypothesis maps onto the scheme used by the original authors. They reported that 3 of 57 subjects (5%), who were excluded from analyses, correctly guessed the hypothesis. Given that 6% of our subjects demonstrated full knowledge of the hypothesis, the 5% of subjects with knowledge of the hypothesis in the original study may have had full knowledge, not just partial. Regardless, we can conclude that knowledge of the hypothesis moderated the effect of beverage type on moral judgments to some extent. Future studies would benefit from refinement of deception procedures to minimize such knowledge. On this point, it is noteworthy that in the replication studies, more subjects in the bitter and sweet conditions than in the control condition were partially or fully suspicious; deception procedures would ideally yield similar rates of suspicion in all conditions so that level of knowledge of the hypothesis is orthogonal to the beverage manipulation.

Some limitations of the current investigation may have affected our findings. One potential limitation is the low reliability of the moral-wrongness composite in most of our replication samples. This may have influenced our ability to detect effects, especially if our average reliability was substantially lower than what was observed in the original study. Unfortunately, information about reliability was not reported by Eskine et al. (2011), or by Wheatley and Haidt (2005), the developers of these vignettes. However, linear mixed-effects analyses of the individual vignettes revealed that the bitter drink, compared with the sweet drink or water, did not induce harsher moral judgments in response to any of the vignettes. It therefore seems unlikely that the observed low reliability of the moral-wrongness composite explains our mostly null findings.

Another limitation is that we had few samples with enough conservative subjects for us to test the moderating effect of political ideology. Only 5 of the 11 replications had at least 2 conservative subjects in each of the three beverage conditions. Among those studies, only 3 had more than 6 conservative subjects in each of the three beverage conditions (i.e., the estimated number of conservatives per condition in the original study). This likely decreased our ability to detect potential moderation by political ideology. It should be noted, though, that we had substantially more conservative subjects across the replication samples (n = 142) than the original study did (n = 19).

Conclusion

This work joins the growing number of crowdsourced replication projects combining multiple laboratories’ efforts to replicate a single study (e.g., Hagger et al., 2016; Moshontz et al., 2018; Schweinsberg et al., 2016). Multilab replications have an advantage over single-lab replications in that they can more accurately estimate effect sizes given the possibility of much larger sample sizes. We carried out this research as part of the CREP. Thus, this project also joins the growing movement of conducting replications in the classroom (see also Leighton, Legate, LePine, Anderson, & Grahe, 2018; Wagge et al., 2019). Involving students in replication projects is more than a valuable pedagogical tool; it has been shown to be a promising means of carrying out high-quality replications (Frank & Saxe, 2012; Hawkins et al., 2018). Here we have demonstrated that pedagogical replications can provide valuable evidence about the robustness of an effect that has not yet been submitted to independent replication. We believe that pedagogical replications could be implemented widely to advance psychological science.

Supplemental Material

AMPPSOpenPracticesDisclosure-v1-0_Ghelfi – Supplemental material for Reexamining the Effect of Gustatory Disgust on Moral Judgment: A Multilab Direct Replication of Eskine, Kacinik, and Prinz (2011)

Supplemental material, AMPPSOpenPracticesDisclosure-v1-0_Ghelfi for Reexamining the Effect of Gustatory Disgust on Moral Judgment: A Multilab Direct Replication of Eskine, Kacinik, and Prinz (2011) by Eric Ghelfi, Cody D. Christopherson, Heather L. Urry, Richie L. Lenne, Nicole Legate, Mary Ann Fischer, Fieke M. A. Wagemans, Brady Wiggins, Tamara Barrett, Michelle Bornstein, Bianca de Haan, Joshua Guberman, Nada Issa, Joan Kim, Elim Na, Justin O’Brien, Aidan Paulk, Tayler Peck, Marissa Sashihara, Karen Sheelar, Justin Song, Hannah Steinberg and Dasan Sullivan in Advances in Methods and Practices in Psychological Science

Footnotes

Acknowledgements

The authors wish to acknowledge Jon Grahe and Hans IJzerman for early guidance and reviews, and Jordan Wagge, Mark Brandt, and all the volunteer reviewers for their work with the Collaborative Replications and Education Project.

Transparency

Action Editor: Daniel J. Simons

Editor: Daniel J. Simons

Author Contributions

We have collectively identified major and minor contributors to this work. The first 8 authors are major contributors, and the other 15 (listed in alphabetical order) are minor contributors. According to the Contributor Roles Taxonomy (CRediT), the specific contributions were as follows: All the authors contributed to the conceptualization of this project, provided necessary resources, and developed the methodology. E. Ghelfi and C. D. Christopherson were responsible for project administration, and all the major contributors supervised the project. E. Ghelfi, F. M. A. Wagemans, and M. A. Fischer were responsible for data curation, and H. L. Urry and R. L. Lenne performed the formal analysis. H. L. Urry acquired the funds for the study at Tufts University. All the minor contributors conducted the investigation. H. L. Urry and R. L. Lenne were responsible for validation of the research outputs and prepared the visualization of results. All the major contributors wrote the original draft of the manuscript, and all the authors reviewed and edited it. At the University of Erfurt, Helene Weber, Liv Hochhäuser Conde, Mirjam Düben, Frank Renkewitz, Cornelia Betsch, and Lisa Felgendreff organized and conducted the study. Helene Weber, Liv Hochhäuser Conde, and Mirjam Düben helped provide their data to the meta-analysis team and helped frame German answers to the political-orientation question. No contributors from the University of Erfurt were involved in writing or reviewing the manuscript, and none elected to be authors.

ORCID iDs

Eric Ghelfi

Richie L. Lenne

Bianca de Haan

Joshua Guberman

Prior Versions

Prior versions of this manuscript were uploaded to the PsyArXiv preprint server.

Notes

References

1. Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial Intelligence, 149, 91–130. doi:10.1016/S0004-3702(03)00054-7

2. Arnholt, A. T., & Evans (2017). BSDA: Basic statistics and data analysis (R package Version 1.2.0) [Computer software]. Retrieved from https://CRAN.R-project.org/package=BSDA

3. Auguie (2017). gridExtra: Miscellaneous functions for “grid” graphics (R package Version 2.3) [Computer software]. Retrieved from https://CRAN.R-project.org/package=gridExtra

4. Aust & Barth (2017). papaja: Create APA manuscripts with R Markdown (R package Version 0.1.0.9842) [Computer software]. Retrieved from https://github.com/crsh/papaja

5. Bates, Mächler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1). doi:10.18637/jss.v067.i01

6. Bates & Maechler (2017). Matrix: Sparse and dense matrix classes and methods (R package Version 1.2.14) [Computer software]. Retrieved from https://CRAN.R-project.org/package=Matrix

7. Brandt, M. J., IJzerman, Dijksterhuis, Farach, F. J., Geller, Giner-Sorolla, . . . van’t Veer (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224.

8. Cameron, C. D., Payne, B. K., & Doris, J. M. (2013). Morality in high definition: Emotion differentiation calibrates the influence of incidental disgust on moral judgments. Journal of Experimental Social Psychology, 49, 719–725.

9. Cannon, P. R., Schnall, & White (2011). Transgressions and expressions: Affective facial muscle activity predicts moral judgments. Social Psychological and Personality Science, 2, 325–331.

10. Case, T. I., Oaten, M. J., & Stevenson, R. J. (2012). Disgust and moral judgment. In Langdon & Mackenzie (Eds.), Emotions, imagination, and moral reasoning (pp. 195–218). New York, NY: Taylor & Francis.

11. Champely (2017). pwr: Basic functions for power analysis (R package Version 1.2.2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=pwr

12. Chapman, H. A. (2018). A component process model of disgust, anger, and moral judgment. In Gray & Graham (Eds.), Atlas of moral psychology (pp. 70–80). New York, NY: Guilford Press.

13. Chapman, H. A., & Anderson, A. K. (2012). Understanding disgust. Annals of the New York Academy of Sciences, 1251, 62–76.

14. Curtis & Biran (2001). Dirt, disgust, and disease: Is hygiene in our genes? Perspectives in Biology and Medicine, 44, 17–31.

15. Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their applications. Cambridge, England: Cambridge University Press.

16. Eskine, K. J., Kacinik, N. A., & Prinz, J. J. (2011). A bad taste in the mouth: Gustatory disgust influences moral judgment. Psychological Science, 22, 295–299.

17. Frank, M. C., & Saxe (2012). Teaching replication. Perspectives on Psychological Science, 7, 600–604.

18. Garnier (2018a). viridis: Default color maps from ‘matplotlib’ (R package Version 0.5.1) [Computer software]. Retrieved from https://CRAN.R-project.org/package=viridis

19. Garnier (2018b). viridisLite: Default color maps from ‘matplotlib’ (lite version) (R package Version 0.3.0) [Computer software]. Retrieved from https://CRAN.R-project.org/package=viridisLite

20. Gill, M. B., & Nichols (2008). Sentimentalist pluralism: Moral psychology and philosophical ethics. Philosophical Issues, 18, 143–163.

21. Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293, 2105–2108.

22. Hagger, M. S., Chatzisarantis, N. L., Alberts, Anggono, C. O., Batailler, Birt, A. R., . . . Bruyneel (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11, 546–573.

23. Haidt (2012). The righteous mind: Why good people are divided by politics and religion. New York, NY: Vintage Books.

24. Harlé, K. M., & Sanfey, A. G. (2010). Effects of approach and withdrawal motivation on interactive economic decisions. Cognition and Emotion, 24, 1456–1465.

25. Hawkins, R. X., Smith, E. N., Arias, J. M., Catapano, Hermann, . . . Reynolds (2018). Improving the replicability of psychological science through pedagogy. Advances in Methods and Practices in Psychological Science, 1, 7–18.

26. Hellmann, J. H., Thoben, D. F., & Echterhoff (2013). The sweet taste of revenge: Gustatory experience induces metaphor-consistent judgments of a harmful act. Social Cognition, 31, 531–542.

27. Hlavac (2018). stargazer: Well-formatted regression and summary statistics tables (R package Version 5.2.2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=stargazer

28. Horberg, E. J., Oveis, Keltner, & Cohen, A. B. (2009). Disgust and the moralization of purity. Journal of Personality and Social Psychology, 97, 963–976.

29. Inbar & Pizarro, D. A. (2014). Pollution and purity in moral and political judgment. In Sarkissian & J. C. Wright (Eds.), Advances in experimental moral psychology (pp. 111–129). London, England: Bloomsbury Academic.

30. Inbar, Pizarro, D. A., & Bloom (2012). Disgusting smells cause decreased liking of gay men. Emotion, 12, 23–27.

31. Ionescu & Vasc (2014). Embodied cognition: Challenges for psychology and education. Procedia: Social and Behavioral Sciences, 128, 275–280.

32. Johnson, D. J., Cheung, & Donnellan, M. B. (2014). Does cleanliness influence moral judgments? Social Psychology, 45, 209–215. doi:10.1027/1864-9335/a000186

33. Johnson, D. J., Wortman, Cheung, Hein, Lucas, R. E., Donnellan, M. B., . . . Narr, R. K. (2016). The effects of disgust on moral judgments: Testing moderators. Social Psychological and Personality Science, 7, 640–647.

34. Kelley (2017). MBESS: The MBESS R package (R package Version 4.5.1) [Computer software]. Retrieved from https://CRAN.R-project.org/package=MBESS

35. Kooperberg (2015). polspline: Polynomial spline routines (R package Version 1.1.14) [Computer software]. Retrieved from https://CRAN.R-project.org/package=polspline

36. Kuznetsova, Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13). doi:10.18637/jss.v082.i13

37. Lakens (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8, 355–362. doi:10.1177/1948550617697177

38. Lakens, Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1, 259–269.

39. Landy, J. F., & Goodwin, G. P. (2015a). Does incidental disgust amplify moral judgment? A meta-analytic review of experimental evidence. Perspectives on Psychological Science, 10, 518–536. doi:10.1177/1745691615583128

40. Landy, J. F., & Goodwin, G. P. (2015b). Our conclusions were tentative, but appropriate: A reply to Schnall et al. (2015). Perspectives on Psychological Science, 10, 539–540. doi:10.1177/1745691615590570

41. Leighton, D. C., Legate, LePine, Anderson, S. F., & Grahe (2018). Self-esteem, self-disclosure, self-expression, and connection on Facebook: A collaborative replication meta-analysis. Psi Chi Journal of Psychological Research, 23, 98–109.

42. Lenth (2018). emmeans: Estimated marginal means, aka least-squares means (R package Version 1.3.4) [Computer software]. Retrieved from https://CRAN.R-project.org/package=emmeans

43. Lüdecke (2018a). sjlabelled: Labelled data utility functions (R package Version 1.0.17) [Computer software]. doi:10.5281/zenodo.1249215

44. Lüdecke (2018b). sjPlot: Data visualization for statistics in social science (R package Version 2.6.3) [Computer software]. Retrieved from https://CRAN.R-project.org/package=sjPlot

45. Lüdecke (2018c). sjstats: Statistical functions for regression models (R package Version 0.17.4) [Computer software]. Retrieved from https://CRAN.R-project.org/package=sjstats

46. Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software, 42(9). doi:10.18637/jss.v042.i09

47. Moretti & di Pellegrino (2010). Disgust selectively modulates reciprocal fairness in economic interactions. Emotion, 10, 169–180. doi:10.1037/a0017826

48. Morey, R. D., & Rouder, J. N. (2015). BayesFactor: Computation of Bayes factors for common designs (R package Version 0.9.12.4.2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=BayesFactor

49. Moshontz, Campbell, Ebersole, C. R., IJzerman, Urry, H. L., Forscher, P. S., . . . Chartier, C. R. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1, 501–515.

50. Plummer, Best, Cowles, & Vines (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11. Retrieved from http://cran.r-project.org/doc/Rnews/Rnews_2006-1.pdf#page=7

51. R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

52. A. C. D. (2013). compute.es: Compute effect sizes (R package Version 0.2.4) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/compute.es

53. Revelle (2017). psych: Procedures for psychological, psychometric, and personality research (R package Version 1.8.12) [Computer software]. Retrieved from https://CRAN.R-project.org/package=psych

54. Rinker, T. W., & Kurkiewicz (2017). pacman: Package management for R (R package Version 0.5.1) [Computer software]. Retrieved from http://github.com/trinker/pacman

55. RStudio Team. (2015). RStudio [Computer software]. Retrieved from http://www.rstudio.com/

56. Sarkar (2008). Lattice: Multivariate data visualization with R. New York, NY: Springer.

57. Schnall, Benton, & Harvey (2008). With a clean conscience: Cleanliness reduces the severity of moral judgments. Psychological Science, 19, 1219–1222.

58. Schnall, Haidt, Clore, G. L., & Jordan, A. H. (2008). Disgust as embodied moral judgment. Personality and Social Psychology Bulletin, 34, 1096–1109. doi:10.1177/0146167208317771

59. Schnall, Haidt, Clore, G. L., & Jordan, A. H. (2015). Landy and Goodwin (2015) confirmed most of our findings then drew the wrong conclusions. Perspectives on Psychological Science, 10, 537–538. doi:10.1177/1745691615589078

60. Schweinsberg, Madan, Vianello, Sommer, S. A., Jordan, Tierney, . . . Uhlmann, E. L. (2016). The Pipeline Project: Pre-publication independent replications of a single laboratory’s research pipeline. Journal of Experimental Social Psychology, 66, 55–67.

61. Simonsohn (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26, 559–569. doi:10.1177/0956797614567341

62. Sturtz, Ligges, & Gelman (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software, 12(3). doi:10.18637/jss.v012.i03

63. Torchiano (2017). effsize: Efficient effect size computation (R package Version 0.7.4) [Computer software]. Retrieved from https://CRAN.R-project.org/package=effsize

64. Tybur, J. M., Lieberman, Kurzban, & DeScioli (2013). Disgust: Evolved function and structure. Psychological Review, 120, 65–84.

65. VanDerWal, Falconi, Januchowski, Shoo, & Storlie (2014). SDMTools: Species distribution modelling tools: Tools for processing data associated with species distribution modelling exercises (R package Version 1.1.221.1) [Computer software]. Retrieved from https://CRAN.R-project.org/package=SDMTools

66. Van Dillen, L. F., van der Wal, R. C., & van den Bos (2012). On the role of attention and emotion in morality: Attentional control modulates unrelated disgust in moral judgments. Personality and Social Psychology Bulletin, 38, 1222–1231.

67. Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York, NY: Springer.

68. Verhagen & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143, 1457–1475.

69. Vicario, C. M., Rafal, R. D., Martino, & Avenanti (2017). Core, social and moral disgust are bounded: A review on behavioral and neural bases of repugnance in clinical disorders. Neuroscience & Biobehavioral Reviews, 80, 185–200.

70. Viechtbauer (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3). doi:10.18637/jss.v036.i03

71. Wagenmakers, E.-J., Beek, Dijkhoff, Gronau, Q. F., Acosta, Adams, R. B., Jr., . . . Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11, 917–928.

72. Wagge, J. R., Brandt, M. J., Lazarevic, L. B., Legate, Christopherson, Wiggins, & Grahe, J. E. (2019). Publishing research with undergraduate students via replication work: The Collaborative Replications and Education Project. Frontiers in Psychology, 10, Article 247. doi:10.3389/fpsyg.2019.00247

73. Wheatley & Haidt (2005). Hypnotic disgust makes moral judgments more severe. Psychological Science, 16, 780–784.

74. Wickham (2009). ggplot2: Elegant graphics for data analysis. New York, NY: Springer-Verlag.

75. Wickham (2019). stringr: Simple, consistent wrappers for common string operations (R package Version 1.4.0) [Computer software]. Retrieved from https://CRAN.R-project.org/package=stringr

76. Wickham, Francois, Henry, & Mueller (2017). dplyr: A grammar of data manipulation (R package Version 0.8.1) [Computer software]. Retrieved from https://CRAN.R-project.org/package=dplyr

77. Wickham & Henry (2017). tidyr: Easily tidy data with ‘spread()’ and ‘gather()’ functions (R package Version 0.8.3) [Computer software]. Retrieved from https://CRAN.R-project.org/package=tidyr

78. Wilson (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625–636. doi:10.3758/BF03196322
