Abstract
In the replication crisis in psychology, a “tone debate” has developed. It concerns the question of how to conduct scientific debate effectively and ethically. How should scientists give critique without unnecessarily damaging relations? The increasing use of Facebook and Twitter by researchers has made this issue especially pressing, as these social technologies have greatly expanded the possibilities for conversation between academics, but there is little formal control over the debate. In this article, we show that psychologists have tried to solve this issue with various codes of conduct, with an appeal to virtues such as humility, and with practices of self-transformation. We also show that the polemical style of debate, popular in many scientific communities, is itself being questioned by psychologists. Following Shapin and Schaffer’s analysis of the ethics of Robert Boyle’s experimental philosophy in the 17th century, we trace the connections between knowledge, social order, and subjectivity as they are debated and revised by present-day psychologists.
Introduction
The latest crisis in psychology is as much about ethics as it is about statistics and methodology. We do not mean ethics only in the sense of concern with fraud and questionable research practices, although these are certainly major points of debate. At issue more broadly is how academics in psychology should conduct themselves in general: as researchers, as teachers, as writers, reviewers, and administrators. The reform movement that has emerged in response to the problems in the discipline discusses not only the analysis of data but also how to mentor PhD students, whether or not to publish in Elsevier journals, and how to ask questions at conferences, to name but a few topics of debate. The crisis in psychology has encouraged reflection on numerous aspects of the work of psychologists, including personal and practical matters as well as theoretical and methodological issues. Ethical issues have become particularly prominent in a debate that was prompted by the actions of the reformers themselves. Since 2011, the crisis has seen a number of controversies, usually over the results of replication studies. In these controversies, reformers have regularly been accused of a lack of civility and professional courtesy, and even bullying. In response, reformers have depicted their detractors as a beleaguered old guard, attempting to stifle healthy, scientific criticism, and protect their position of power. A “tone debate” has developed; a debate about the tone of the debate. It is a meta-debate that is recognized as a specific, ongoing discussion in and with the reform movement in psychology.
What people can say to each other, how they should say it, and where and when, are all matters that are explicitly or implicitly regulated by social rules of propriety, and these rules may vary historically and culturally. To give just one example: What a child can say to a parent in the present-day Netherlands is different from what they could say 300 years ago in the same country, or in other cultures today. What nowadays would generally be considered a charming directness in the Netherlands would have been inconceivable impertinence in earlier days, and still is elsewhere. The interaction of scholars is likewise guided by rules, which vary as a function of different contexts in time and in place. In this article, we take as a starting point the ethics of the new experimental philosophy that Robert Boyle created in the 17th century. For Boyle, the discursive practices of the community of experimental philosophers were an important concern. As analyzed by Steven Shapin and Simon Schaffer (1985), the production of facts in Boyle’s laboratory involved not only a material technology centered on the air pump, and a literary technology for describing and publishing findings, but also a “social technology that incorporated the conventions experimental philosophers should use in dealing with each other and considering knowledge-claims.” (Shapin & Schaffer, 1985, p. 25) Generating matters of fact and deciding on their explanation required a community of experimentalists that held itself to certain rules of discourse.
Boyle paid particular attention to the proper conduct of dispute. Managing dissensus required, above all else, that the debate should focus on the interpretation of facts, not on the facts themselves, and certainly not on the character of the persons engaged in the debate. The ad hominem was to be avoided at all costs, and debates should always be conducted with civility. “The moral tone of philosophical controversy was to be civil and liberal.” (Shapin & Schaffer, 1985, p. 76) In Shapin and Schaffer’s interpretation, the three technologies, including these rules of discourse, served to make matters of fact appear as objective givens, rather than man-made.
Shapin and Schaffer emphasize that the way a community of scholars is ordered and the knowledge it produces are inextricably connected. “(S)olutions to the problem of knowledge are embedded within practical solutions to the problem of social order, and (. . .) different solutions to the problem of social order encapsulate contrasting practical solutions to the problem of knowledge” (p. 15). There is a connection between the knowledge a community produces and the way the members of that community relate to each other. But there is a third term at play as well. Boyle’s experimental philosophy depended on reliable witnesses: Only if trustworthy eyewitnesses could attest to the matters of fact generated in the laboratory could facts have their foundational role as indubitable givens. To multiply the number of witnesses beyond what the space of the laboratory could hold, it was essential that the reports of the experiments allowed what Shapin and Schaffer (p. 60) term “virtual witnessing” by the readers. The reports had to be “prolix,” as Boyle called it, describing the events in minute detail, but it was also essential to display humility: “A man whose narratives could be credited as mirrors of reality was a modest man; his report ought to make that modesty visible” (p. 65). The production of knowledge, in other words, was not only embedded in a social order but also required a particular kind of subjectivity.
In the current tone debate, we recognize Boyle’s concern with the management of dissensus. Using Shapin and Schaffer’s framework, we analyze how those engaged in the debate see the relations between knowledge, social order, and subjectivity. How do they think that researchers should relate to each other, particularly when they do not agree? How do they relate to themselves, that is, what kind of person do they strive to be? What are the literary and social technologies involved in creating this social order and its attendant subjectivity? And what does this imply regarding the knowledge that psychology should produce? It is not our goal to give our own answers to these questions. We do not, for example, offer an opinion on where to draw the line between science and scientist, or between legitimate criticism and ad hominem attacks. Instead, we are primarily interested in where such lines are drawn by those engaged in the tone debate. In as much as we hold a position in this article, it is limited to Shapin and Schaffer’s contention that there is an intimate connection between knowledge, social order, and subjectivity.
We selected the material for our analysis based on its relevance for the theoretical framework described above and using our familiarity with the debate. We both occupy a hybrid position as participants in and observers of the crisis debate. Derksen has followed the crisis debate since 2011 as it unfolded in journal articles, in blog posts, on Facebook, and on Twitter. He has written several papers about it, some intended as contributions to the debate (Derksen, 2011; Derksen & Rietzschel, 2013), some intended as analyses of the debate (Derksen, 2019a, 2019b). Field studies the community using interviews, participant observation, and online ethnography, and has also published papers contributing to the discussion (Field & Derksen, 2021; Field et al., 2019, 2020).
There are ethical issues connected with the use of material from social media. Beaulieu and Estalella (2012) have argued that the traceability of digital data is one source of potential problems. “Being traceable, digital data holds the possibility of locating and identifying participants, sites, social interactions and the ethnographer herself, and makes particular issues such as anonymity, visibility, exposure, ownership and authorship especially prominent.” (Beaulieu & Estalella, 2012, p. 33) Because tweets are considered by some to be private communication (albeit on a public platform) rather than “publications,” we have asked permission to quote tweets. In most cases we received permission. Two people did not respond to our query, and we removed their tweets from the paper. In contrast, we treat blog posts as publications, and have not asked for permission to cite them. Finally, digital sources like blog posts and tweets are sometimes deleted (although blog posts may still be traceable using the Internet Archive). To our knowledge, there is no established practice as regards the archiving of textual Twitter content, despite the important issue of online source reusability. An analysis conducted by Zimmer and Proferes (2014) finds that few articles that explicitly use tweets as data discuss ethical issues surrounding the retention and reuse of their data. We have archived the deleted tweets that we use in our analysis and we will share them on request, provided confidentiality is guaranteed.
Trash Talk
Rather than attempting to give an exhaustive account of the tone debate, we start by considering one of its most heated episodes, which contained most of the elements of the discussion as a whole. In September 2016, Susan Fiske, then president of the Association for Psychological Science (APS), decided to address the issue of tone in the reform debates. A draft of Fiske’s “presidential column” for the APS Observer was put online, in which she decried how the field’s discourse had deteriorated through the use of new media such as blogs, Twitter, and Facebook. A culture of naming and shaming had developed, she wrote, in which civilized debate about psychological research had given way to “uncurated, unfiltered trash-talk” (Fiske, 2016a). Individual researchers and their careers were being damaged by ferocious online attacks. “I have heard from graduate students opting out of academia, assistant professors afraid to come up for tenure, mid-career people wondering how to protect their labs, and senior faculty retiring early, all because of methodological terrorism.” (Fiske, 2016a) Fiske preferred to keep these victims anonymous, so as not to fuel the “ad hominem smear tactics” (Fiske, 2016a).
The immediate reaction to Fiske’s draft column focused on her own use of offensive terms such as “destructo-critics” and “methodological terrorism.” The latter in particular was deemed unacceptable. In response, Fiske softened her tone somewhat in the published version of her column: “trash talk” became “denigration,” “bullies” became “people engaging in (. . .) toxic behavior,” “methodological terrorism” became “methodological intimidation” (Fiske, 2016b). However, the substance of her criticism remained the same. Clearly, Fiske believed that the social order in psychology had been severely disturbed by an emerging culture of undisciplined debate. The proper way to manage dissensus requires, first of all, that debate should always be restricted to matters of substance, and steer clear of the kind of personal attacks that the “destructo-critics” revel in. Fiske’s ideal scientific community respects a strict distinction between the science and the scientist. Moreover, that distinction should be policed. Constructive criticism is of course very important, but it should always be “subject to editorial oversight and peer review for tone, substance, legitimacy” (Fiske, 2016b). Scientific debate should be moderated, curated, edited, reviewed, and controlled. To air your criticism in public, without prior peer review of its quality, goes against the “ethical rules of conduct” in science. That also means that debate should be confined to the proper venues: “moderated channels” such as the traditional scientific journal, or social media groups that “monitor individual posts to ensure they are appropriate.” Although Fiske did not say so explicitly, that rule would exclude Twitter, where there is no moderation apart from that by Twitter’s own moderators, and they had not censured the trash talk that Fiske decried. This new social technology, enthusiastically adopted by many scientists, should have no place in the scientific community, because its radical openness (compared with a scientific journal, at least) enables debates that evade oversight.
There have been other flare-ups in the tone debate. In 2014, a controversy erupted over an attempt to reproduce the results of a study by Simone Schnall and colleagues (2008). Despite much larger sample sizes, the effect sizes in the three replication attempts were close to zero (Johnson et al., 2014). Schnall thought there was a fatal flaw in the results (a ceiling effect) and believed the paper should not have been published as it was. After some email correspondence, Schnall was eventually given space for a commentary by the editors of the special issue in which the replication study appeared. In May 2014, when the special issue had appeared, Schnall wrote a blog post describing her experiences. She criticized the process, detailed her doubts about the results, but most of all she expressed her concern at what she saw as an attack that had “defamed” her and damaged her reputation. She described the “mocking and bullying on social media,” the insinuations about questionable research practices, and the lack of collegiality (Schnall, 2014a). It reminded her of the culture of suspicion and spying in East Germany (Schnall, 2014b).
Not long after, the work on “power pose” by Amy Cuddy and colleagues came under fire (Carney et al., 2010). Their study purported to show that striking an expansive, open pose expressing power not only made participants feel more powerful but also made them take more risks and changed their testosterone and cortisol levels. Cuddy popularized this work in a TED talk that went viral, and a best-selling book. However, a replication study (Ranehill et al., 2015) showed no effect on risk taking and hormones. The power pose study and Cuddy herself were increasingly criticized, and became a symbol for all that is wrong in psychology in the eyes of the reformers (e.g., Gelman & Fung, 2016). Fiske’s denouncement of “destructo-critics” was likely also a reaction to the criticism of Cuddy, a former student of hers.
A final example: In 2017, the work of Brian Wansink, a successful and influential researcher of eating behavior, came under scrutiny after he had published a blog post in which he seemed to describe questionable research practices underlying his results. An extremely thorough inspection of Wansink’s work was performed by early career researchers Jordan Anaya, Tim van der Zee, Nick Brown, and James Heathers, showing numerous problems. Eventually, Wansink was fired for misconduct by Cornell University, but again the criticism from the reform community also led to concerns with “tone and attacks in blogs and elsewhere” (Brown et al., 2018, p. 3).
In all these cases, largely the same topics that were discussed in response to Fiske’s column were taken up. First, the dividing line between legitimate scientific criticism and inappropriate personal attacks is always the main issue. Everyone agrees that scientific debate should focus as much as possible on the validity of claims; on the truth of the matter. Critics should not make things personal, and those whose work is criticized should not take it personally. Commenting on the Schnall affair, Michelle Meyer and Christopher Chabris (2014) quoted the movie The Godfather: “it’s not personal, it’s business.” At the same time, the psychology of psychologists, for instance their cognitive biases or the influence of the academic incentive structure on psychologists’ decision-making, is a central concern in the present crisis (Flis, 2019; Morawski, 2020). As the broader crisis debate is permeated with psychological topics, simply eliminating the personal element from criticism seems impossible. As Michael Inzlicht put it in a podcast: “Sometimes it does say something about the person if they keep making egregious sloppy errors.” (Inbar & Inzlicht, 2018, at 53:36)
Second, this basic issue of truth and subjectivity is constantly related to power. Who is in power is an important bone of contention. Fiske’s charge of “bullying” implied that the critics are powerful, but in response she has been described as a representative of the discipline’s old guard, defending their position of power (Gelman, 2016). “The truth is that we are in the midst of a power struggle, and it’s not between Fiske’s ‘destructo-critics’ and their victims, but between reformers who are trying desperately to improve science and a vanguard of traditionalists who, every so often, look down from their thrones to throw a log in the road.” (Chambers, 2016) Thus, the debate is also about who and what the people involved are: Who are the bullies, who are the victims, who are the reformers, and who is the old guard. In the debate, the parties involved and their relations are constructed.
Third, the tone debate, which takes place to a large extent on social media, is also about social media. Their open character has facilitated the reform movement, enabling lightning-fast, quasi-global debate, where everyone, including early career and independent researchers, can join in. But according to, for example, Fiske, the same openness and lack of oversight hurt the quality of the debate. True scientific discussion is moderated by editors and peer reviewers. Clearly, this too is a matter of the distribution of power. The infrastructure provided by the internet, and social media in particular, has upset the power structure afforded by the traditional, journal-based ways of publishing in science. Informal conversation and gossip, formerly largely restricted to personal contacts and conferences, have become public and have increased immensely in volume. The democratization of debate that these new social technologies have brought has made the question of the relation between self and truth an urgent concern.
Etiquette
Since the self and its relation to truth is the primary issue in the tone debate, the question of how to improve this relation receives much attention. The answers can be distinguished by the degree to which they involve the self itself. They can be arranged on a continuum from doing to being, from codes of conduct to virtues. First, a common instrument or “social technology” (to use Shapin and Schaffer’s term) to govern the behavior of researchers-as-critics is a list, often bulleted, sometimes numbered, of rules of conduct.
An early example of such a list was proposed by Daniel Kahneman, in response to the Schnall controversy. Kahneman offered a list of rules, calling it “a new etiquette for replication” (Kahneman, 2014). Kahneman emphasized the personal aspect of replication studies. “Science is not a purely rational activity,” he noted: There are egos and reputations at stake, and feelings are easily hurt. Science and the self cannot be separated, and therefore we need “rules for the interaction of replicators and authors,” to be “enforced by reviewers.” The main aim of the four (numbered) rules is to make sure that the original authors are involved in replications of their work, so that they can fill the replicators in on all the little details of the procedure that are necessary to make the experiment work but that did not fit in the method section. The four rules spell out which information needs to pass between the two parties and when, and stipulate (rule #4) that the whole correspondence needs to be recorded so that reviewers can “evaluate the reasonableness of the positions taken by the two sides.”
As the tone debate heated up, lists of rules specifically for criticism started to appear. Uri Simonsohn, a widely respected proponent of reform in psychology, offered three rules for “civil criticism” (Simonsohn, 2016). Drawing on his own history of uncivil behavior, he urged his fellow academics to describe rather than label what they object to; not to speculate about motives; and to contact the person they criticize and ask their opinion first before going public. Thus, the first two rules aim to keep the debate as business-like and impersonal as possible, but the third rule suggests that a personal relation is a good vehicle for civil criticism. As a technique to maintain one’s civility, Simonsohn advised a mental exercise: Imagine going out for dinner with the authors and their parents after you have delivered the criticism.
Simonsohn’s advice not to make things personal is commonplace in this debate. Criticism “should not be personal” (Bishop, 2018, p. 437). It should use “language that focuses on the ideas rather than the authors” (LeBel, 2014), “criticise the science, not the scientist” (Bishop, commenting on Gelman, 2016), and “comment on studies, data, methods, and logic, not authors” (Brown et al., 2018, p. 2567). This rule was also implied in Fiske’s criticism of the ways in which the “destructo-critics” break the “ethical rules of conduct”: They “attack the person, not just the work” (Fiske, 2016b). At the same time, reformers like to point out that scientific work is always somebody’s work, and to some extent, criticism is inevitably personal. “(E)ven valid criticism implicates what the authors did or didn’t do, and it will likely be personally uncomfortable for them.” (Srivastava, 2018b) Or, as Simine Vazire put it, “when we critique a scientific claim, we are necessarily saying that the people making it are wrong.” (Vazire, 2017) Thus, as much as one would want to, in practice it is impossible to keep apart “the science” and “the scientist,” as Kahneman had realized before. One way to deal with this predicament is to start by aiming one’s criticism at the science, and only target the scientist once it is certain that he or she bears responsibility. We should not immediately assume malice, but if there is “water-tight evidence” that people are intentionally engaging in questionable research practices they should be called out (Bishop commenting on Gelman, 2016). Srivastava summarized this by saying: “We should have a low bar for talking about science and a high bar for talking about scientists” (Srivastava, 2018b).
Manners and Virtues
Historian of science Herman Paul (2018) has warned against reducing the history of research ethics to the history of its codification. He argues that a study of the scientific self is particularly relevant in a time when the demands placed on researchers intensify and change rapidly. The tone debate exemplifies that relevance. Although reformers and their critics still discuss rules of conduct, as the debate intensified character and virtue have become more prominent issues. Of course, bad character was a major issue in the tone debate from the outset, with reformers and their detractors alike calling each other names and/or calling out the name-calling. The irony of Susan Fiske’s (2016) APS Observer column was that she did both: She criticized the personal attacks of the reformers, while calling them names such as “destructo-critics” and “data police.” In general, it is fair to say that the perceived ad hominems of the reformers have been met with a barrage of invective about the flaws in their character: “vindictive little bastards” and “human scum” according to anonymous psychologists cited in Bartlett (2018). (Roberts, 2018, provides a longer list.)
Positive traits are promoted in the tone debate, and two stand out. “Civility” is often mentioned as a crucial antidote to the harsh tenor of criticism. It is mostly presented not as a quality of people, but of scientific debates, as when Meyer and Chabris (2014) wrote that “these exchanges should, of course, remain civil.” Similarly, a petition that was signed by more than 600 psychologists after Fiske’s attack on the destructo-critics called for civil (as well as open, critical, and inclusive) scientific discourse (Coan et al., 2016). Thus, civility is talked about primarily as a manner, an aspect of behavior, rather than as a virtue of the self, a way of being. One should debate “with civility.” In as much as one should “be civil,” it need not be a permanent way of being. (This shallowness is of course one of the connotations of civility.)
The second quality that is often mentioned in the tone debate is “humility,” in particular “intellectual humility.” This too can be formulated as a quality of actions. When Alexa Tullett wrote that she is all for “more humility,” she was discussing “ways of correcting ourselves and each other” (Tullett, 2016). Similarly, Brian Nosek has spoken of the need to adopt “a stance of humility” (Aschwanden, 2016), which leaves open the possibility that, or even suggests that, this stance is temporary. In a later interview with Aschwanden, however, Nosek said he saw “a real sense of intellectual humility” in the community (Aschwanden, 2018). Humility fits well with the Popperian emphasis on falsification that many reformers share (Derksen, 2019a). Scientists should be open to the possibility that their hypotheses are false; indeed, they should do everything to test them. Etienne LeBel’s blog is titled “Prove yourself wrong,” and Simine Vazire’s blog “sometimes i’m wrong.” Being wrong is almost itself a virtue.
It is noteworthy that civility and humility are, in a sense, complementary virtues: Civility is being circumspect regarding the other (do not get unnecessarily personal, do not hurt others’ feelings) whereas humility is being circumspect regarding yourself (you could be wrong or biased, do not feel superior). They are both virtues that regulate the relation between subjectivity and truth, but whereas civility is geared toward the other(s), humility characterizes a relation to the self.
Self-Transformation
If doing good science requires being a good scientist with scientific virtues, the question naturally becomes how these virtues can be acquired and strengthened. Self-transformation has become a topic of the tone debate. Brian Nosek tweeted that “part of intellectual humility is cultivating a sense of interest (at least) for results counter to expectation/desire” (Nosek, 2019). Indeed, Scott Lilienfeld has suggested the crisis has been instrumental in fostering humility: “the replication crisis highlights the operation of psychological science at its best, as it reflects our growing humility.” (Lilienfeld, 2017, p. 660) The transformation of “our” character that the crisis has wrought according to Lilienfeld is reflected in several stories of epiphany and conversion that reformers have shared. Daniel Lakens, now a prominent reformer, described in a blog post how, after publishing his first ever article, he received an email from a methodologist who called the results “too good to be true.” At the time, Lakens was upset. He did not like the tone of the email, because it seemed to imply fraud. Eventually, however, he came to realize that the criticism was justified. “In keeping with the times, we had indeed performed multiple comparisons without correcting, and didn’t report one study that had not revealed a significant effect” (Lakens, 2016). In hindsight, that email motivated him to improve his research and fully commit to reform. “(I)t took slightly hurtful criticism for me to really be motivated to ignore current norms in my field, and take the time and effort to reflect on what I thought would be best practices.” (Lakens, 2016) Similar conversions have been described by Uri Simonsohn (2016), and by Michael Inzlicht (Inzlicht & Inbar, n.d.), who admonishes his colleagues that “the only way we can really change is if we reckon with our past, coming clean that we erred; and erred badly” (Inzlicht, 2016).
Conversion requires confession, and admissions of error are always highly praised by reformers. For example, reformers contrasted Dana Carney’s (2016) frank and unreserved acknowledgment that her research on power pose was problematic with what they saw as the defensiveness of her co-author Amy Cuddy. Another example: Will Gervais reviewed one of his own early studies, and characterized it as a “small, cute, barely significant experiment.” When another team conducted a replication with a much larger sample, the original effect unsurprisingly disappeared. To Gervais, the experience was part of what he called his “Methodological Awakening (tm)” (Gervais, 2017). One commenter called the post “amazing”; another wrote it was “downright inspirational” (Calin-Jageman and Funder, in Gervais, 2017). To facilitate such admissions, Julia Rohrer and her colleagues created the Loss of Confidence project, intended to “help normalize and destigmatize individual self-correction” (Rohrer et al., 2021, p. 2). They invited researchers to submit short statements in which they described a finding they no longer believed in, that is, a result they had lost confidence in. The project drew a lot of attention and praise. A similar celebration of humility is Veronika Cheplygina’s “How I fail” series of interviews with researchers, which she publishes on her blog. To Cheplygina, “failures” include rejections of papers and grant applications, but also personal, moral failings like bad mentoring (Cheplygina, 2019). Most interviewees restrict their answers to the first, more easily acknowledged kind. But these failures too can have a life-changing, self-transforming impact. As one interviewee said of a particularly negative set of reviews: “I look back on that grant now and credit the reviews with shaping my perspective on granting and writing as well as showing me that good scientists must be humble and persistent. Humility and persistence were my two biggest lessons.” (Mike Yassa, in Cheplygina, 2017)
To summarize: Although most people engaged in the tone debate agree that one should be cautious about “getting personal,” at the same time there is more and more attention to the researcher as a person. In the tone debate, there has been a shift from straightforwardly demarcating self and science (never get personal, no ad hominem attacks) to defining and fashioning a self that can negotiate the boundary between self and science better because it is cautious: Humble and civil, that is, aware that on both sides of the debate there are selves involved that cannot be entirely disconnected from the topic at hand, but must be treated with circumspection.
Community, Diversity, and Inclusion
What we have described so far may give the impression of a community of reformers working together to devise new ways of being critical and getting along at the same time, a community trying to transform itself into a collective of critically but civilly collaborating researchers. There is, however, considerable disagreement about this goal, as well as about the community itself. There are, first of all, a number of people who are skeptical about the concern with tone itself. It has been argued that criticizing tone can be a “derailment tactic” employed by the discipline’s old guard, intent on protecting the status quo (Srivastava, 2018a). We should not fall into the trap of debating tone, Chris Chambers has warned, because that will only divert attention away from the necessary reforms. A tone debate, he wrote, is precisely what the opponents of reform want: “whenever you argue about tone you are playing your opponents’ game by your opponents’ rules. Although we tear ourselves sideways worrying about such trivial nonsense, they are smirking from their thrones.” (Chambers, 2017) Instead, Chambers favors straightforward, unadorned criticism: honesty rather than rhetoric. “(M)y tone on here [social media] is who I am” (Chambers, 2018). Tal Yarkoni has similarly argued that “there is no ‘tone’ problem in psychology” (Yarkoni, 2016). He argued that we should accept that tension will always exist between doing good science and having good relationships: Science requires being critical, and criticism is hard to take. As scientists, we cannot always be nice to each other. One need not accept being yelled at, but to be told one’s research is flawed is an inevitable part of the life of a scientist. We have to accept the pain that comes with criticism (Tullett, 2016). Such arguments also imply an ideal scientific self: One characterized by a kind of rugged honesty.
Others, however, have warned that this kind of subjectivity and the behavior associated with it can be exclusionary. Olivia Guest coined the term “bropenscience” to denote the aggressive machismo that she and others observe in open science circles. “Within the open science movement a bro will often be condescending, forthright, aggressive, overpowering, and lacking kindness and self-awareness” (Whitaker & Guest, 2020, p. 35). Guest was not the first to note that “tone” is also a gendered issue: In a 2015 blog post, Alison Ledgerwood, Elizabeth Haines, and Kate Ratliff had already pointed out that men were a lot more vocal in the reform debate than women (Ledgerwood et al., 2015). One of the reasons for this had to do with tone, they argued. The “heated, polarizing, and often angry or moralizing” character of the discussion suits men better than women, who are traditionally not supposed to talk this way. Thus, the kind of “competitive discourse” that is common among reformers hurts women and helps men (Ledgerwood et al., 2015). Iris van Rooij has proffered a radical rejection of such competitive forms of debate, saying that “(w)e can do without criticism in scientific discourse” (van Rooij, 2019b; see also Devezer, 2020). Criticism is appropriate where it regards the exclusion of underrepresented minorities from science, and any “tone policing” of their criticism is unacceptable. But in scientific matters, we should practice not criticism but critique, “a thorough analysis or reflection of some kind, usually putting forward arguments for or against some position, or advancing a new, overlooked position” (van Rooij, 2019a). Such a form of academic discourse also requires a specific virtue: not civility, but kindness (Devezer, 2019; van Rooij, 2020; Whitaker & Guest, 2020; but see Perfors, 2020 for a critique).
An important voice here is that of Danielle Navarro, who has repeatedly pleaded for “epistemic modesty” and “kind critique”: “Acknowledging our own flaws requires us to avoid harshness in how we evaluate the work of others; to the extent that we begin to endorse a culture of harsh criticism, we encourage others to be competitive, defensive and hostile. This is the antithesis of what we should desire in a scientific process, I think” (Navarro, 2020, p. 12). It is worth noting that Van Rooij, Devezer, Navarro, and Guest, while sharing many of the reform community’s goals, do not identify as members of the community, in part because of what they see as its exclusionary character. Their critique complicates the “us versus them” picture of a reform community engaged in a battle with old guard traditionalists. As van Rooij (2018) put it: “[I] have a sense that the dominant narrative is built on a false dichotomy and the incorrect impression there are only 2 groups: Status quo defenders vs. reformers. Not everyone who comments there is an exclusionary vibe is part of establishment.” Another reason for Van Rooij, Devezer, Navarro, and Guest to reject the dichotomy of old guard versus reformers is that they believe that the reform movement has focused far too much on replication and methodological reform, and not enough on theory. The reform movement, in their view, shares with the defenders of the status quo a focus on experimental results, whereas the goal of psychology should not be the production of phenomena in the lab, but the theoretical explanation of psychological capacities. Methodological improvements are not a solution to the weakness of many theories in psychology (see, among others, Guest & Martin, 2021; Szollosi et al., 2020; van Rooij & Baggio, 2021).
Analysis: Knowledge, Self, and Social Order
In their study of Boyle’s natural philosophy (and Thomas Hobbes’ critique of it), Shapin and Schaffer (1985, p. 15) argued that “different solutions to the problem of social order encapsulate contrasting practical solutions to the problem of knowledge.” As we pointed out in the introduction, for Boyle the ideal scientific self, characterized by humility rather than dogmatism, was the third term in the equation. Thus, natural philosophy involved a particular conception of knowledge, self, and social order, and the material, literary, and social technologies to realize this conception in practice. Using Shapin and Schaffer’s analysis as a model, the question then is what the various positions in the tone debate imply regarding the knowledge that psychology should produce, the social order of the discipline, and the kind of people psychologists are supposed to be.
Just as it was for Robert Boyle, the management of dissensus is a major concern in the present crisis in psychology. This is a marked difference from the previous crisis in psychology in the 1960s and 1970s, when, as Jill Morawski pointed out, debate concerning the self and the conduct of the researcher focused on what happened in the laboratory, not outside it (Morawski, 2020; see also Faye, 2012; and Sturm & Mülberger, 2012). At issue were, for example, experimenter effects and the ethics of deception in experiments, not how psychologists should get along. The debates in the previous crisis were at times quite heated: For example, some social psychologists were very upset by Ken Gergen’s “Social Psychology as History” paper, in which he argued that social psychology should not try to emulate the natural sciences, because its subject matter is inherently historical (Gergen, 1973). E.E. Jones called this an “intellectually irresponsible invitation to despair” (Jones, cited in Blank, 1988, p. 653). But there was no tone debate in that crisis: Tone did not become a topic of its own.
Fiske’s criticism of the use of new online media for scientific debate is an indication that the tone debate is in part a response to the rise of new (especially online) fora. The members of the current reform movement are very much part of an online community for which Twitter (and to a lesser degree, Facebook) is an essential medium of interaction, and discussion makes up a large part of that interaction. Debate did not enter academia with Facebook and Twitter, but such social technologies have greatly expanded the possibilities for engaging in conversation with colleagues and other academics. This online debate is very open: Anyone with an account can join in, including researchers from other fields and non-academics. There is, moreover, little formal control over the interaction. Both Facebook and Twitter have policies in place for sanctioning users who are deemed to have overstepped the boundaries of propriety, and Facebook groups can be moderated, 6 but other than that, management of the interaction is left to the users themselves. Thus, academic debate in and on psychology has changed in three ways: There is more of it, it is much more open (engaging a wider and more diverse group of people), and it is largely non-hierarchical.
The proliferation and democratization of debate that social media have facilitated may go some way toward explaining why tone has become such an issue. The more debate there is, the more chances for conflict, and the more diverse the participants in the conversation are, the greater the risk of misunderstanding. An unfortunate phrase or a bad joke can easily spark an angry reaction, which, absent a moderator to calm everyone down, can quickly become a full-blown conflict. A study by Roos et al. (2020) offers an explanation for this tendency of online discussion to escalate to conflict. They propose that in face-to-face communication, potential conflicts over divisive topics can be managed through vagueness and ambiguity, and by immediate but subtle feedback (a frown, a brief silence) indicating disagreement, allowing instant repair. Online, however, there often is not an immediate response, leading to a disjointed, asynchronous, or semi-synchronous conversation. People compensate for this absence of social cues by making their messages brief and unambiguous (one might say “curt”). Online conversations suffer from “abundant clarity” (Roos et al., 2020, p. 904). As a result, maintaining good relations in the face of disagreement becomes more difficult.
In academic conversation, this tension between relations and disagreement is heightened because criticism and correction play such an important role in knowledge production. The management of dissensus is a major challenge for any scientific community, as Boyle realized. But we also have to account for the central issue of the tone debate: the self and its relation to truth. Time and again the tone debate revolves around the question of whether, and if so when, one may “get personal” in a scientific discussion. Fiske’s diatribe against “naming and shaming” and “ad hominem smear tactics” is a prime example (Fiske, 2016b). To understand why the self and its relation to truth has become such a divisive issue it may help to return to Shapin and Schaffer’s analysis of Boyle’s rules for an experimental philosophy.
What Boyle aimed at, according to Shapin and Schaffer, was to protect the status of matters of fact as objective givens. Once an experimental result was vouched for by gentlemen of indubitable character, either because they witnessed it in person or because they could “virtually witness” it through the experimenter’s prolix description, it was beyond dispute. Debate should be about the interpretation of facts, not about the facts themselves. In the current replication crisis, however, it is precisely the results of experiments that are time and again called into doubt. A failure to reproduce the result of an earlier study makes its status as fact at least debatable. If a result is not reproducible, it is no longer an objective given, turning instead into a product of fallible, human work. As Shapin and Schaffer put it, “(r)eplication is the set of technologies which transforms what counts as belief into what counts as knowledge.” (Shapin & Schaffer, 1985, p. 225) The failure to reproduce an earlier result in a replication study threatens to turn what was thought to be objective knowledge back into subjective belief.
One way in which replication failures bring subjectivity to the fore is through the role of expertise in experimentation. Some of the most heated discussions during the crisis in psychology have been about the purported importance of specialized expertise. When, for example, Roy Baumeister (2016, p. 156) used terms like “flair,” “intuition,” and “skills and talents” to explain why some get results in experiments while others do not, he was mocked and criticized. Such an emphasis on expertise is at odds with the ideals of open science that the reformers advocate: Expertise (and a fortiori talent) is inherently “closed” in the sense that it is not shared by everyone. Concepts like “flair” and “intuition,” moreover, suggest they are tacit forms of skill, difficult or impossible to transfer other than by direct guidance from an expert, and this too seems to limit how “open” science can be. 7 Our point is not that it is impossible to give (tacit) expertise a role in open science, but that it is not evident how this should be done, and that any mention of such factors draws attention to the scientist, rather than the science. This is all the more the case when it is suggested that a replication failure is due not to the lack of skill of the replicators, but to the fact that the effect simply does not exist, although the original researcher(s) claimed it did. The suggestion of questionable research practices or outright fabrication often hangs over replication controversies.
Thus, one way or another, replication failure forces psychologists to discuss questions that implicate researchers not as rational and objective, but as fallible, imperfect people: merely incompetent or careless, perhaps sloppy or even intentionally fraudulent. Such discussions almost inevitably have an ad hominem character. Once the experimental results in the literature lose their status as matters of fact, the bottom drops out of the debate, and the morality and psychology of the people who claim to have “witnessed” the matters of fact are no longer off the table. In such a situation, the management of dissensus becomes crucial to make sure that criticism does not harm relationships, and relationships do not hamper criticism. In a field where the status of experimental results as matters of fact is regularly at issue, it becomes necessary to explicitly discuss how criticism can be delivered and received without hurting relations. 8 As in Boyle’s experimental philosophy, knowledge in psychology is centered on matters of fact, on phenomena and results, and because these results are often difficult to reproduce, the scientific self becomes an issue. This problematic relation between knowledge and self then leads to a tone debate in which the self in its relation to others and in its relation to itself is discussed: How to criticize in a civil way, but also how to be civil or humble or otherwise virtuous as a scientist.
At the same time, the tone debate is not only a discussion among a group of colleagues about how to get along and manage disputes. The tone debate is part of a process in which the community itself gets defined. That happens informally when discussants share their ideas about the virtues of the ideal scientist, or share tips on how to become scientifically virtuous. But there are also formal instruments used to demarcate the community. One example of this is the petition, mentioned above, that was put online in the wake of the Fiske controversy: The text defines “our science” as one that needs both disagreements and a “style of discourse” that promotes the free exchange of ideas. By signing the petition, one commits oneself to such a science, and, interestingly, to continuing the discussion about the distinction “between appropriate scientific criticism and inappropriate harassment,” which makes the tone debate itself a defining element of the community (“Promoting Open, Critical, Civil, and Inclusive Scientific Discourse in Psychology—SPSP,” 2016). Another example is the code of conduct that the Society for the Improvement of Psychological Science (SIPS, 2019) has drawn up. It includes a list of unacceptable behaviors including “intimidating, harassing, lewd, demeaning, bullying, stalking, or threatening speech or actions.” The code of conduct, one could say, gives an ethical definition of the community of reformers in psychology, one in which inclusivity is the main goal: Everyone should feel safe at SIPS. At the same time, the code of conduct can be used to police the boundaries of the SIPS community. When Jordan Anaya expressed his dissatisfaction with the fact that issues of diversity (including “bropenscience”) were on the agenda for the 2019 conference, he was banned from SIPS conferences for a minimum of 3 years because he had used insulting language. 9
Conclusion
We have described how, in the tone debate that is occurring in and with the community of reformers in psychology, the relations between knowledge, social order, and subjectivity are discussed and redefined. There are many similarities with Robert Boyle’s efforts to create a community of experimental philosophers, defining how they should interact and communicate, what virtues they should display, and the kind of knowledge they should seek. The management of dissensus is obviously an important issue in both cases, as is the foundational status of facts (reproducible results of experiments), and the distinction between science and scientists. Moreover, for the 21st-century experimental psychologists, just as they were for the 17th-century experimental philosophers, humility and civility are key virtues. A key difference with the experimental philosophers of the Royal Society is that current psychologists have a new social technology at their disposal: online social media. Discussion on social media is in general difficult to keep from escalating into conflict, but we have argued that this problem is exacerbated in the crisis in psychology, because debates about replication failures often have a personal character. The inability to reproduce earlier results tends to raise questions about the ability and/or morality of the original researchers or the replicators. When people start to lose confidence in matters of fact, which for Boyle were the stable anchor of the community of researchers, the social order that depended on a separation of science and scientist is disrupted. Once subjectivity in knowledge production becomes an issue, questions about the relations between scientists and about their relations with themselves will follow.
We have shown how the debate developed from an emphasis on procedures for orderly discussion, often expressed in numbered or bulleted lists of steps to take, to a foregrounding of manners and virtues (civility and humility especially) and finally discussion about how to achieve the self-transformation needed to acquire such manners and virtues, with confessing failures frequently mentioned as salutary.
As the scientific self becomes an ever more prominent topic of debate, the limits and character of the community are also defined. As analysts, we see that many psychologists call for reform, that there is talk of a “movement” and a “community,” and that this movement or community is to some extent institutionalized, with for example a Society for the Improvement of Psychological Science and a Center for Open Science. At the same time, however, what the community is and who belongs to it remains a point of discussion, as we have shown. Who “we” are and who “they” are remain questions that are regularly discussed. Are “we” vulnerable, early career researchers, or are “they” vicious bullies? Are “we” academics who discuss their differences of opinion in a civil manner, or are calls for civility “their” last-gasp effort to resist much-needed change? Are “we” trying to reform psychology, or are “they” no better than the people they criticize? Are “we,” reformers, a community, or are “we” simply scientists who do research as it is meant to be done? The tone debate is an important forum for the construction of such groups and categories.
The tone debate continues. We have shown that differences of opinion remain. Some reject the preoccupation with tone altogether and consider the tone debate a distraction from the real issues (replication, questionable research practices, publication bias, and so on). They are skeptical about civility and prefer unadorned, straightforward criticism. Others are skeptical about criticism itself, about the emphasis on polemical, competitive debate. They emphasize the virtue of kindness. As the debate about debate goes on, the connections between knowledge, self, and social order continue to be probed and redefined.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
