Abstract
This article investigates the inquiries and sanctions that followed accusations of fraud directed toward Dutch social psychologist Diederik Stapel in the early 2010s. Relying on the public reports published by the investigative committees, as well as on interviews conducted with committee members and Stapel’s former students and collaborators, we propose to analyze how this case facilitated the diffusion, in social psychology, of statistical rules that were hitherto unenforced in this field. The Stapel case thus illustrates the regulative role played by statistics in the contemporary scientific field while also demonstrating the appeal of legal modes of dealing with misconduct when it comes to the treatment of scientific deviance. More generally, this article shows how the study of scientific deviance can serve to bring to light symbolic hierarchies that are habitually kept tacit, thus serving as a magnifying glass for the scientific field’s inner processes.
Introduction
From Haruko Obokata in cell biology and Diederik Stapel in social psychology to Yoshitaka Fujii in anaesthesiology and Olivier Voinnet in plant biology, accusations of scientific fraud seem to have become more prevalent and audible over the last couple of decades. Once they become ‘affairs’ (Offenstadt and Damme, 2007), these cases lead to the expression of various types of judgment – sometimes legal, but more often symbolic, political, ethical, and so on (Biagioli and Lippman, 2020; Hesselmann, 2019; Mongeon and Larivière, 2016) – both within and outside the scientific field. The most detached observers denounce not the individual scientists as such, but their bouillon de culture, as Gabriel Tarde would have said. Hence, in an oft-heard narrative, the scientific field and the increased competition that it promotes are the actual culprits behind some researchers’ lack of integrity (Anderson et al., 2007; Broad and Wade, 1982), although the empirical evidence supporting this causal link remains unclear (Fanelli et al., 2015). But whether they locate the roots of scientific fraud in individual scientists or in scientific institutions (or a combination of both), fraud narratives share a common ambition: ‘attempt[ing] to discover the “etiology” of the “disease”’, that is, ‘the causes of unwanted behavior’ (Becker, 1966: 22). It is thus no wonder that scientific ‘misconduct’, as it is now called, is being redefined as an object of criminological (Ben-Yehuda and Oliver-Lumerman, 2017; Faria, 2018; Hesselmann, 2019) or even epidemiological (Zuckerman, 2020) inquiry.
In this article, we propose to demonstrate that the interest of analyzing scientific deviance lies not so much in identifying the original sources of the disease as in what these behaviors, and the ensuing reactions, reveal about the functioning of the scientific field. It is particularly noteworthy in this regard that the first steps of the sociology of science were marked by a close attention to the question of deviance. This interest is clearly visible not only in Harriet Zuckerman’s (1977) research but also in Robert K. Merton’s (1973), whose writings on the normative structure of science paved the way for a research program dealing with violations of the scientific ethos. Since that time, however, the problem of scientific deviance has been, with a few notable exceptions (Bechtel and Pearson, 1985; Ben-Yehuda, 1985; Ben-Yehuda and Oliver-Lumerman, 2017; Dubois and Guaspare, 2019; Faria, 2018; Hackett, 1994; Hesselmann, 2019; Hesselmann et al., 2017; Hesselmann and Reinhart, 2021; Larregue and Saint-Martin, 2019; Zuckerman, 2020), quite neglected.
The analysis of accusations of scientific fraud is nevertheless likely to make important contributions to the sociological literature, as it makes it possible to interrogate the relations between science and normativity from a new angle. Because of its relatively recent constitution, the scientific field offers an adequate site of investigation for studying the genesis of regulative systems. Yet, until now, the emergence of judicial logics within the scientific field has attracted little attention, apart from a few works on the emergence of ethical norms (Jacob, 2019), article retractions (Hesselmann et al., 2017), or law-like corpora of documents aimed at controlling research misconduct (Faria, 2018: 160–170). Processes of sentencing, in particular, remain largely uninvestigated, perhaps because ‘institutions for dealing with scientific misconduct are fairly new and heterogeneous’ (Hesselmann and Reinhart, 2021: 415).
To address this gap, this article investigates the inquiries and sanctions that followed accusations of fraud directed toward Dutch social psychologist Diederik Stapel in the early 2010s. Relying on the public reports published by the investigative committees, as well as on interviews conducted with committee members and Stapel’s former students and collaborators, we propose to analyze how this case facilitated the diffusion, in social psychology, of statistical rules that were hitherto unenforced in this field. The fact that there exists no clear consensus on what scientific integrity is, and should be (Faria, 2018; Horbach and Halffman, 2017), requires us to account for the structure of the scientific field itself. As Bourdieu (1975) underlined, the latter is ‘the locus of a competitive struggle, in which the specific issue at stake is the monopoly of scientific authority, defined inseparably as technical capacity and social power, or, to put it another way, the monopoly of scientific competence, in the sense of a particular agent’s socially recognised capacity to speak and act legitimately (i.e. in an authorised and authoritative way) in scientific matters’ (p. 19).
When drawing the boundaries between good and bad scientific practices, institutions dealing with scientific misconduct participate in constructing a certain conception of what science should be.
It is thus important to analyze ‘which groups manage to impose their norms as legitimate, through what mechanisms, in which social spaces, and what resistance they encounter (or not)’ (Sapiro, 2020: 604). In this regard, the Stapel case exemplifies the regulative role played by statistics in the contemporary scientific field (Espeland and Stevens, 2008; Porter, 1996; Prévost, 2009) while also demonstrating the appeal of legal modes of reasoning when it comes to the treatment of scientific deviance. More generally, this research illustrates how cases of scientific deviance bring to light symbolic hierarchies that are often kept tacit, thus serving as a magnifying glass for the scientific field’s inner processes.
Investigating a de-singularized fraud case
As cases of scientific fraud are characterized by their impenetrability (Hesselmann and Reinhart, 2021), gathering data on the working methods of the institutions dealing with such matters is in itself a challenge. Even when ad hoc committees are formed and their findings publicly disseminated through reports and press conferences, we can only access the information that the investigators deemed relevant and worth communicating. Hence, although the Stapel case provides abundant documentation for an empirical study of this kind, which is quite rare, these materials should be interpreted cautiously. While we shall mainly analyze the reports of the ad hoc committees that investigated Stapel’s actions at three Dutch universities (University of Amsterdam, University of Groningen, Tilburg University), the following materials were also available: Stapel’s scientific articles, interventions by so-called ‘watchdogs of science’ (Didier and Guaspare-Cartron, 2018), including PubPeer and Retraction Watch, reactions from the field of psychology (researchers, journals, associations, etc.), as well as the autobiographical account that Stapel published after his resignation, which was later translated into English (Stapel, 2014).
To triangulate the data gathered from these archives, we also conducted 14 semi-structured interviews with researchers linked to the Stapel case, either because they took part in the work of the ad hoc committees or were members of them (n = 6), or because they were questioned by these committees as former collaborators of Stapel (n = 8). While they are not analyzed in this article, we also conducted 12 complementary interviews with researchers in social psychology who have either dealt with other cases of fraud or have been actively involved in the reform movement that has affected the discipline over the past decade. The 14 interviews focused mainly on the working methods of the committees, on the statistical dimension of the inquiries, on the definition of deviant scientific behavior and, more generally, on the production of knowledge in social psychology, both before and after the Stapel case. The sensitive nature of the issues addressed influenced the interviews’ dynamics, as several participants expressed fears about the negative consequences that their participation could have on their academic careers. The contextualization of the interviews is therefore deliberately kept to a minimum to guarantee the participants’ anonymity.
The interest of the Stapel case does not only lie, however, in the richness of the empirical material that we were able to gather, but also in its normative significance and resonance within social psychology and the scientific field more broadly. Far from being restricted to Stapel’s fraud stricto sensu, the investigation led by the ad hoc committees constituted an opportunity for a more systematic examination of social psychology. In an article eloquently entitled Psychology’s Renaissance, three researchers who have been very active in the statistical reform movement in psychology see in the Stapel affair one of the five igniters of this ‘spiral of methodological introspection’ (Nelson et al., 2018: 512) – not only because of the seriousness of the fraud committed by Stapel himself, but also because the investigation ‘uncovered problematic methodological practices even in studies that were not fabricated’ (Nelson et al., 2018: 513). To be sure, the significance of the Stapel case depends on the meaning that scholars invested in it. Hence, when Nelson and his colleagues pinpoint it as foundational to ‘psychology’s renaissance’, the affair effectively operates as a rhetorical device aimed at legitimating further examination and reform of scientific practices in psychology.
How is it, then, that the Stapel case became such a device? Besides the (relative) transparency of the investigations, the seriousness and certainty of the frauds, and Stapel’s professional stature (three elements that, combined, distinguish this case from relatively lower-tier affairs such as those of Nicolas Guéguen, Dirk Smeesters, or Jens Förster), two other factors made it a compelling motive for reforming social psychology. One has already been mentioned: although the investigations were initially presented as a reaction to accusations directed toward Stapel, the work and conclusions of the ad hoc committees eventually encompassed a much larger spectrum. As one interviewee explained, the committees ‘were more interested in, let’s say, the research culture in general’ (interview 13). Hence, two entire chapters of the final report (Levelt Committee et al., 2012) are devoted to the ‘scientific culture’ in which Stapel evolved. This allowed a de-singularization (Boltanski et al., 1984) of the affair: Stapel’s deeds were no longer just a matter of individual responsibility; they said something about social psychology as a collective scientific endeavor.
One way this generalization was achieved was through statistics. This is the second factor that led the Stapel case to become a rhetorical device for denouncing ‘questionable research practices’ in social psychology. As they were based on experimental, quantitative analyses, the Dutch psychologist’s publications fell squarely within the jurisdiction of statistical experts. The case thus represented an opportunity for statisticians to impose their own views of what good science should be. In so doing, committee members extended the scope of application of rules that had, for a good part, hitherto been ignored in social psychology. To be sure, this does not mean that social psychologists paid no attention to methodological soundness before the Stapel case. But as we shall see, the fraud investigations constituted a channel through which new rules could be introduced and enforced without social psychologists’ prior approval.
Constructing independent judge(ment)s
‘Genius’, ‘golden boy’, ‘charismatic’, ‘inventive’, ‘brilliant’: these are some of the terms that the final report used to describe the reputation Diederik Stapel enjoyed at the different universities where he worked until his denunciation. After completing his doctorate in psychology at the University of Amsterdam, for which he received an award from the Association for Dutch Social Psychology (ASPO), Stapel became a professor of social cognitive psychology at the University of Groningen in 2000. A few years later, in 2006, he left Groningen for Tilburg University, where he was soon appointed dean of the Faculty of Social and Behavioral Sciences (in 2010). So far, so good for the ‘golden boy’ of social psychology.
But things would soon change. At the end of August 2011, following informal conversations, two graduate students and an early-career professor submitted a report on Stapel’s actions to the head of the Department of Social Psychology at Tilburg University, Marcel Zeelenberg. Zeelenberg then informed the university’s Rector Magnificus, Philip Eijlander, who decided to talk to the ‘whistleblowers’ (Levelt Committee et al., 2012: 9; see also Bhattacharjee, 2013). In early September 2011, the executive committee of the university suspended Diederik Stapel until more was known.
On 9 September 2011, the provost formed the Levelt Committee, which was charged with ‘investigating the extent and nature of the breach of scientific integrity committed by Dr. D.A. Stapel’ (Levelt Committee et al., 2011: 3). This mission was to involve two main tasks. First, ‘the committee will examine which publications are based on fictitious data or fictitious scientific studies and during which period the misconduct took place’. Second, ‘the committee should form a view on the methods and the research culture that facilitated this breach, and make recommendations on how to prevent any recurrence of this’ (Levelt Committee et al., 2011: 3). To cover Stapel’s past appointments, two other committees were formed: one at the University of Amsterdam, where he did his doctorate (1993–1999), and a second at the University of Groningen, where Stapel held a professorship for seven years before joining Tilburg (2000–2006).
Two main reports would be published as a result of the committees’ investigations. These reports were prepared in coordination by the three committees. A first interim report was published on 31 October 2011 (Levelt Committee et al., 2011). It dealt primarily with events that took place while Stapel was employed by the University of Tilburg, which can be explained by the fact that the Levelt committee started its activities earlier than the Groningen and Amsterdam committees. The final report, which is about 100 pages, was published on 28 November 2012, more than a year after the beginning of the investigations (Levelt Committee et al., 2012).
Before addressing the content of the investigative work of the three committees, it is useful to analyze their composition. Doing so sheds light on who was entitled, at the time and in the Netherlands, to deliver authoritative judgments on matters of scientific integrity and psychological research. The Levelt, Drenth, and Noort committees were all composed of three or four tenured faculty members who held or had held important administrative positions within the academic field. Some of them were employed by the university in which the committee was formed, while others were external members. In addition to their full-time members, each committee had research assistants who helped with the ‘dirty work’ (Hughes, 1962), a coordinating secretary, and one or more statisticians. The social status of this support staff was markedly lower than that of the full-time committee members (six of them were non-tenured doctors at the time).
Let us delve further into the composition of the committees. The Levelt committee (Tilburg University) was headed by Pim Levelt, a psycholinguist from the Max Planck Institute and former president of the Royal Netherlands Academy of Arts and Sciences. It included two other full-time members: Marc Groenhuijsen and Jacques Hagenaars, professor of criminal law and professor of social science methodology at Tilburg University, respectively. On the Groningen side, the committee was headed by Ed Noort, professor emeritus of Old Testament and former dean and member of the Royal Netherlands Academy of Arts and Sciences. At the time, Noort was also directing the Scientific Integrity Committee of the University of Groningen. The Noort Committee included two other members: Herman Bröring, professor of administrative law and member of the Committee on Scientific Integrity of the University of Groningen, and Jules Pieters, professor of applied psychology at the University of Twente and former director of the Association for Dutch Social Psychology. Finally, the Amsterdam committee was headed by Pieter Drenth, professor of psychology, former Rector Magnificus of the Free University of Amsterdam, and former president of the Royal Netherlands Academy of Arts and Sciences. Other members included Jaap Zwemmer, scientific integrity advisor and professor of tax law at the University of Amsterdam, where he was also Rector Magnificus; Len de Klerk, professor of urban planning and former dean of the University of Amsterdam; and Chris Klaassen, professor of mathematical statistics at the University of Amsterdam and former vice-dean of the University of Amsterdam.
Listing the members of the three committees makes it clear that the three universities sought to assert the legitimacy of their final conclusions and recommendations by calling upon personalities who already enjoyed ample symbolic recognition within the scientific field. This is especially true of the three chairpersons, who were partly chosen for their reputations and the intellectual independence that supposedly derives from such capital. One committee member reflected on the selection process as follows: What we really thought . . . What we do need is an excellent scholar, somebody who has a lot of experience, has a very good name. And . . . Well I knew Pim Levelt, and he was . . . I mean he’s an excellent scholar, he . . . He’s known as a very independent person both nationally and internationally. (Interview 1)
Before turning to the issue of independence raised by this interviewee, another element to emphasize is the judicial dimension given to the procedure. The presence of professors of law eloquently conveys the legal model that guided the treatment of Stapel’s deviance. In fact, the terms of reference specified that the Groningen and Amsterdam committees should ‘offer a view on the possible legal and other consequences’ (Levelt Committee et al., 2012: 8–9) of his actions (this was not explicitly stated for the Tilburg committee). It should thus come as no surprise that, from the start of the procedure, Stapel hired a lawyer to defend his interests. While this background presence constitutes a good illustration of the penetrating force of law, it is also possible that the universities and their committees mimicked legal proceedings in the hope that this would reinforce the legitimacy of their verdict. Hence, law can be seen as both a constraint and a resource.
In any case, statistics served as a transmission line between scientific and legal repertoires, a role that can be traced back to the growing use of scientific expertise in criminal proceedings, especially since DNA was accepted as a reliable source of evidence (Aronson, 2007; Lynch et al., 2009). In a Janus-faced logic, statistics served both as a scientific apparatus – differentiating good science from ‘sloppy science’ (Levelt Committee et al., 2012: 5) – and as a legal apparatus, bringing evidence of Stapel’s fraud. Just as experts are called into court to evaluate the likelihood that a DNA sample found at a crime scene is that of the defendant, the committees’ statisticians quantified the probability that Stapel’s findings, as exposed in his publications, might have been the result of sound, non-faked data. One of the statisticians even attributed his participation in the committee to his expertise and interest in forensic science: A couple of years before that I was involved in a new master program at the University of [X] within the science faculty, namely forensic science. And forensic science is of course about interpreting evidence in a crime. And well you need statistics for that. (Interview 11)
The disciplinary composition of the committees is also worthy of consideration when it comes to the issue of independence, especially as it reveals the under-representation of psychologists: out of ten members, only three specialized in psychology broadly speaking (one in each committee). Given that social psychology is often criticized by other psychologists for being too broad and not scientific enough (Peterson, 2017), it is worth taking a closer look at the research specialties of these three people. Doing so, we observe that social psychology was not represented at all in any of the committees, although this was Stapel’s main field of interest. One member specialized in psycholinguistics (Levelt), another in applied psychology (Pieters), and the last in work and organizational psychology (Drenth). According to Web of Science records, none of them had published in the Journal of Personality and Social Psychology, one of social psychology’s main scientific venues, in which Stapel published no fewer than 14 articles. While this partly stems from a desire to ensure the independence of the procedure vis-à-vis Stapel’s entourage (the Dutch scientific field being quite small, especially if we restrict it to social psychology), it also appears that one indirect effect of the statistical focus was to neutralize established disciplinary boundaries. This is made particularly clear in this statistician’s justification of the absence of social psychologists from the committees: It was impossible to . . . To have somebody from . . . From the same group I mean he was the chair of that group and that would have been impossible. And there is also the thing that, I mean, when it comes to fraudulent behavior or sloppy science you don’t need to be a social psychologist. I mean I can really see what is sloppy science, so to say. (Interview 1)
Social psychology, both as a body of knowledge and as a professional practice, is thus placed under the jurisdiction of statistics, a meta-knowledge that allows its representatives to judge the soundness of research carried out in another discipline without having to demonstrate any knowledge of the objects studied in that discipline. In addition to illustrating the diffusion of sanctioned rules within the scientific field, this phenomenon also reveals the importance acquired by quantification in the regulation of scientific activities, beyond metrics (Biagioli and Lippman, 2020; Gingras, 2016). But it also gives a hint about the place of social psychology in the Stapel investigation: in a way, we can argue that social psychology itself stands as one of the accused. For, as we shall see, the responsibility for Stapel’s frauds is not only individual: according to the committees, the research culture, and thus social psychology, is responsible as well. Following this logic, social psychology could not possibly be part of the investigations, since this would have blurred the line between judges and defendants.
The mouth of numbers? The statistical definition of scientific deviance
Let us now turn to the content of these reports, starting with how the Stapel case was used to formulate an explicit definition of scientific deviance. In so doing, we realize that what might seem prima facie evident is the result of certain situated conceptions regarding what science should be. For beyond fraud per se, which the committees defined as ‘the fabrication, falsification or unjustified replenishment of data, as well as the whole or partial fabrication of analysis results’ (Levelt Committee et al., 2012: 17), the issue of ‘bad science’ (Levelt Committee et al., 2012: 17), that is, the ‘failure to meet normal standards of methodology’ (Levelt Committee et al., 2012: 5), was also at the core of the investigations. Two chapters (4 and 5) of the final report are thus devoted to analyzing how the ‘research culture’ and institutional environment may have permitted or even encouraged Stapel to act in this way.
The whole question was therefore to establish what these normal methodological standards were, a problem in many ways akin to the legal fiction of the ‘reasonable person’ as it is used in common law. And just as what constitutes a reasonable person in practice can vary greatly in case law (Gardner, 2015; Langevin, 2005), the determination of methodological standards that would be applicable in any scientific circumstances raises several questions. Instead of recognizing the inherently localized nature of scientificity, which varies from one field of research to another, the committees were to use the Stapel affair as a medium to standardize rules that, at the time of their investigation, were not recognized as binding in social psychology.
The first step, however, was to prove Stapel’s fraud, and here again statistics were paramount. Although the Dutch social psychologist admitted having fabricated or manipulated his data in about half of the reviewed papers (33 out of 65), the other half remained suspect to the investigators. Does the dataset appear to be authentic or does it present signs of tampering? Do the results of the statistical analyses appear plausible? Are the email exchanges between the authors of the article in line with the methodological protocol detailed in the manuscript? These are some of the questions that animated the work of the committees. The final report paid particular attention to ‘data and distributions that deviate from realistic ones’ (Levelt Committee et al., 2012: 20), through five main scenarios: (1) ‘chance variation that are too small’; (2) ‘effects and relationships that are too large’; (3) ‘unusual multivariate relationships’; (4) ‘dependent observations’; and (5) ‘other improbable analysis results’ (Levelt Committee et al., 2012: 21–23) (a sketch illustrating the first scenario follows the two cases below). According to the committees, if any of these scenarios was identified in an article, this might indicate that one of the two following manipulations had been committed:
The data were real, but some unwelcome scores, subjects, or statistically exceptional results were not included in the article – an omission that can sometimes be legitimately justified.
The data were fabricated, in whole or in part, leading to statistical results that would (almost) never occur with real data.
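To make the first of these scenarios more concrete, the following sketch (our illustration, with hypothetical numbers; the committees’ actual procedures are only summarized in the report) shows how one can quantify whether a set of reported condition means displays less chance variation than independent sampling would produce. Under independent sampling, the scaled variance of k sample means follows a chi-square distribution with k − 1 degrees of freedom, so an extremely small left-tail probability betrays ‘too good to be true’ regularity.

```python
import numpy as np
from scipy import stats

# Hypothetical reported means for k independent samples of the same
# condition, each of size n, with within-cell standard deviation sd.
reported_means = np.array([5.02, 5.01, 5.03, 5.02, 5.01])
sd, n = 1.0, 25
k = len(reported_means)

# Under independent sampling, (k - 1) * var(means) / (sd^2 / n) ~ chi2(k - 1).
statistic = (k - 1) * reported_means.var(ddof=1) / (sd**2 / n)
p_left_tail = stats.chi2.cdf(statistic, df=k - 1)

# A minuscule left-tail probability means the reported means agree far more
# closely than chance allows -- scenario (1), chance variation too small.
print(f"P(this little scatter under independent sampling) = {p_left_tail:.1e}")
```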
The question was therefore to know what is plausible and what is not. For example, at what point can the effect of an independent variable on a dependent variable be considered ‘too large’? Since normality depends on the research field – one would not expect to find the same effect sizes in nuclear physics and psychology, for instance – the committees had to refer to a standard relevant to social psychology. A first way of knowing what a reasonable Cohen’s d might look like was to peruse meta-analyses in psychology, which set this standard at ‘approximately 0.5’ (Levelt Committee et al., 2012: 21). To obtain a second measure, a team of master’s students supervised by a statistician from the Levelt committee assembled a corpus of experimental social psychology articles published in the European Journal of Social Psychology and the Journal of Personality and Social Psychology, two prestigious journals in which Stapel had published several articles. An examination of these 158 articles revealed an average Cohen’s d of 0.69 and an average explained variance of 22%, with explained variance remaining below 35% in 80% of cases. Once this ‘context of evidence’ (Pinch, 1985) was established, it became possible for the committees to compare Stapel’s results to the average standards of his research field. The conclusions are quite transparent: The [statistical] relationships identified in various publications by Mr Stapel, with 85% or even 95% explained variance, therefore appear to be extremely rare; indeed, even 55% explained variance does not occur often. Such high values therefore call for further analysis and may indicate fraud. (Levelt Committee et al., 2012: 22)
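To give a sense of the magnitudes involved, the following back-of-the-envelope conversion (our sketch; it assumes a simple two-group design with equal cell sizes, whereas the report’s figures aggregate more varied designs) translates Cohen’s d into explained variance and vice versa, using the standard relation r = d / √(d² + 4).

```python
import math

def d_to_explained_variance(d: float) -> float:
    """Explained variance (r squared) implied by Cohen's d, two equal groups."""
    r = d / math.sqrt(d**2 + 4)
    return r**2

def explained_variance_to_d(r2: float) -> float:
    """Cohen's d implied by a given explained variance, two equal groups."""
    return 2 * math.sqrt(r2) / math.sqrt(1 - r2)

# Field benchmarks cited by the committees...
for d in (0.5, 0.69):
    print(f"d = {d:.2f}  ->  explained variance ~ {d_to_explained_variance(d):.1%}")

# ...versus the explained variances flagged in Stapel's publications.
for r2 in (0.55, 0.85, 0.95):
    print(f"explained variance = {r2:.0%}  ->  d ~ {explained_variance_to_d(r2):.1f}")
```

Under these assumptions, a field-typical d of 0.5 to 0.69 corresponds to roughly 6% to 11% of explained variance, whereas 85% or 95% of explained variance would require effects of d ≈ 4.8 and 8.7 respectively – far beyond anything the 158-article corpus contained.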
Some of Stapel’s productions, however, are far more open to discussion and interpretation. The difficulty was particularly pronounced for the committee set up at the University of Amsterdam which, because of the period covered (1993–1999), did not have access to the original material used in the final publications. To get around this problem, statisticians from the committee developed a Bayesian model based on the data present in each publication and aimed at providing a quantified measure of the likelihood that the findings were obtained through legitimate methods of data collection. This statistical model, later made public (Klaassen, 2015), still required the investigators to answer the question of what constituted sufficient evidence, a problem that is again well known in the legal context and that is reminiscent of the issues surrounding ‘data and distributions that deviate from realistic ones’. But whereas the effect sizes found in Stapel’s publications could easily be compared to those found in other publications in his field, this time the committee members lacked the context of evidence, that is, a reference point to which Stapel’s Bayes factors could be compared. One consequence of such a loosened epistemic guide is that the interpretation of the Bayesian operations differed markedly among committee members, especially between statisticians and non-statisticians: I remember explaining, sort of explaining in the last meeting ‘okay, so this and this article has like a Bayes factor of, I don’t know, 10, 6’ or something like that. I think that was about the magnitude. There was nothing like a thousand, or a million, I don’t think. And I remember saying, ‘it’s like, yeah, this is modest evidence’, and there was somebody in the committee . . . [. . .] there was at least someone in the room who was like, ‘what do you mean modest?! This is very clear, damning, bad evidence!’ [. . .] Like I gave my personal opinion as a statistician, but it was essentially up to them to decide how . . . Yeah merge it all together and to make the formal . . . Yeah, judgement, verdict, I don’t know how you want to call it [laughs]. (Interview 5)
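Klaassen’s (2015) published model is considerably more involved, but the following toy example (ours, with hypothetical numbers) conveys what a Bayes factor is and why values of the order of 6 or 10 leave room for interpretation: it is simply the ratio of how well two competing hypotheses predict the observed data.

```python
import math
from scipy import stats

# Hypothetical setting: a reported standardized mean difference from a
# two-group experiment with 25 participants per cell.
n_per_group = 25
observed_d = 0.5
se = math.sqrt(2 / n_per_group)  # rough standard error of d for equal groups

# How well does each hypothesis predict the observed effect size?
lik_h1 = stats.norm.pdf(observed_d, loc=0.69, scale=se)  # H1: field-typical effect
lik_h0 = stats.norm.pdf(observed_d, loc=0.0, scale=se)   # H0: no effect at all

bayes_factor = lik_h1 / lik_h0
print(f"Bayes factor (H1 over H0): {bayes_factor:.1f}")
# On conventional scales, values between roughly 3 and 10 are read as only
# 'moderate' or 'substantial' evidence -- hence one statistician's 'modest'
# can be another committee member's 'damning'.
```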
These disagreements illustrate the peculiar position of statistics within the committee. On the one hand, it was the investigators’ main evidential resource. On the other hand, because the actual work of reviewing the literature and running the statistical analyses was largely delegated to people of lower social status who were not actual committee members, the final interpretation of the results could be at odds with what the principle of skepticism, as interpreted by a statistician, would have required. It is also understandable from the context of this judgment, which came at a time when Stapel had already confessed certain violations of scientific integrity, that the committees did not necessarily seek ‘to accumulate irrefutable proofs, but only to reach probable presumptions’ (Boltanski, 2012: 37). Perhaps this was also an expedient way of setting the problem of Stapel aside to focus on a much broader issue: that of social psychology.
The scientific supreme court
Indeed, far from being limited to the case of Stapel stricto sensu, the work of the committees also focused more broadly on the prevalence of questionable research practices within social psychology as a whole. In other words, the actions of the Dutch psychologist provided an opportunity for a more systematic investigation of his research field. During the hearings of some of Stapel’s past students and collaborators, the committees managed to move Stapel out of the picture to focus instead on the scientific practices observed in social psychology. One of the persons interviewed by a committee recalls it as follows: Yeah, so I remember more that they [the committee members] were interested in sort of the general research culture. At that point, you know, Diederik had already admitted that at least some of those data was fabricated so there wasn’t really a question of his . . . Culpability. I do think that they would have asked, yeah, whether there were projects that we had together and I, I think they asked that of everybody and they were obviously trying to determine which of these papers needed to be looked at more closely, but then my memory is that most of their questions were about . . . Well in general, Diederik or no, ‘how do you guys [social psychologists] do research, what are your practices here, how do you make certain kinds of like analytic or data collection decisions?’ (Interview 13)
The analyses that the committees devoted to the scientific culture of social psychology would lead to strong disagreements and friction, not so much with Stapel himself as with the interviewed researchers and, more generally, with representatives of the discipline once the final report was made public. Suddenly, long-accepted practices in social psychology were qualified as questionable research practices by the statisticians and methodologists sitting on the committees. At once judges in the Stapel case and parties in the struggle over the definition of what science should be, statisticians were put in a position where they could reach a ‘verdict on individual verdicts’ (Bourdieu, 2016: 285) and, in so doing, effectively played the role of a scientific supreme court.
The boundary-work carried out by the committees between good and bad science has the particularity of taking on the appearance of a neutral, objective statistical inquiry, far from the outpourings and outbursts that characterize the controversies traditionally studied by STS (Science, Technology and Society) scholars (Gingras, 2014; Latour, 1993). Quantification is thus a way of reinforcing the authority of the scientific court (Porter, 1996: 8), which could be presented as the mere ‘mouth of numbers’, by analogy with the idea expressed by Montesquieu in The Spirit of Laws that judges should be ‘no more than the mouth that pronounces the words of the law’. Importantly, the universal appeal of quantification also conceals the extension, by the committees, of the scope of application of statistical rules that were hitherto unknown, or at the very least unenforced, in social psychology. As underlined by a social psychologist and former collaborator of Stapel: The questionable research practices that they [the committees] talked about, some of them were things we were just told to do in university or when we were PhD students. Some of them are questionable, some of them are debatable whether they are questionable or not. And at least these were not used by people in order to mislead the colleagues. One of the things that I think happened quite a bit, at least I have done that and I know all the colleagues that I know from my cohort have done this, and other cohorts as well: we run the study, you know the magical number is p [the value for statistical significance] smaller than 0.05, so you run the study and the rule of thumb was if you do an experiment, we had like 20 to 25 people per condition, and then you run the study and then it turns out that your p is 0.06. In that case you run like 10 more per condition, and then you stop. I mean we have done this for years! Of course I now realize . . . and there was an article by Simmons, Nelson, and Simonsohn in Psych’ Science in 2012 [(Simmons et al., 2011)] that very clearly demonstrated how bad these practices are. So these things we don’t do anymore, but these are things that people have done, and also that reviewers sometimes told us if you submitted a paper, they said, ‘well, you know you report something that is 0.06, why don’t you collect more participants and then make it significant?’ (Interview 10)
To be clear, these are not the views of a marginal, ‘sloppy’ researcher. In their final report, the committees also noted that several co-authors who did perform the analyses [of the reviewed articles] themselves, and were not all from Stapel’s ‘school’, defended the serious and less serious violations of proper scientific method with the words: ‘that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences’ (Levelt Committee et al., 2012: 48).
Hence, the researchers interviewed by the committees combined two roles: that of direct witnesses to Stapel’s misdeeds, but also, as members of the incriminated scientific field, that of suspects of wrongdoings which, although not fraudulent, remained questionable in the committees’ eyes. These disagreements about what was and was not considered normal in social psychology at the time also came up repeatedly in our interviews. And this is perhaps where the main interest of the Stapel case lies: not in the shocking and indisputable frauds, but in this gray area where contradictory normative orders conflict with each other.
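The rule of thumb described in interview 10 can be made concrete with a short simulation (our sketch, using the interviewee’s own numbers): under a true null effect, testing 25 participants per condition and then adding 10 more whenever p lands just above .05 mechanically inflates the false-positive rate beyond the nominal 5% – which is precisely why Simmons et al. (2011) classified such optional stopping as a questionable research practice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(n=25, n_extra=10, alpha=0.05, retry_band=0.10):
    """One simulated experiment under a true null effect, following the rule
    of thumb from interview 10: test at n per condition; if p falls just
    above .05, add n_extra participants per condition and test once more."""
    a, b = rng.standard_normal(n), rng.standard_normal(n)
    p = stats.ttest_ind(a, b).pvalue
    if p < alpha:
        return True
    if p < retry_band:  # 'almost significant' -- collect more data
        a = np.concatenate([a, rng.standard_normal(n_extra)])
        b = np.concatenate([b, rng.standard_normal(n_extra)])
        return stats.ttest_ind(a, b).pvalue < alpha
    return False

n_sim = 20_000
rate = sum(one_study() for _ in range(n_sim)) / n_sim
print(f"False-positive rate with one round of optional stopping: {rate:.1%}")
# The nominal rate is 5.0%; even this single, seemingly innocent extra step
# pushes it higher, and combined flexibilities inflate it much further
# (Simmons et al., 2011).
```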
In this sense, the action of the committees can be analyzed as a ‘symbolic crusade’ (Gusfield, 1986) encapsulated in the following motto: ‘Science must be protected at all costs’. In a later publication, a statistician from the Amsterdam committee thus argued that, contrary to the cardinal principle of criminal law according to which any doubt should benefit the accused (in dubio pro reo), in science ‘the leading principle should be “in dubio pro scientia”’ (Klaassen, 2015: 9). The moral nature of the committees’ activities was visible among their members, who sometimes succumbed to ‘irritation’ (interview 6). But it was even more perceptible among the interviewed researchers and students, who felt as if they were suddenly accused of being bad scientists or, worse, potential fraudsters. The moral opposition between good and bad was thus invoked to distinguish between ‘honest’ and ‘ill-intentioned’ social psychologists, with Stapel serving as a reference point: one of the positive things that came out of this whole thing with Stapel is how aware we all became that it’s easy to . . . To be an ‘unintentional fraud’ [mimes quotation marks with his hands] I guess . . . To sort of like take the shortcuts and you don’t think necessarily you’re doing anything wrong, you’re just very enthusiastic about your research, right. I mean that’s . . . That was the vibe at the time, I remember that quite well. It’s like, ‘yeah no but look it’s almost significant you can, you can smell it’s there. So we’re just helping, we’re chafing off the rough edges and helping uncover the truth.’ And I mean back then, and maybe I’m an idealist here but I think a lot of people genuinely thought that way, felt that way. I don’t think they were, they were ‘bad’ [mimes quotation marks with his hands], I think Stapel was bad! (Interview 5)
One consequence of extending the scope of the statistical rules put forth by the committees was thus to necessitate new social distinctions between ‘good’ and ‘bad’ infringements: between those carried out in good faith and those whose primary purpose was deception. This properly moral dimension makes it possible to temper the symbolic violence represented by deviant labeling: researchers who had been behaving according to the rules of their discipline suddenly found themselves accused of having ignored fundamental statistical rules. Putting aside the feeling of meaninglessness that may follow – some social psychologists have publicly expressed their loss of faith in scientific productions following the Stapel affair and other shocking events – this work of categorization constitutes a form of resistance to the social sanctions that breaches of these rules could justify.
Discussion and conclusion
Our analysis of the Stapel case allowed us to approach empirically the definition of deviance within the scientific field and the connected issue of the diffusion and application of rules. More precisely, we showed how Stapel’s sanctioned behavior constituted an opportunity for the investigating committees to impose and apply statistical rules in a research field (social psychology) that was regulated on the basis of divergent methodological assumptions. Hence, far from being ‘no more than the mouth that pronounces the words of the law’, as Montesquieu’s famous aphorism had it, the committees actively participated in extending the scope of application of statistical rules. In so doing, they also transformed Stapel’s individual responsibility into a collective responsibility: the research culture of social psychology was as guilty as – perhaps even guiltier than – the Dutch professor.
These findings contribute to the growing literature on scientific punitiveness (Didier and Guaspare-Cartron, 2018; Dubois and Guaspare, 2019; Hesselmann, 2019; Hesselmann et al., 2017; Hesselmann and Reinhart, 2021; Horbach and Halffman, 2017; Mongeon and Larivière, 2016). Recent research focusing on cases of retraction accurately observed that when ‘the invisibility of dealing with misconduct raises the suspicion of inactivity, or worse, cover-up, sanctions obtain central importance’ (Hesselmann and Reinhart, 2021: 433). While it does not contradict this observation, our analysis provides a more complex picture, as the handling of Stapel’s case resembled traditional criminal proceedings more closely. Not that secrecy and opacity were absent altogether: the investigations were kept secret for months, and we can assume that the published reports only constitute a publicly acceptable depiction of the events that took place. Yet, it is also quite clear that the University of Amsterdam, the University of Groningen, and Tilburg University endeavored to signal their transparency and conscientiousness in dealing with such sensitive issues.
Moreover, even though Stapel lost his job and his symbolic capital, the reports seldom discuss sanctions. The focus on collective responsibility, through the overt criticism of scientific practices in social psychology, effectively led to a de-singularization of the case. In fact, some interviewees indicated that, in the aftermath of the Stapel case, a higher level of scrutiny was placed on psychological research done in the Netherlands. New policies were implemented in some departments across Dutch universities regarding data storage and accessibility or graduate students’ training. At the international level, the case also prompted the scientific community to become more cognizant of ‘questionable research practices’. Some scholars even began to endorse the role of ‘watchdogs’ (Didier and Guaspare-Cartron, 2018): scrutinizing the published literature, sometimes armed with forensic-like tools such as Statcheck (which, interestingly, was developed by researchers from Tilburg and Amsterdam), reporting back to the journals concerned, and alerting peers on their personal blogs when they uncover something worth a more detailed investigation. Whereas the whistleblowers in the Stapel case were close collaborators, independent scholars with statistical skills are now trying to unearth fraudulent behaviors. This is what happened in the case of French social psychologist Nicolas Guéguen, whose papers were initially scrutinized by ‘data thugs’ Nick Brown and James Heathers (Marcus and Oransky, 2018), or in the Dirk Smeesters affair, in which methodologist and open science advocate Uri Simonsohn played a key role. From local concerns to be dealt with internally at the university level, scientific misbehaviors thus became collective problems affecting the entire discipline.
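The core idea behind a tool like Statcheck can be conveyed in a few lines (a simplified sketch in Python, not the actual R package, and assuming one hypothetical reporting format): extract the reported test statistic, degrees of freedom, and p-value from an article’s text, recompute the p-value, and flag mismatches.

```python
import re
from scipy import stats

def check_reported_t_tests(text: str) -> None:
    """Recompute two-sided p-values from reported t statistics and degrees of
    freedom, and flag reported p-values that do not match -- the logic
    popularized by Statcheck (simplified sketch, 'p = ...' reports only)."""
    pattern = r"t\((\d+)\)\s*=\s*([\d.]+),\s*p\s*=\s*(0?\.\d+)"
    for df, t, p_str in re.findall(pattern, text):
        p_recomputed = 2 * stats.t.sf(float(t), int(df))
        decimals = len(p_str.split(".")[1])
        consistent = round(p_recomputed, decimals) == float(p_str)
        print(f"t({df}) = {t}: reported p = {p_str}, "
              f"recomputed p = {p_recomputed:.{decimals}f}"
              + ("" if consistent else "  <-- inconsistent"))

check_reported_t_tests("A reliable effect emerged, t(48) = 2.10, p = .041.")
check_reported_t_tests("A reliable effect emerged, t(48) = 1.70, p = .041.")
```

Statcheck itself handles many more test types and reporting conventions; the sketch only captures the principle that turned the scrutiny of individual papers into an automatable, discipline-wide affair.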
This focus on collective responsibility is directly related to the rise of ‘statistical objectivity’, whose central feature is the ‘projection of debates about objectivity and subjectivity onto the patterns of results produced by collections of studies rather than the methodological details of individual studies’ (Freese and Peterson, 2018: 290). Consequently, the target of criticism shifts somewhat, from individual scientists to scientific institutions and collectives. One important contribution of the Stapel case is to shed light on the fact that fraud inquiries are not only investigations into the merits of accusations, but also moments when the very content of scientific rules is disputed and their scope extended. It also brings out the social authority that statistics, and thus recognized statisticians, can leverage when such events occur. By being able to dictate what good science is, while ignoring the difficulties and idiosyncrasies of social psychological research, statistics serves both as a meta-knowledge and as a meta-set of sanctioned rules. Yet, because this regulative body is neither universally known and enforced within the scientific field nor completely consensual, even among specialists, statistical laws can, as the Stapel case illustrates, conflict with locally recognized rules (as also illustrated by the case of medicine: Marks, 2000: 136–163).
To be clear, our findings also raise questions that further research could fruitfully address. Chief among them is the question of how scholars come to know the rules of the game, that is, how their socialization affects how they perceive and practice science. The fact that researchers behave in certain ways in given circumstances does not mean that they have reflexively learned explicit rules that then come to guide their daily practices, as a legalist, mechanistic model would suggest. In fact, there are good reasons to believe that the transmission of knowledge and know-how in expert professions relies on peer observation and reproduction rather than on ‘bookish’, formalized approaches (Marchand, 2012; Wagner, 2015; Zuckerman, 1996). Focusing on social psychology, one important research avenue would be to analyze, for instance through ethnographic observation and interviews, the training of graduate students and early-career researchers. Grasping the historical evolution of scientific apprenticeship, and comparing its modalities before and after the various events that recently shook the psychological world, could help us understand more closely the inculcation, diffusion, and enforcement (or lack thereof) of statistical rules.
Acknowledgements
I am grateful to the anonymous reviewers and to Michel Dubois, Yves Gingras and Jean-Guy Prévost for their comments and suggestions.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
