Abstract
In response to what are seen as fundamental problems in psychology, a reform movement has emerged that finds inspiration in philosophy of science, the work of Karl Popper in particular. The reformers attempt to put Popper into practice and create a discipline based on the principles of critical rationalism. In this article I first describe the concrete sociotechnical practices by which the reformers attempt to realise their ideals, and argue that they go a long way towards bridging the gap between rules and practice that sociologists of science Mulkay and Gilbert identified in their study of the role of Popper’s philosophy in the work of scientists. Second, I note the considerable resistance that the reformers meet and the disruptive force of their work. I argue that this disruption is productive and raises fundamental questions regarding psychology and its object of study.
It is a feature of the reform movement 1 that is sweeping through psychology at the moment that it is explicitly inspired by philosophy of science, the work of Karl Popper in particular. References to Popper and like-minded philosophers of science such as Paul Meehl are used to support arguments about what science is, what has caused the current problems, and how they should be solved. 2 The reform movement is an epistemic project that is informed by epistemology. This would have pleased Popper, who wrote, in the preface to the English edition of The Logic of Scientific Discovery, that the theory of knowledge, including his own, aims to “enable us not only to know more about knowledge, but also to contribute to the advance of knowledge – of scientific knowledge that is” (1972/2002a, p. xxii).
This development may amuse those who, like me, have taken on board the lessons of the empirical turn in the study of science. One of those lessons, after all, appeared to be that science does not work the way Karl Popper thought it should. When sociologists of science in the 1970s entered laboratories to study the actual process of science, they concluded that it doesn’t follow the rules that Popper, or any other philosopher of science for that matter, had laid down. Research is messy, researchers are motivated by more than a desire for objective truth, and facts are not discovered but constructed in a process that involves many more actors than those allowed by traditional philosophers of science. When Bruno Latour was asked (in an interview on Dutch television) whether science proceeds by falsification, he laughed derisively and answered: “That is the textbook version of science, I don’t think there is any case where this works as nicely as this” (Schepens, 2008, at 14:15). However, as much as we post-Kuhnian students of science might be tempted to dismiss the psychology reformers as naive, our first step should be to study this development empirically, including the role Popper’s ideas play in it.
Michael Mulkay and Nigel Gilbert were here before us. In their article “Putting Philosophy to Work: Karl Popper’s Influence on Scientific Practice” (1981) they reported on interviews they had conducted with biochemists in the context of their study of a controversy in that field (see also N. Gilbert & Mulkay, 1984). These researchers regularly mentioned Popper, even though they never referred to him in their primary literature. What was the role of Popper’s ideas in their own thinking and research? What work did Popper do for them? It became clear from the interviews that the rules that Popper had formulated did not function to constrain action. Even the interviewees who identified as Popperians acknowledged that they did not look to Popper’s methodological rules as prescriptions for their day-to-day laboratory work. Rather, Popper was used as an evaluative resource, a way to judge “good” and “bad” after the fact. Popper’s philosophy of science was used to describe what was good about a particular study or researcher, but not to determine how to proceed in doing research.
Mulkay and Gilbert saw a fundamental truth about rules reflected here: the relationship between rules and practice is “essentially indeterminate” (Mulkay & Gilbert, 1981, p. 404). Rules do not determine their own application. Following a rule always requires an interpretation of what the rule means in this particular situation. In science, the gap between rules and action is particularly wide because at the forefront of research, novel situations are being created: new techniques, new instruments, and of course new phenomena and effects, which all raise the question of how the rules apply in these novel circumstances. For example, Popper had argued that one cannot verify a theory, one can only disprove it. Thus, falsifiability is the mark of a scientific theory, and science should proceed by attempts at falsification. That is clear enough. However, Mulkay and Gilbert noted that whether or not a particular experimental result is a falsification depends on a technical, scientific appraisal of the experiment. In situations of scientific uncertainty, in new lines of research, such appraisals will vary between researchers. “Consequently, when there is uncertainty, the Popperian rules cannot provide a straightforward guide for scientists’ actions or decisions. There is a gap between rule and particular action which can only be bridged by the very scientific choice which the rule is intended to constrain” (Mulkay & Gilbert, 1981, p. 398).
Popper’s work does not provide any guidance for how to deal with this interpretative challenge. His rules of method were based on the rational reconstruction of scientific achievements rather than on a description of actual scientific practice. From this hindsight perspective the interpretative work that was required in scientific practice becomes invisible. As a result this (or indeed any) prescriptive philosophy of science on its own can give little guidance for this work. Popper “remains unclear about the connection between the formal analysis of scientific belief systems and the provision of rules of action; and … he hasn’t considered in detail how his rules of scientific method are to be put into practical effect” (Mulkay & Gilbert, 1981, p. 392). To tighten the link between prescription and action, Mulkay and Gilbert argue, the rules must become embodied in a social practice, “so that potential actors have access to a corpus of exemplary instances, they are guided in their efforts by skilled interpreters, and they are subject to various kinds of direct control” (Mulkay & Gilbert, 1981, p. 407). 3 Rules will never determine action unambiguously, but they can be made more effective through the interpretative work of a community of researchers, who translate general methodological rules (such as “always expose your theories to the possibility of refutation”) into more specific directions for what should be done with regard to this specific hypothesis and others of its kind, who negotiate difficult cases, draw up guidelines, and sanction those who contravene them.
The same question that Mulkay and Gilbert asked 38 years ago is relevant today with regard to the current reformers in psychology: how do these psychologists put Popper to work? The history of science studies following Mulkay and Gilbert’s paper has given this question added poignancy. In 1981 Popper’s influence was still great (and he was still alive), and that generation of sociologists of science partly defined itself in relation to his work: by turning Popper on his head, for example, as Mulkay and Gilbert did, and investigating how norms work in practice. Since then, Popper has gradually receded from view in science studies; he is no longer relevant, even as a foil. What then should we make of this apparent renaissance of Popperian thinking in psychology?
Philosophy in practice
Ever since the reform movement in psychology began to coalesce in 2011, replicability has been its main concern. 4 The frequency with which even high-profile studies fail to replicate is seen as an indication of fundamental problems in the usual research practices of the discipline. A lack of transparency is seen as a major cause of the problems: a lack of disclosure about the actual research process and its results allows researchers to present their studies in a favourable but misleading way and keeps null results (“failed experiments”) in the file drawer. As a result, the discipline’s archive may look like a corpus of scientific success stories, but it is actually “a vast graveyard of undead theories” (Ferguson & Heene, 2012).
In the proposals for improvement that have been appearing regularly over the past six years, statements about “what science is” have an important place. 5 In contrast to Mulkay and Gilbert’s biochemists, 6 quite a number of these reformers in psychology read philosophy of science. For instance, a recent blog post in which the author argued that null hypothesis significance testing is compatible with Popper’s ideas about falsification mentioned Duhem, Lakatos, Laudan, Van Fraassen, and Feyerabend as well as Popper (Lakens, 2017a). Another example: Zoltán Dienes’ Understanding Psychology as a Science (2008), a textbook that leans heavily on Popper, is enthusiastically recommended as summer reading (Lakens, 2017b; Srivastava, 2017). Even when Popper is not mentioned, science is depicted in a way that largely conforms to his philosophy of science, in that falsifiability, falsification, and replication are seen as crucial elements of the scientific process. Reformers emphasise that scientific theories must be falsifiable. This is mostly presented as self-evident, but sometimes Popper is referred to (e.g., LeBel & Peters, 2011, p. 373). Falsification, the reformers believe, “is achieved via meticulously executed series of direct replications” (LeBel, 2017, line 8), that is to say, by following the procedure of the original experiment as closely as possible. In this context, the reformers like to quote from section 8 of Popper’s Logic of Scientific Discovery, where he states that observations are only inter-subjectively testable when they can be repeated by following specific instructions. The upshot is, in the words of one prominent reformer, that

(1) scientists should replicate their own experiments; (2) scientists should be able to instruct other experts how to reproduce their experiments and get the same results; and (3) establishing the reproducibility of experiments (“direct replication” in the parlance of our times) is a necessary precursor for all the other things you do to construct and test theories. (Srivastava, 2014b, para. 1)
Whereas Popper was primarily an evaluative resource for Mulkay and Gilbert’s biochemists, these psychologists translate Popper’s philosophy of science into a program of reform, with concrete rules of practice, an infrastructure for that practice, and research projects that realise the ideal. The rules and requirements focus, first of all, on restraining the so-called “researcher degrees of freedom,” a term coined by Simmons, Nelson, and Simonsohn (2011) for the leeway that researchers have in the decisions they take for the collection and analysis of their data. Hypothesis testing requires such decisions—sample size, exclusion of outliers, which comparisons to make, etcetera—to be taken before data collection begins, but it is common practice, Simmons et al. noted, to explore various possibilities during data analysis, see which combination produces statistical significance, and only report that result. Another problematic practice, not discussed by Simmons et al. in their paper, is to fit the hypothesis to the data and create a significant result that way, a process known as HARKing: hypothesising after results are known (Kerr, 1998). In all these cases, one is not testing (i.e., attempting to falsify) hypotheses, but generating hypotheses from the data.
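The statistical logic behind Simmons et al.’s worry can be illustrated with a minimal simulation (my sketch, not taken from any of the papers discussed here): under a true null hypothesis, p-values are uniformly distributed, so a hypothetical researcher who tests two outcome measures and reports only the more favourable one roughly doubles the nominal 5% false-positive rate.

```python
import random

# Under a true null hypothesis, p-values are uniform on [0, 1].
# A hypothetical researcher measures two independent outcome variables
# and reports whichever test gives the smaller p-value -- one of the
# "researcher degrees of freedom" described by Simmons et al. (2011).

random.seed(42)
TRIALS = 100_000
ALPHA = 0.05

honest_hits = 0    # test one pre-specified outcome
flexible_hits = 0  # test two outcomes, report the best

for _ in range(TRIALS):
    p1, p2 = random.random(), random.random()
    if p1 < ALPHA:
        honest_hits += 1
    if min(p1, p2) < ALPHA:
        flexible_hits += 1

print(f"false-positive rate, one pre-specified outcome: {honest_hits / TRIALS:.3f}")
print(f"false-positive rate, best of two outcomes:      {flexible_hits / TRIALS:.3f}")
```

The first rate hovers around the nominal .05, the second around .10 (analytically, 1 − 0.95² ≈ .098); with more such degrees of freedom combined, the inflation is correspondingly larger. Pre-registration, discussed below, is meant to close off exactly this kind of flexibility.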
The most commonly proposed solution to this problem is pre-registration, where researchers create a detailed plan for data collection and analysis and upload it to a repository that gives it a date stamp (see, e.g., Bishop, 2013; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). Once the study is completed, other researchers can consult the repository to see whether the study actually followed the registered research plan. Of course, this system can be gamed by registering multiple plans, but that would require intentional bad faith rather than sloppy or questionable research practice.
A related initiative is the registered report (RR), a publication option devised by Chris Chambers (2013), editor of Cortex, and now offered by an increasing number of journals in psychology and other disciplines. In an RR, a researcher submits a proposal for a study to a journal, including a detailed plan for data collection and analysis. Reviewers then look at the relevance and importance of the question and the quality of the research plan, and if the proposal is accepted, publication of the final report is guaranteed regardless of the outcome of the study, provided the research plan has been followed. The RR procedure combines the advantage of pre-registration with the guarantee that null results will also be published, thus increasing the chance that theories are (recognised to have been) falsified. Moreover, registered reports are often offered as a publication option for direct replication studies (RRR, registered replication reports).
Pre-registration and registered reports are presented as enforcing a distinction between “exploratory” and “confirmatory” research, between generating hypotheses and testing them. Both are important, it is emphasised, but testing has to be separate from exploration to properly count as such (e.g., Bishop, 2013; Chambers, 2017a; Neuroskeptic, 2013). Thus, pre-registration and registered reports give an administrative guarantee that what is presented as the test of a prediction was indeed an attempt to falsify a hypothesis and not actually an inductive process of developing theory from data. In contrast to Popper, these reformers explicitly give induction a place in the scientific process, but on condition that it remains strictly separate from “purely confirmatory research” (Wagenmakers et al., 2012). It’s clear moreover that whereas exploration is acknowledged as important, hypothesis testing is considered necessary: without it there is no science.
Social engineering
To Popper, the rules of method that he proposed in The Logic of Scientific Discovery (1972/2002a) were the application to science of a more general conception of reason that he called critical rationalism. In The Logic of Scientific Discovery and even more in The Open Society and its Enemies (2002b), Popper emphasised that scientific objectivity is not the result of the attitude or efforts of individual scientists, but rather the “product of the social or public character of the scientific method” (2002b, p. 491). The rational route to objectivity is openness and “friendly-hostile co-operation” (Popper, 2002b, p. 489). In line with this ideal, the reform movement pursues transparency in all stages of the research process and it has a penchant for vigorous discussion. Equally in line with Popper’s recommendations, it tries to institutionalise these values. Critical rationalism, Popper wrote, implies the necessity for creating institutions that protect “freedom of criticism, freedom of thought” (Popper, 2002b, p. 511). It “establishes something like a moral obligation” to engage in “practical social engineering” (p. 511). Although The Open Society and its Enemies (Popper, 2002b) is seldom referred to, the reform movement is doing exactly what Popper thought was necessary: not only does it make enthusiastic use of social media to conduct its “friendly-hostile” discussions, it is also engineering its own social-technological infrastructure to enable transparency, collaboration, and replication.
A particularly successful initiative is the Open Science Framework (OSF, https://osf.io/), an online platform developed and maintained by the Center for Open Science (COS), founded by social psychologists Brian Nosek and Jeffrey Spies. The OSF facilitates the kind of open, reproducible science that the reformers advocate. Via the OSF, researchers can pre-register their studies, share materials, store and share data, and collaborate with others. The goal is to make the entire research process transparent, and thus also eminently reproducible. The OSF has hosted a number of replication projects that were started by reformers to put their ideas about replication and falsification into practice. The best known of these was a mammoth collaborative effort to replicate 100 studies from one volume of three psychological journals. Working together in the “Open Science Collaboration” were 269 researchers, mainly from North America and Europe. The results were alarming: according to the team’s assessment, only 39 of the 100 replication attempts succeeded (Open Science Collaboration, 2015). 7 The OSF has also hosted several so-called “Many Labs” projects, in which a small number of original studies are each replicated by a number of research teams, to see whether replication success depends on contextual factors such as the nationality of the participants (Klein et al., 2014) or the semester in which students participate in studies (Ebersole et al., 2016). Neither factor appeared to have much influence on whether the original result could be reproduced. Thus, the Many Labs projects produced evidence against contextual moderators as explanations for replication failure (Srivastava, 2015).
The internet is the reform movement’s primary habitat. Blogs are popular outlets for ideas and commentary; blog posts are typically announced on Twitter and Facebook, and then further discussed there, often leading to new blog posts. Web-based platforms like the OSF not only facilitate collaboration but also serve to share preprints (https://osf.io/preprints/psyarxiv), following the example of the successful physics preprint repository arXiv. Such online platforms and social media allow fast, free dissemination of ideas, results, and papers, followed by virtually instant, open discussion and critique, often leading to follow-up collaborative projects to test new hypotheses. Bobbie Spellman has argued that replication studies could only become such an important factor in the crisis because news about them spreads quickly over the internet, whereas before, one would typically learn about a failure to replicate, if at all, in a “fortuitous late night conference conversation” (Spellman, 2015, p. 888). The online world, moreover, is one where traditional gatekeepers such as editors and reviewers are much less powerful because anyone can be an editor—publish their own blog, for example—and a reviewer—for instance on PubPeer, the “online journal club” where anyone can post comments on any scientific article (https://pubpeer.com). The social practice characterised by transparency and mutual criticism which the reform movement is creating embodies Popper’s critical rationalism. A kind of Popperian Open Society is in the making, an Open Psychology, where everything may be subjected to criticism by anyone, and tradition and reputation hold no sway.
Controversy
This budding community of practice is not without its detractors, however. Open Psychology is resisted by a number of social psychologists in particular, who object to what they consider unfair criticism of the status quo, to the way this criticism is expressed, and to the emphasis on direct replication as a sine qua non of science. 8 The discussion between reformers and counter-reformers can become quite heated, as was the case in the controversy in 2014 over the non-replication of a study by Simone Schnall and colleagues. It is worth looking at this controversy in some detail, because it shows how Popper is put to work by the reformers, and how his rules of method and his “friendly-hostile co-operation” are resisted by others.
The original study had produced the kind of attention-grabbing result that the critics see as typical of the discipline’s focus on novelty at the expense of rigour: people who feel clean offer milder judgements of moral transgressions. As the title put it: “Cleanliness Reduces the Severity of Moral Judgements” (Schnall, Benton, & Harvey, 2008). Schnall and colleagues described two experiments that had produced this effect. In the first, participants were “primed” with cleanliness by having them do a scrambled-sentences task. They had to construct three-word sentences out of sets of four words; in the experimental condition, half the sets contained words related to cleanliness, such as pure, washed, and immaculate. After this task, the participants were asked to rate six morally loaded actions, including putting false information in one’s CV and using a kitten for sexual gratification. In the second experiment, participants were asked to wash their hands after watching a disgusting film clip, and then had to rate the vignettes. In both experiments, participants on average made less severe moral judgements in the experimental condition. In their replication study, Johnson, Cheung, and Donnellan (2014), using the same materials, almost identical procedures, and a much larger sample, failed to find any effect in either experiment.
A rather acrimonious debate ensued. 9 Schnall, supported by luminaries such as Daniel Gilbert (2014), Daniel Kahneman (n.d.), and Matthew Lieberman (2014), complained that she had not been allowed to review the final report of the replication (it was a Registered Report; Schnall, 2014b). She also criticised the “crime control mindset” of the reform camp, which, she contended, tends to see every non-replication as an indication of problems in the original study (Schnall, 2014a), and then shames the original researcher over online media. A culture of “replication bullying” had emerged (Schnall, 2014b). One of the replicators, for example, had triumphantly called their attempt “an epic fail” (Donnellan, 2013). Rushing to Schnall’s aid, Gilbert accused the “replication police” of being “shameless little bullies” (D. Gilbert, 2014).
But Schnall also raised questions regarding the value of replication studies per se, how they should be conducted, and by whom. She emphasised, first of all, that experimentation in social psychology is very difficult, much more complicated in fact than in hard sciences like physics. “There are always many reasons for a study to go wrong and everything would have to go right to get the effect” (Schnall, 2014a, para. 29). Therefore, when people without expertise in a particular field of study fail to find the same effect in a replication, we shouldn’t read too much into it. “[B]efore you declare that there definitely is no effect, the burden of proof has to be really high” (Schnall, 2014a, para. 29). More generally, the current emphasis on “direct replications” (Schnall preferred the term “method replications”) is misguided. In a complicated field like social psychology, we shouldn’t expect a particular experiment to produce the same effect every time, even when the experiment is done by experts. Human social behaviour is extremely sensitive to variations in the social and cultural context. Schnall noted, for example, that the participants in the replication study (students at an American university) had evaluated the vignettes much more negatively than the English students in the original study, leading to a ceiling effect in the dependent variable. The failed replication was most likely due to the relative moral laxness of English campus culture, she implied. Rather than putting so much weight on a failed direct replication, we should focus on conceptual replications, in which the same theory is tested with a different procedure. In fact, the connection between physical cleanliness and moral judgement had been conceptually replicated many times, Schnall (2014a) insisted.
In response, proponents of direct replications argued that direct replication is a basic requirement in science, and supported this point with references to Popper. Personality psychologist Sanjay Srivastava, for example, argued that “every experimental report comes with its own repeatability theory” (2014a, para. 5) because each methods section implies that someone who follows the same procedure will get the same results. This implicit mini-theory is eminently falsifiable, as long as we specify what will count as “the same result” and we spell out the requirements of the experiment. Schnall’s argument that it is up to the replicating researchers to acquire the necessary expertise is wrong, according to Srivastava: “The onus is on the original experimenter to be able to tell a competent colleague what is necessary to repeat the experiment” (Srivastava, 2014a, para. 9). And if they cannot, there is no reason to have any confidence in the original result. At this point, Srivastava quoted from section 8 of Popper’s The Logic of Scientific Discovery, which contains the line: “No serious physicist would offer for publication, as a scientific discovery, any such ‘occult effect,’ as I propose to call it – one for whose reproduction he could give no instructions” (Popper, 2002a, p. 24). Andrew Wilson wrote a scathing reply to Schnall’s and Kahneman’s insistence that the original researcher should always be consulted for a replication:

once you have published some work then it is fair game for replication, failure to replicate, criticism, critique and discussion. … We don’t need either your permission or your involvement: the only thing we (should) need is your Methods section and if you don’t like this, then stop publishing your results where we can find them. (2014, para. 3)
Srivastava and other proponents of direct replication acknowledged that falsifying a substantial theory is more complicated than testing the mini-theory implied by the methods section of a report. They admitted the problem noted by Quine (1951) that a theory is never tested in isolation but always in combination with a number of background assumptions, and endorsed Lakatos’s amendments to Popper’s methodology (Srivastava, 2014a). It all starts, however, with establishing the reproducibility of the phenomenon itself, and to this end we must do direct replications: “Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable” (Popper, 1972/2002a, p. 23). Were we to drop this requirement, “we would be doing history rather than science” (Srivastava, 2014a, para. 5).
Similar replication controversies have occurred since the Schnall affair. 10 Non-replications keep the debate about fundamental problems in psychology and their solution alive. The proponents of conceptual replication have begun to support their position with arguments drawn from philosophy of science as well. Chris Crandall and Jeff Sherman invoke Duhem and Quine to claim that the “‘failure’ of an empirical test is always ambiguous” (Crandall & Sherman, 2016, p. 94). Conceptual replications, they say, “disperse this ambiguity, and as a result, can contribute more to theoretical development and scientific advance” (p. 94). In a similar vein, Wolfgang Stroebe has attempted to combine a commitment to falsificationism with a preference for conceptual replications. Direct replications are initially important for the original researcher to establish whether the effect is robust, but, given the context-sensitivity of social psychological phenomena, later non-replications do not tell us much. Instead, the focus should shift to conceptual replications, and meta-analyses can determine whether the theory is corroborated or falsified. Psychological theories are not about actual phenomena, which are variable, but about the stable, universal mechanisms that underlie them. With a reference to Popper’s Conjectures and Refutations, Stroebe argued that “theories are not refuted by a single inconsistent finding but by studies that support an alternative theory that has greater empirical content” (2016, p. 143). The reformers respond that conceptual replications are certainly important to refine a theory and develop it further, but direct replications are fundamental: “[I]f a phenomenon is not replicable (i.e., it cannot be consistently observed), it is simply not possible to empirically pursue the other goals of science” (LeBel, Berger, Campbell, & Loving, 2017, pp. 8–9). 11
How and where criticism should be delivered also remains a hotly debated issue. In September 2016, Susan Fiske wrote a brief guest column for the APS Observer, in which she attacked the way psychology’s reformers were going about critiquing the work of others. A first version, which found its way online, contained terms like “trash-talk” and “methodological terrorism” (Fiske as cited in Gelman, 2016). After predictable outrage from the critics, the text was toned down a bit for publication, but Fiske’s point remained the same: the critics (“bullies”) are promoting a culture of shaming, harassment, and unrestrained hostility, and it is creating victims. “[C]olleagues at all career stages have reported leaving the field because of what they see as sheer adversarial viciousness” (Fiske, 2016, para. 5). This culture of “uncurated, unfiltered denigration”, she claimed, is “encouraged by the new media (e.g. blogs, Twitter, Facebook)” (para. 2), where everyone can publish criticism, regardless of whether it is valid or appropriate. Fiske advocated a return to traditional fora, “with their rebuttals and letters-to-the-editor subject to editorial oversight and peer review for tone, substance, and legitimacy” (para. 7). In response, some of the critics posted a conciliatory petition “Promoting Open, Critical, Civil, and Inclusive Scientific Discourse in Psychology,” which attracted 600 signatures (Coan et al., 2016), but the dominant opinion about Fiske’s piece was that she was papering over the real problems in psychology (e.g., Gelman, 2016; Yarkoni, 2016). The “tone debate” shows no sign of abating (Chambers, 2017b; Schwarzkopf, 2017), and increasingly includes issues of diversity (Hamlin, 2017; Ledgerwood, 2017).
Rules and practices
I have argued that the reform movement is guided by a Popperian view of how science should be conducted, both as to its methodological rules (replication and falsification in particular) and as to the general culture of openness, mutual criticism, and collaboration in which it needs to be embedded. That is not to say that all aspects of Popper’s work are represented in the discourse of the critics; 12 nor that they always name Popper as the source of their ideas about replication, falsification, and criticism; nor that there aren’t non-Popperian or even anti-Popperian elements in their ideas. What I have shown is that the critics are trying to create a scientific practice that corresponds to Popper’s philosophy of science in several key aspects, sometimes explicitly referring to and discussing his work and that of like-minded philosophers of science. In that sense they are putting philosophy of science, Popper in particular, to work.
What about the relation between rules and practice in these reforms? As I mentioned in the introduction, Mulkay and Gilbert (1981) based their analysis on the philosophical position that that relation is essentially indeterminate: “[N]o rule can specify completely what is to count as following or not following that rule. The terms of a rule always need to be interpreted in relation to the variable characteristics of specific situations” (p. 400). This conception of rules stems from a reading of Wittgenstein’s discussion of rule-following in Philosophical Investigations (1953). Philosopher Saul Kripke (1982) is its best-known representative. It is a sceptical reading that has been vigorously disputed in philosophy (Baker & Hacker, 1985), as well as in science studies—see, for instance, the discussion between Michael Lynch and David Bloor in Pickering (1992). In the non-sceptical view there is no essential gap between rule and practice, even though on occasion it may be unclear what rule applies and how. Interpretation, translating a rule to a specific situation, is sometimes required, but not “always,” as Mulkay and Gilbert claimed. Following a rule is not the same as interpreting it. Usually we follow a rule blindly; if interpretation were constantly required it would cease to be a rule.
Even if we accept this non-sceptical view of rules, however, we may concede that the meaning of rules is more ambiguous, less self-evident in situations characterised by novelty and uncertainty. Mulkay and Gilbert (1981) argued this is inherently the case in scientific research, where new techniques, instruments, and effects regularly raise the question of how rules should be applied. In such circumstances, disambiguating the relation between methodological rules and scientific practice requires creating “interpretative procedures and social relationships” (p. 404) to make Popper’s methodological rules “effective as constraints” (p. 407). Indeed, such interpretative work is happening in the current debate, where every new failed replication instantly leads to fresh discussion about how this result should be interpreted, and whether it suggests further rules and procedures—the Schnall affair is typical in this regard. But the reformers do more than discuss the interpretation of methodological rules. Where Mulkay and Gilbert speak of a social practice that guides and controls the interpretation of rules, thus bridging the gap between rules and practice, what the reformers are trying to create goes further by giving these rules administrative form (pre-registration, Registered Reports) and creating infrastructure to facilitate an Open Psychology. The rule that scientific statements must be exposed to the risk of falsification is expressed in and as pre-registration, just as openness to criticism is expressed in and as the technology of the Open Science Framework. The rules are institutionalised and materialised, a process I have compared to Popper’s “social engineering.”
At the same time it is clear that this is a work in progress, and the crisis is not over yet. Two issues remain particularly contentious, as I have shown. The first is the question of how scientific debates should be conducted in the context of the new online collaboration and communication platforms. This is a context that is quite different from the one that Popper had in mind when he described his “open society” and its “friendly-hostile co-operation.” It is a technological landscape in which a pre-print (reporting, say, a failed replication of a classic experiment in psychology) can be made available to anyone with an internet connection and announced on Twitter, where it subsequently may be discussed by anyone with an account. Within hours, opinions are formed and exchanged, counter-opinions appear, and soon a debate develops that needs only a slight exaggeration, an unfortunate phrase, or a bad joke to spiral out of control in the way that so many online conversations do. It is doubtful that a return to the classic mode of peer-reviewed discussion in scientific journals, which Susan Fiske (2016) favours, is the solution, or even a realistic option; what is clear is that academia in general is searching for a new ethics of academic debate in this online world. This too is a problem of interpretation: what does “friendly-hostile co-operation” mean in this novel context? What does civility look like in a tweet, or in a blog post? Is it civil to comment anonymously? And so on.
Whereas the first issue is a more general academic problem, the second is specific to psychology and concerns the question of whether or not direct replications have a role in social psychology, and if so, when. As we have seen, some social psychologists resist the reforms because they do not think it is reasonable to demand that the results of social psychological experiments always be reproducible by following the description in the methods section of the original report. Social behaviour is too sensitive to the social, cultural, and historical context, and this context too diverse and changeable, to expect this kind of stability. This issue is at once methodological and ontological: do Popper’s rules of method apply in a field with a uniquely difficult object of research? And is that object really so unique? Ironically, it is precisely the direct replications and their recurring failure to reproduce earlier results that have provoked such questions about the appropriateness of a methodology in which direct replication is fundamental. Direct replications have been a disruptive force in psychology over the last six years, and in response arguments for a special status of social psychology, such as those of Crandall and Sherman (2016) and Stroebe (2016), mentioned above, have been formulated. In fact, one could argue that the variability they see as typical of social behaviour has been made visible by direct replications.
This ironic effect is a product of the reformers’ emphasis on methodological rules. 13 As Mulkay and Gilbert (1981) noted, rules are constraints. Some rules, however, are more constraining than others. Conceptual replication leaves the researcher more freedom than direct replication. In social psychology, that freedom has been used over the years in the production of sameness. By working on a high level of abstraction, that of the theory’s constructs, researchers can claim that their studies provide evidence for the same general theory, even though what happens in the experiments is different in each case. Variation in experimental procedure is thus put in the service of producing sameness. If this strategy is combined with publication bias against null results, as has been the case for decades, the production of sameness is facilitated further, since falsifying instances never see the light of day. Alternatively, if experiments do not quite pan out as hypothesised, the theory may be amended by the addition of further variables (perhaps the relation only holds for women, or it requires a minimum level of anxiety). On this basis, further hypotheses are tested, 14 and again only the successful studies are published. The result is a pattern that was noted (more or less simultaneously) by Paul Meehl (1990) and by Michael Billig (1990), namely that theories in social psychology tend to start out as bold statements of straightforward relations between a few variables, but then amass an increasing number of “refinements” until they become so unwieldy that the field simply loses interest. All the while, the illusion of sameness is maintained, because although the theory gets more and more nuanced, incorporating more and more sources of variability, at some level it has remained the same, and it is shielded from falsification.
Variability is never interesting in itself, except as a source of publications: research aims at finding the basic, underlying psychological processes that, in conjunction with a changeable context, “produce behaviour.” Nor is variability ever a risk, if only because the publication bias keeps failed studies well out of sight.
In contrast, in a research practice that puts direct replication up front, variability can assert itself, as it were, on its own terms. It is precisely because researchers limit themselves to following the same procedure as an earlier experiment that variability stands out as anomalous. Moreover, it is because researchers severely constrain their own freedom with methodological strictures like pre-registration that variability may appear unimpeded. 15 This is the irony of the reformers’ emphasis on methodological rules: strict, direct replication, maximally constraining the experimenter, has produced disruptions, because it allows the object to object, to borrow a phrase from Latour (2000). Social psychology finds itself confronted with an epistemic device, direct replication, that over the last few years has regularly produced interesting differences in the form of non-replications. The ease of making such results public online offsets the publication bias of traditional peer-reviewed journals and allows each non-replication to become a spectacle on social media, demanding a response from the original researchers and from the field of social psychology in general.
Conclusion: Psychology’s epistemic project
The latest crisis in psychology has spawned a reform movement that is proposing thorough changes to psychology’s epistemic practices, and is creating the sociotechnical conditions for what it sees as a better, more scientific psychology. Indeed, there is now a Society for the Improvement of Psychological Science (http://improvingpsych.org/), which meets yearly to discuss ideas for further changes in methods and practices. With its emphasis on hypothesis testing, direct replication, collaboration, and open, critical debate, the reform program is distinctly Popperian, and proposals and arguments are frequently supported with references to Popper and like-minded philosophers of science. Thus, the reformers stay well within the bounds of a rather traditional conception of science. Nonetheless, this effort to put Popper to work is innovative simply because such a strict adherence to these methodological principles has not been attempted before in psychology. Warnings about low power, publication bias, questionable research practices, and replication failures have been sounded before, but this is the first time these issues have been addressed with such a comprehensive program of methodological reform.
Moreover, Popper may also turn out to be a source of renewal in spite of himself. As I’ve argued, direct replications have been a disruptive force in the discipline recently, regularly producing results that are so different from those of the original study that fundamental questions are raised not only about methodological and statistical practices, but increasingly about the object of study itself. Critics of the reform movement have countered its proposals and projects with the argument that people are intensely context-sensitive beings in an extremely variable and changeable environment. As Crandall and Sherman put it, “In matters of social psychology, one can never step in the same river twice” (2016, p. 94). During the previous crisis in psychology, Ken Gergen (1973) argued something very similar, but he drew from this the conclusion that social psychology is a form of history and should let go of its ambition to be a natural science. As yet, this is a step that critics of the reform movement are unwilling to take. Behind the variable and diverse behaviour, they still presume, lies a stable, universal cognitive mechanism, which will ultimately be described by theories that are built and tested by doing conceptual replications. This makes social psychology a highly theoretical field, however, with no clear relevance for practice. 16 Such a retreat into abstraction may not be to everyone’s liking, and as non-replications keep piling up, putting the robustness of results in doubt, some may choose to look for different approaches altogether, away from quantitative methods and a search for causal laws. They will ask a question that is largely absent from the current crisis debate: What is psychology good for? And are quantitative methods and experiments always the best way to bring it about?
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
