Abstract
Successful science needs deviant ideas that may challenge established norms. The last decade saw an unprecedented science-engineering project, with strict rules on preregistration, statistical testing, result-independent guaranteed publication, replication, and openness badging being enforced by psychological journals. These normative methodologies seek to prevent failure (negative deviance) rather than promote success (positive deviance), and run counter to the historical development of successful science. By narrowly focusing on research data while avoiding theoretical bias, they are inadequate for tackling often intractable scientific problems. Instead, unconventional, exceptional, and even initially implausible hypotheses should be fostered. A novel connection is drawn between positive deviance and the unplanned, haphazard evolution of successful science. Hypotheses compete for the highest fitness while probing an ever-changing, infinitely wide, empirical and theoretical landscape. The winner constitutes the positive deviant, but always remains subject to future competition. Losing negative deviants, which may share characteristics with winners, become irrelevant, sometimes long after their inception, and eventually sink into oblivion. Normative methodologies aim to curb negative deviants at their source, but in doing so also cut off positive deviants and may freeze successful science. More room for deviance and the primacy of theory are advocated, allowing research to generate discovery and innovation in psychological science.
In communities throughout the world, there are a few ‘deviant’ individuals whose uncommon behaviours or practices enable them to outperform or find better solutions to pervasive problems than their neighbours with whom they share the same resource base. Jerry Sternin (2002, p. 57)
Deviance, a term rooted in sociology, refers to an exceptional behavior, practice, or idea that breaks with the norm and is often not deemed socially acceptable (cf. Heckert & Heckert, 2002). Deviants, hence, usually carry a negative value (i.e., negative deviants). In a few exceptional cases, the norm obscures vital information that has the potential to improve the world. The stunning benefits of identifying successful outliers, and then spreading these practices throughout the group, were first discovered in economically disadvantaged communities (e.g., Sternin, 2002). Positive deviants correspond here to individuals in challenging situations who achieve outcomes much better than those typical of their group. The approach quickly spread to other challenges (e.g., healthcare) owing to its high yield in virtually intractable problems. Positive deviance, not previously considered within science dynamics, is proposed here to also underlie successful science.
This forward-looking perspective focuses on an evolutionary conception of research, allowing for competing theories to generate discovery and innovation in psychological science. It begins with a brief sketch of the origins of positive deviance, and parallels are drawn with the unguided historical development of successful science. The rare emergence of a norm-breaking, positive deviant comes at the cost of producing numerous, equally norm-breaking, negative deviants that have little substance and are often unreproducible. In the last decade, a reform movement sought to respond to the perceived reproducibility crisis by safeguarding against negative deviants, but did not consider their positive counterparts. Particularly in experimental psychology, the reformers’ methodological regulations led to an unprecedented “science-engineering” project. The reformers assume a greatly simplified, “linear” conception of the advancement of science, presupposing that research is directed toward the discovery of facts about the “truth” that can be derived from the data in a relatively straightforward, “theory-free” manner. This view contrasts sharply with an alternative evolutionary view, which is argued here to engender a fundamental open-endedness in the plethora of paths scientific developments can take. In the history of science, a slow competitive selection process between theoretical hypotheses (i.e., the “genes”) has gradually self-organized, more or less by trial and error. This results in the incremental construction of sophisticated models by continuously probing (e.g., through deviance) an ever-changing, infinitely wide, empirical and theoretical landscape for the fittest hypotheses.
Positive deviants are even rarer in psychology than in other scientific fields. Working from the presupposition that the dynamics are largely similar in different science domains, many historical examples of positive deviants are borrowed here from other fields. The scarcity in psychology may be due not only to the complexity of the subject but also to the restrictive, a-theoretical methodologies dominating psychology. The latter typically revolve around statistical testing and try to shield data treatment from theoretical bias (e.g., from massaging the data toward a desired hypothesis). They aim to curb norm-breaking negative deviants at the outset, for instance, by formally preregistering the intended analyses and experimental procedures. However, this approach may unintentionally also block positive deviants, as they initially resemble the far more prevalent negative deviants. As a result, the normative methodologies may throw the baby out with the bathwater. The evolution of successful science requires a careful, slow consideration of deviants rather than a fast pre-emptive deselection due to their perceived norm-breaking nature. This perspective seeks to reinforce experimental psychology by shifting the focus from narrowly data-limited research methods to slower, theoretically motivated investigations, from which successful deviants are more likely to eventually emerge.
Positive Deviants
Universities should exist to encourage heresy, not to create conforming clones. Denis Noble (2010, p. 101)
The classic example demonstrating the exceptional optimization power of positive deviance is Save the Children’s strategy to reduce malnutrition in Vietnam (Sternin, 2002). From 1991 to 1999, the program helped recover an estimated 50,000 malnourished children from over 250 communities and may have prevented malnutrition in many more who were not yet born at that time. Briefly, the strategy involved detecting the very few well-nourished children from extremely poor families (i.e., “competition winners”), and trying to discover what practices enabled them to avoid malnutrition. These families collected shrimp and crabs from paddy fields, which other villagers considered “taboo” and dangerous (i.e., norm-breaking). Additionally, instead of the customary two meals a day, the healthier children were fed more actively and received three to four smaller meals. Overcoming resistance to these norm-breaking practices was challenging, but eventually cooking sessions unobtrusively incorporating these practices allowed their spread through the communities.
A positive deviance approach, with spontaneous variation and selection of the most successful cases as its core elements, enables a wide diversity of applications. Its great utility lies in the discovery of previously unexpected solutions to complex, often intractable problems. Positive deviance has steadily expanded the range of fields where it opened doors for innovation, from nutrition research to high-school absenteeism, gang violence, timely graduation, low-deforestation cattle farming, improving sales-force productivity, reducing transmission of resistant hospital bacteria, and other healthcare issues. Among the best-known and more recent examples is a study aiming to minimize the delay between arrival at the hospital and the insertion of a stent to reopen the blocked artery (i.e., the door-to-balloon time) for patients with acute myocardial infarction (Bradley et al., 2005). From U.S. national registry data, the researchers were able to select positively deviant hospitals with door-to-balloon times of less than 90 minutes for their last 50 cases, and then conducted in-depth visits (i.e., tours and interviews) at 11 sites with the shortest times. From this analysis, they derived several recommendations (e.g., early activation of the catheterization team) that considerably reduced door-to-balloon times and mortality rates in American hospitals.
In quantitative approaches, deviants are often considered rare outliers from a bell-shaped curve, with normative behavior at the center. Disapproved, negatively valued actions that surpass arbitrary tolerance limits (i.e., negative deviants) are at one extreme of the curve. The other end, exceeding norms and expectations, represents the positive deviants. Big-data analyses for detecting positive deviants can offer advantages over labor- and time-intensive in-depth collection of primary data (cf. Albanna et al., 2022). Compared to traditional methods, where the positive-deviance rate can be extremely low or zero, big data can confer a numerical benefit. Moreover, unlike smaller datasets in traditional positive deviance research, yielding only a static, cross-sectional performance record, big data can paint a more dynamic picture via longitudinal coverage. Driesen and collaborators (2021), for instance, identified positive deviants from Germany’s 401 administrative districts in their ability to control SARS-CoV-2 transmission. They based the identification not only on daily cases per district but also on weather reports, weekly mobility data, and structural data on ruralness and socio-economic status of the districts. The researchers qualitatively analyzed the factors that rendered these districts successful (e.g., courage to deviate from bureaucratic procedures) and formulated practical recommendations to contain the pandemic.
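The quantitative logic sketched above, with normative behavior at the center of a bell-shaped curve and positive deviants in its upper tail, can be made concrete in a few lines. The 2-SD cutoff and the synthetic data below are illustrative assumptions, not values prescribed by the positive-deviance literature.

```python
import numpy as np

def positive_deviants(scores, z_threshold=2.0):
    """Return indices of cases whose (higher-is-better) scores lie
    in the upper tail of the group distribution.

    The 2-SD cutoff is an arbitrary illustrative convention; actual
    positive-deviance studies use domain-specific criteria (e.g.,
    door-to-balloon times under 90 minutes).
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean()) / scores.std()
    return np.flatnonzero(z > z_threshold)

rng = np.random.default_rng(0)
group = rng.normal(loc=100, scale=15, size=1000)  # typical performance
group[:3] = 170.0                                 # three exceptional cases
deviants = positive_deviants(group)
```

Note that the upper tail of a distribution also contains ordinary lucky cases; it is the in-depth qualitative follow-up described above, not the statistical flagging itself, that identifies transferable practices.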
It is argued here that a conceptually similar approach has unintentionally grown in science. Gradually, a selection process developed relying on competition for publications, citations, public recognition, funding, etc., from which the successful long-term winners, the positive deviants, could emerge. Quite possibly, the most prominent example of a positive deviant in the history of science, breaking many norms at the time, is Charles Darwin’s conception of the theory of natural selection. Even though Darwin’s famous book, On the Origin of Species, was originally published in 1859, his detractors, particularly among religiously motivated biologists and philosophers, far outnumbered his proponents, at least until the first decades of the 20th century. Despite fierce opposition to his deviant ideas, Darwin eventually grew into, arguably, the most influential scientist ever (cf. Mayr, 2000).
The practical applications of positive deviance differ in some respects from its spontaneously evolved deployment in science. For instance, which deviant hypothesis is successful can often only be reconstructed in hindsight, occasionally after a long time. It is not guaranteed to produce only successes; it can also yield false leads and even suggest innovations that only appear to be improvements at first. Even worse, deviants can initially be labeled as negative, only to turn out later to be highly favorable. When Paul Dirac devised his Nobel Prize-winning Dirac equation, for instance, one of its strongly counterintuitive implications, the existence of antimatter, was initially met with much skepticism (see Firestein, 2012). Dirac’s equation admitted two solutions, one for the well-known electron, and one for an anti-electron, the positron, having the same mass but the opposite electric charge. The equation was not derived from data, but arose from imaginative theoretical work. Initial positron observations, predating Dirac’s predictions, from cosmic-ray traces in a cloud chamber were even dismissed as flukes. Only after Carl Anderson interpreted these traces as positrons was their existence accepted by the scientific community. Unimaginable to Dirac, this discovery much later led to the development of the Positron Emission Tomography (PET) scanner, which has proven highly valuable to medical science.
Negative Deviants
For “open science,” it feels quite ironic to put up relatively rigid, narrow criteria by which we judge scientific contributions as worthy, especially since those criteria are likely to shift over time. Mary Murphy (Shiffrin et al., 2021, p. 268)
For over a decade, experimental psychology seems to have been suffering from an avalanche of failed replications (i.e., negative deviants), which has become known as the reproducibility crisis (e.g., Open Science Collaboration, 2015). Most nonreplications can be blamed on so-called false positives in the original results (i.e., a majority of significant findings may be “false”; Ioannidis, 2005), or false negatives in the replication attempts (i.e., a majority of nonsignificant findings may be “true”; Hartgerink et al., 2017), but questionable research practices (QRP), such as p-hacking or selective reporting, and even outright fraud undoubtedly played a role as well. Moreover, sometimes replicators seemed unaware of the study’s theoretical background and overlooked crucial aspects of the experimental procedure, raising questions about whether it should be called a nonreplication at all (e.g., Phaf, 2016). The inability to replicate sometimes even basic findings was so disconcerting to many researchers that it sparked a broad methodological reform movement that focused mainly on preventing QRPs and fraud (e.g., Yong, 2012). As most QRPs consist of working toward a desired hypothesis (e.g., selective reporting of only supportive results), the measures often aim at shielding data from theory, for instance, by blinding researchers to experimental conditions and hypotheses (see Munafò et al., 2017). Therefore, the reform movement addressed statistical effects rather than theoretical hypotheses, and confined itself primarily to data treatment (e.g., Wagenmakers et al., 2012).
Only if all intended statistical tests are specified in advance, to the exclusion of all other possible tests, can they achieve any semblance of validity. In addition to the latter type of confirmatory research, however, Wagenmakers et al. (2012) also allowed for “fishing expeditions” in the early stages of research. They referred to this as exploratory research, where statistical tests cannot be specified beforehand and should not be applied (for criticism of the confirmatory/exploratory distinction, see Rubin & Donkin, 2022). The confirmatory statistical enterprise appears to be invalidated by the surprisingly high prevalence of false negatives (e.g., Hartgerink et al., 2017) and of false positives that, according to Ioannidis (2005), may constitute a majority, even without any QRPs or fraud. Nevertheless, rather than abandoning these “fatally flawed” statistical practices (cf. Cumming, 2014), reformers advocated the tightening of confirmatory research, for instance, by preregistration of the analysis plan and experimental procedures. The reform movement culminated in a normative manifesto purporting to accelerate scientific discovery (Munafò et al., 2017). The proposals included blinding to hypotheses, improving statistical training, promoting preregistration and replication, guaranteed publication after reviewed preregistration, encouraging collaboration and transparency, and rewarding open and reproducible practices. Some guidelines, like openness and transparency, should be a matter of course, while others, such as statistical practices and preregistration, are more debatable (e.g., Szollosi et al., 2020). The narrow focus on data, the requirements of preregistration, replication, and guaranteed publication, and the prohibition of post-experimental theorizing prescribed by normative methodologies are specifically criticized here.
Although the guidelines are not mandatory, non-compliance comes at a cost to individual researchers, teams of scientists, and journal publishers. Maintaining a good reputation is of paramount importance for all parties involved. Researchers, for instance, run the risk of not being published or funded if they do not explicitly comply with at least some guidelines. The number of journals awarding badges for open scientific practices is on the rise, and not just in psychology. To receive the three kinds of badges, study data, materials, and preregistration (of the study design and, optionally, the analysis plan) must be archived in an open-access repository with a permanent identifier. To further nudge scientific journals to adopt open practices, the Transparency and Openness Promotion (TOP) committee at the Center for Open Science in Charlottesville developed a scoring scheme that combines four levels of compliance (from 0, no explicit compliance, to 3, the standard is required of all authors by the journal) for eight standards into a TOP factor (Nosek et al., 2015). The standards include citation standards, data transparency, code transparency, materials transparency, design and analysis transparency, study preregistration, analysis preregistration, and replication submission options. Journals that require all standards at the highest level will not publish authors who do not comply. A recent list of TOP factors for 3006 journals across scientific fields can be viewed at https://topfactor.org/. As of January 2024, 367 out of 514 psychology journals scored higher than zero on the TOP factor, indicating adherence to at least one guideline. Increasingly, nudging has grown into a social imperative in the scientific community. In the absence of counterpoints, it is to be anticipated that eventually no research journal will be able to disregard the normative methodologies.
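To make the scoring scheme concrete, the aggregation can be sketched as follows. Treating the TOP factor as a simple sum of the eight per-standard adherence levels is an assumption made here for illustration; the official computation may differ in its details.

```python
# Illustrative sketch of combining per-standard adherence levels
# (0 = no explicit compliance ... 3 = required of all authors) into a
# single journal score. Summation is assumed for illustration only.

TOP_STANDARDS = [
    "citation", "data transparency", "code transparency",
    "materials transparency", "design/analysis transparency",
    "study preregistration", "analysis preregistration",
    "replication",
]

def top_factor(levels: dict) -> int:
    """Sum adherence levels (0-3) over the eight standards;
    standards absent from `levels` count as 0 (no explicit compliance)."""
    for name, level in levels.items():
        if name not in TOP_STANDARDS or not 0 <= level <= 3:
            raise ValueError(f"invalid entry: {name}={level}")
    return sum(levels.get(s, 0) for s in TOP_STANDARDS)

journal = {"data transparency": 2, "study preregistration": 1}
score = top_factor(journal)  # 3, on a scale from 0 to 24 under this sketch
```

Under this additive reading, a journal requiring every standard of all authors would score the maximum, while a journal with no explicit policies scores zero.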
The social pressures exerted on researchers, journals, and funding agencies drive a growing compliance with methodological regulations, culminating in a “science-engineering” project unprecedented in the history of science. Similar to social-engineering projects, reformers subscribe to a kind of “makeability” ideal and pursue centralized planning and nudging to bring about changes and regulate the future development and conduct of scientific research. The top-down nature is well illustrated by the mandatory open-science requirements imposed by many funding agencies, universities, and journals (cf. Munafò et al., 2017). At the author’s university, for instance, a psychological experiment can only be run after completing a web application that manages ethical approval, legal privacy regulations, participant recruitment, preregistration, data-analysis planning, data archiving, and public data access. As long as there are journals remaining that do not explicitly conform to these norms, a comparison with those that do would in principle still be possible in a poorly controlled quasi-experiment (see Appendix A). Considering the possibility that safeguards against QRPs also have unintended adverse effects, such a comparison seems urgently needed, given the dwindling number of control-condition options.
A Linear Conception of Science
The mistake is to think that any published paper or journal article is the end of the story and a statement of incontrovertible truth. It is a progress report.
The reformers seem to share with the general public a glorified view of the pursuit of scientific research as a quasi-linear, rule-based, methodical system for objectively establishing “facts” that are either true or false (i.e., effects either being statistically significant or nonsignificant). The teleological nature is evident in moving toward a single goal, which is reaching the unequivocal truth. Certainly, turns and detours are factored in, but the linearity refers to the way the goal is achieved. Put simplistically, in the eyes of reformers, doubling the amount of data would mean doubling the progress toward the goal. The reform movement was rooted in the disappointment that methodological guidelines, based on the linear conception, were not followed in practice and did not seem to work well in experimental psychology. Despite the long-standing controversial issues and potential inconsistencies surrounding the institutionalized hybrid of null-hypothesis testing (see Gigerenzer, 1987), reformers sought to strengthen these statistical practices.
Psychological research papers often assume that the data “speak for themselves” in mechanically approaching the truth, and refrain from explicating the theoretical implications of their statistical conclusions (cf. Davidson, 2018). Flis (2022) argued that this linear, a-theoretical methodology stems from the behaviorist approach, which presupposes that all empirical “facts” fit the predetermined stimulus-response theory and that research serves only to fill in the remaining gaps. The narrowly data-limited research paper persists to this day in experimental psychology, but now probably because convergence to a comprehensive theory seems unattainable (cf. Sanbonmatsu & Johnston, 2019). The statistical reform movement further encouraged a-theoretical papers by shielding them from biases caused by a-priori and post-hoc hypotheses. Increasingly, theoretical hypotheses were replaced by only statistical hypotheses, which serve as a proxy for the former (cf. Davidson, 2018). The rejection of the null hypothesis, however, allows for a myriad of different theoretical conclusions and, thus, lacks substance. The neglect of theory in experimental psychology, further exacerbated by the normative methodologies, has led to a heavily fragmented research field, paradoxically moving away from the imaginary “truth,” with largely disconnected data sets and many seemingly incongruent publications (cf. Phaf, 2020; van Zomeren, 2024).
The idealized, data-oriented view on research seeking absolute foundations of knowledge pursued by “objective” scientists has been falsified by almost all major scientific discoveries, and has been denounced by Popper (2005). The highly structured format of the research paper, presenting an idealized chain of events, from hypotheses, through the evidence, to the conclusions, gives rise to a “fraudulent” misrepresentation of the actual thought processes that led to the work (Medawar, 1964). This perpetuates the myth of scientists doggedly abiding by the linear method. “There is no such thing as unprejudiced observation” (Medawar, 1964, p. 42). A researcher pretending to be disinterested (i.e., “objective”) would lack transparency and could be seen as hypocritical and intellectually dishonest. New discoveries typically arise from an idiosyncratic, sometimes erratic, highly nonlinear quest for a better understanding in uncharted territories, characterized by wrong turns, failures, and rare successes (cf. Firestein, 2015; Lehrer, 2009). Reformers sought to “linearize” this messy research process by implementing strict methodological guidelines and getting rid of theoretical “chatter.” The linear conception hinges on illusory retrospective reconstructions of the research process as the privileged path to “the truth,” while ignoring the probably infinite plurality of viable options faced prior to it.
Radical Uncertainty
By “uncertain” knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable. The game of roulette is not subject, in this sense, to uncertainty; nor is the prospect of a Victory bond being drawn. Or, again, the expectation of life is only slightly uncertain. Even the weather is only moderately uncertain. The sense in which I am using the term is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealth-owners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know. John Maynard Keynes (1937, pp. 213–214)
The failure to predict any crisis from theory led to a growing sense of humility in economic science. Leading up to the 2008 crisis, for instance, every economic forecast “…was not just wrong but spectacularly so.” (p. 7; Haldane, 2016). From a strict, normative, methodological stance, the standard economic models behind these predictions should be considered falsified (cf. Kanazawa, 2021). However, prediction is impossible unless we can enumerate all possible options and their implications. In the “mini-world” of the stock market, for instance, it is not feasible to list all factors that might affect a share price. Both a myriad of local factors, such as the company’s performance and public image, board members' personalities, employees’ behavior, accidents and fires, and a myriad of global factors, such as the weather, economic conjuncture, fluctuations in commodity supplies, social unrest, pandemics, and wars, can influence stock trading. What portion of these factors is unknown in advance (the “unknown unknowns”) is impossible to estimate. Their number is infinite in all practical terms. While in the stock market possible outcomes (i.e., the quotes) can be specified in advance, in the “science market” future outcomes (i.e., scientific discoveries) are by definition completely unknown. The radical uncertainty concept has taken root most prominently in economics. It entails a profound ignorance of future events, making their probabilities unknowable and unquantifiable (e.g., Bresser-Pereira, 2012; Spiegelhalter & Riesch, 2011; Volz & Gigerenzer, 2012). If economics is subject to radical uncertainty, psychology must be as well. Macro- and micro-economic events influence human behavior, and the psychology of market players contributes hugely to economic changes.
Unpredictability was generalized to all sciences by Popper (1957). He famously asserted that the growth of human knowledge cannot be predicted in either the natural sciences or the social and behavioral sciences. This seems at odds with the apparently excellent ability to predict within the natural sciences. However, he argued that upon closer inspection, a large degree of indeterminism and unpredictability is present even in classical physics. Popper cited the example of Newton’s apple falling to the ground due to gravity. The actual and concrete succession of events, however, cannot be predicted, as many unknown factors, such as wind pressure and tension in the apple’s stalk, need to be considered. Indeterminism does not imply that there are no phenomena that can be accurately predicted, but means that there is at least one that cannot. It would lead too far afield to pursue these matters here, but recent advances in theoretical physics vindicate Popper’s conjectures. For instance, it has now been shown conclusively that the classical three-body problem regarding the motions of three bodies in mutual gravitational attraction (e.g., sun, moon, and earth) suffers from an inescapable unpredictability in the long run (Boekholt et al., 2020). While discussing problems related to infinite precision, Del Santo and Gisin (2019) concluded that indeterministic interpretations of classical physics are highly plausible. They also noted that the law of large numbers masks the indeterministic behavior of individual particles and results in apparent determinism at the larger scale. In the social and behavioral sciences, unpredictability may be more evident because the numbers of similarly behaving units are immensely smaller. Radical uncertainty, which stems from an infinity of widely divergent, largely unknown, future options must also be negotiated in experimental psychology (cf. Smedslund, 2016; see Appendix B).
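The kind of long-run unpredictability at issue can be demonstrated in a few lines. The logistic map below is a stand-in minimal chaotic system, not the gravitational dynamics studied by Boekholt et al.; the point is only that a perturbation far below any realistic measurement precision grows until the two trajectories bear no resemblance to each other.

```python
# Sensitive dependence on initial conditions, the mechanism behind
# long-run unpredictability in chaotic systems. The logistic map at
# r = 4 is a textbook chaotic system used here purely as illustration.

def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-10)  # perturbation far below any
                                      # realistic measurement precision
gap = [abs(x - y) for x, y in zip(a, b)]
# The initially negligible difference roughly doubles per step until
# the two trajectories decorrelate completely.
```

Because any empirical initial condition is known only to finite precision, no amount of additional data collection can push the prediction horizon beyond a modest number of steps, which is the practical content of Popper's indeterminism.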
Methodological Unpredictability
…you can’t connect the dots looking forward; you can only connect them looking backward. So you have to trust that the dots will somehow connect in your future.
The prominent reformer and cognitive neuroscientist Chambers (2019) conceded that preregistration was not suited to capture the effects of unpredictable events, such as brain strokes, solar flares, and floods (presently, one may add pandemics and wars). Also at the psychological level, behaviors and contexts often change through the intervention of fortuitous events (cf. Smedslund, 2016). Chambers seemed to refer, however, to unpredictability in the timing rather than in the nature of an occurrence. If we follow Popper (1957), the inability to predict when an event will happen is only part of a more general concept of unpredictability. Preregistration always entails a certain degree of prediction, which is not feasible if completely unexpected and unforeseeable events emerge in probing the infinite reservoir of unknown options. Many events, certainly those studied in psychology and the social sciences, are essentially unpredictable and would thus not be suited for the normative methodologies according to Chambers’ own reasoning.
Unpredictability poses insurmountable problems not only for preregistration but also for data archiving. An ideal preregistration involves an exhaustive specification of all factors and conditions likely to impact the results, which is made impossible by the (virtually) infinite number of unknown unknowns. Therefore, preregistration and data archiving will always be incomplete. Recently, for instance, Lin et al. (2023) highlighted a systematic failure in psychological research papers to report characteristics of the visual stimuli being used and their presentation mode, such as luminance, color, contrast, spatial frequency, visual angle, central or parafoveal (i.e., horizontal and vertical) presentation, stimulus set size and selection, ambient lighting, chin-rest use, and display settings such as refresh rate, gamma, resolution, and screen type. While these are relatively “knowable,” concrete properties, a vastly larger range of less obvious factors may complicate these studies (e.g., eye dominance and dopaminergic lateralization; Phaf, 2023). On a broader note, psychological methodologist Scheel (2022) expressed concerns about registered reports containing an excessive number of ill-defined claims. The fundamental information deficiency also makes it inherently unfeasible to ascertain that a repeated experiment is ever a faithful replication (see Appendix C). Furthermore, the implicit reliance on “known” characteristics means that reformers cannot achieve their desired blinding to theory, and inevitably employ hidden theories. They merely lack awareness of the implicit theories they use to select the known, registered characteristics.
Psychologists grossly overestimate their ability for probing theoretical hypotheses by experimental means. Radical uncertainty entails that experimental psychology suffers from a massive underdetermination of models by data (cf. Klein, 2021), and defeats the “data speak for themselves” view. Even reformers are starting to recognize the inadequacy of this approach. A wide variety of bona-fide explanations can be derived from the same data set by researchers who are not subject to theoretical biases or “perverse” incentives to find positive results (Silberzahn et al., 2018). These reformers also conceded that preregistration would not have prevented such a broad variety of valid analyses. All hypotheses aim to reduce uncertainty by negotiating infinity, but none can fully escape it. Given the immense complexity of psychological phenomena, there can be no hope of achieving full experimental control over all theoretically relevant variables. Additionally, an infinite number of tests would be needed to attain some illusory certainty. At best, because of the extremely uncertain connection between theory and data, experimental results may suggest a specific hypothesis, but it can never be “proven.” In light of radical uncertainty, even the more modest term evidence seems overstated.
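The many-analysts point of Silberzahn et al. can be miniaturized: below, two equally defensible choices about a single influential observation yield different correlation estimates from the same data. The data set and the exclusion rule are fabricated purely for illustration.

```python
# Toy demonstration (synthetic data) that defensible analysis choices
# can yield different conclusions from one data set, in the spirit of
# Silberzahn et al. (2018). All numbers are fabricated.
import random

random.seed(1)
x = [random.gauss(0, 1) for _ in range(40)]
y = [0.2 * xi + random.gauss(0, 1) for xi in x]
x += [4.0]
y += [-3.0]  # one influential observation

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((v - mx) ** 2 for v in xs) ** 0.5
    sy = sum((v - my) ** 2 for v in ys) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (sx * sy)

r_all = corr(x, y)                    # analyst A: keep every observation
keep = [i for i in range(len(x)) if abs(x[i]) < 3]
r_trim = corr([x[i] for i in keep],   # analyst B: exclude |x| >= 3
              [y[i] for i in keep])   # as "outliers"
```

Neither analyst has committed a questionable research practice, yet their estimates differ; scaled up to the dozens of preprocessing and modeling decisions in a real study, this is the underdetermination of models by data.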
The infinite number of factors involved in human behavior should not be a reason to abandon experimental psychology altogether (cf. Smedslund, 2016). The present perspective is concerned with how experimental psychology can still advance considering the countless potential bifurcations that future developments may take. While recognizing that experimental research may not be applicable to many psychological problems, experimentation remains a valuable tool for negotiating infinity. However, it can only be conducted within the current empirical and theoretical constraints, not for attaining some ultimate truth. Hypotheses are not “six of one, half a dozen of the other,” but they can sometimes be weighed against each other in experimental research. A very specific type of theoretically motivated experimental research in psychology is advocated here. Every experiment should explicitly preselect at least two substantive hypotheses that are made to compete under the current environmental conditions. Without ever becoming entirely subjective, factors other than empirical results also weigh in when settling the competition, such as consistency with other theories, integrative potential, simplicity, elegance, and even beauty (cf. Colless, 2019). Rather than striving for “mechanical objectivity” (cf. Davidson, 2018), researchers should explicitly acknowledge that the reformers’ ideal of complete objectivity is unattainable. “The effect size reported should also be situated in its research and historical context, where subjective decisions made at each stage of research are made transparent.” (p. 217; Pek & Flora, 2018).
The biophysicist Platt (1964) proposed the concept of “strong inference,” which entails an undirected, iterative research strategy of generating multiple alternative hypotheses and designing selective experiments to refine them. Exploration of relevant substantive hypotheses and elaboration of previously successful hypotheses (e.g., through posterior hypothesizing) are essential in this process. The conception of research as a goal-directed process and the conflation of insubstantive statistical hypotheses with substantive theoretical hypotheses have created the illusion that inference can be achieved in a single or a few “confirmatory” experiments (cf. Wagenmakers et al., 2012). However, radical uncertainty about the meaning of experimental data tends to make research a protracted and haphazard process in which only a fleeting and relative belief in veracity can be gained. Platt very aptly likened it to climbing a tree, scaling branches that ultimately do not converge.
An Evolutionary Conception of Science
The empirical basis of objective science has thus nothing ‘absolute’ about it. Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or ‘given’ base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being. Karl Popper (2005, pp. 93–94)
The normative, linear conception of science sharply contrasts with more descriptive notions of an open-ended evolution of science. Aligning with many philosophers of science (e.g., see Gontier & Bradie, 2021; Hull, 1988; Marcum, 2017), developments in experimental psychology can also be conceived as an evolutionary process of variation in genotypes and successive selection of phenotypes (e.g., Phaf, 2020; Shiffrin et al., 2018). Theoretical models and hypotheses serve as genes in the scientific context. Their experimental operationalizations in scientific research can be regarded as phenotypes. Researcher creativity fueled by anomalous observations may represent a variation component (see Appendix D). After competing on empirical, theoretical, and subjective (e.g., perceived beauty, Colless, 2019) grounds, selection occurs through publication gatekeeping (e.g., peer review), research funding, and sustained citations. In both biological and scientific evolution, positive (i.e., fitness-enhancing) and negative (i.e., fitness-reducing) deviants (i.e., mutations) are closely intertwined, with the former constituting a small minority. The two types of deviants can be initially indistinguishable, as both may simultaneously bear fitness costs and benefits. Positive deviants may remain dormant for a long period (i.e., “sleeping beauties”; van Raan, 2004), and competition winners may only be recognized after many decades.
In the history of science, the competitive selection process appears to have self-organized, by trial and error. According to Deutsch (2011), this process has been gathering pace since the beginning of the Enlightenment. As a result, sophisticated models are incrementally constructed by probing, often through deviance, an ever-changing, infinitely wide, empirical and theoretical landscape. Science evolution is not aimed at approaching the “truth,” but reduces uncertainty through an undirected, but deceptively powerful, optimization process (cf. Marcum, 2017). Science innovates along paths that cannot be precomputed and are non-algorithmic. Rather than inventing novel features from scratch, it often combines existing elements and thus transitions from a prior function to an abutting new function. Evolution works not like an engineer following a preconceived plan, but more like a tinkerer trying out loose parts lying around (Jacob, 1977). The infinite number of potentially useful parts and outcomes leads to an inherent open-endedness. Of course, scientists may have specific goals in mind, which vary so widely that the overall result is virtually undirected research. Scientific research negotiates infinity by endeavoring to find the hypotheses with the highest utility (i.e., fitness) under the current theoretical and empirical conditions.
Evolutionary reasoning resolves the great paradox of science that important advances are made in the absence of a-priori design. The process appears noisy, but the results are far from random. It grows intricate “organisms” through successive selection from the fittest “genes” of the past, and can solve complex problems arising from environmental selection pressures in relatively few generations (cf. Dawkins, 1986). Computational search methods implementing natural selection procedures, such as genetic algorithms, have been successfully applied to intractable problems, such as the traveling salesman problem (see Forrest, 1993; Miikkulainen & Forrest, 2021). Indeed, similar algorithms have been employed to computationally model specific aspects of scientific evolution. Smaldino and McElreath (2016), for instance, found in their evolutionary simulations, using a laboratory’s publication count as fitness criterion, that selection for high publication output results in poorer methods and rising false discovery rates. Even unrealistically high rates of simulated replication attempts, as high as 50%, failed to halt this proliferation of bad science.
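To make the selection-without-design logic concrete, a minimal genetic algorithm for a toy traveling-salesman instance can be sketched as follows. This is an illustration of the general technique only, not a reconstruction of the cited simulations; the city coordinates and all parameter values are invented for the example:

```python
import random

# Toy TSP instance: invented city coordinates, purely for illustration.
CITIES = [(0, 0), (1, 5), (2, 3), (5, 2), (6, 6), (7, 1), (3, 7), (8, 4)]

def tour_length(tour):
    """Total Euclidean length of a closed tour visiting every city once."""
    return sum(
        ((CITIES[a][0] - CITIES[b][0]) ** 2
         + (CITIES[a][1] - CITIES[b][1]) ** 2) ** 0.5
        for a, b in zip(tour, tour[1:] + tour[:1])
    )

def crossover(p1, p2):
    """Order crossover: copy a slice from one parent, fill in the other's order."""
    i, j = sorted(random.sample(range(len(p1)), 2))
    child = p1[i:j]
    child += [c for c in p2 if c not in child]  # keep a valid permutation
    return child

def mutate(tour, rate=0.2):
    """With probability `rate`, swap two cities — a blind, 'deviant' variant."""
    tour = tour[:]
    if random.random() < rate:
        a, b = random.sample(range(len(tour)), 2)
        tour[a], tour[b] = tour[b], tour[a]
    return tour

def evolve(pop_size=60, generations=200, seed=0):
    """Truncation selection: shortest tours survive, losers sink into oblivion."""
    random.seed(seed)
    pop = [random.sample(range(len(CITIES)), len(CITIES)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=tour_length)          # fitness = (negative) tour length
        survivors = pop[: pop_size // 2]   # deselect the least fit half
        children = [
            mutate(crossover(*random.sample(survivors, 2)))
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    return min(pop, key=tour_length)

best = evolve()
print(best, round(tour_length(best), 2))
```

No generation plans the final route; short tours emerge solely from variation (crossover, mutation) and repeated selection, which mirrors the paragraph's point that an apparently noisy process yields results that are far from random.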
The nonalgorithmic and unpredictable evolution of science thrives on pluralism, diversity, and flexibility, allowing for both coincidental positive and negative deviants, the latter being supplanted by the former over time in the course of their competition (see Appendix E). The historical paths taken in this haphazard selection process matter less than the fitness of their current outcomes. The high degree of nonconformity (i.e., with social norms), and even eccentricity, exhibited by brilliant, creative scientists, such as Isaac Newton, Paul Dirac, Charles Darwin, Francis Crick, John Forbes Nash, and others, attests to the diversity that successful science calls for, already at the individual level. However, there are certainly many alternative routes to successful science besides those taken by these male and white scientists. Creative contributions can also originate from outside academia, but they may struggle to survive the scientific selection process. Even though the anarchist philosopher of science Feyerabend (1975) went so far as to contend that “non-scientific” beliefs, such as astrology, voodoo, witchcraft, and Chinese traditional medicine, should be treated on an equal footing with established scientific theories, his catchphrase that “anything goes” in science does not appear to be far off the mark. Rather than providing constructive guidance for scientists, he thought of this maxim as highlighting the naivety of prescribing universal methodological schemes in the face of the historical development of science.
Normative Methodologies Curtail Deviance
Science does err but that is an integral part of a process that produces valid and valuable result at the end, just as in a Darwinian biological world. (Shiffrin et al., 2018, p. 2639)
Positive and negative deviants are similar in that they both challenge established norms and are initially deemed “ugly,” “discordant,” and not scientifically acceptable (cf. Colless, 2019). The reform movement specifically targets these deviants in explicit as well as implicit ways. A finding’s perceived “ugliness,” for instance, has often been used by reformers as a subjective criterion to select “effects” to replicate from the vast body of published research (cf. Davidson, 2018). Whether new ideas constitute positive or negative deviants, however, is virtually impossible to determine, at the moment or even for extended periods after their inception (cf. Shiffrin et al., 2018; van Raan, 2004). It is the rule rather than the exception that publications on groundbreaking positive deviants, like the Higgs boson, lasers, quasicrystals, nuclear magnetic resonance, symbiogenesis, CRISPR, cancer immunotherapy, mRNA vaccines, retinal feature detectors, optogenetics, and mirror neurons, are first rejected by leading scientific journals and have to be deferred to lower impact journals (cf. Ricón, 2020; Siler et al., 2015). It is likely that in many of these instances, methodological issues were cited for the early rejections. In experimental psychology, and all other sciences where normative methodologies are imposed, they have the potential to further curtail positive deviants.
Starkly maladaptive research practices (i.e., negative deviants), such as data falsification, other forms of fraud, or unethical behavior harming experimental participants, should not be condoned in any form (see Appendix F). Nevertheless, even the most stringent methodologies will not be able to stop ill-intentioned researchers entirely from engaging in them. Gopalakrishna et al. (2022) found that self-reported adherence to scientific norms was the most important factor in reducing both fraud and questionable research practices (QRPs). Rather than imposing restrictive methodologies, it would seem better to appeal to the researcher’s scientific responsibility. Ultimately, it is more productive to place trust in highly qualified scientific professionals than to treat them with distrust from the outset. Paradoxically, QRP use may rise due to the prescribed protocols, as researchers may feel relieved from personal responsibility if they simply follow the formal rules. This perspective does not seek to promote the prevalence of QRPs or fraud, but rather questions the pre-emptive methodological measures.
Ambitious researchers aspire to the highest levels of innovation in their field, and tend to exaggerate their claims (cf. Lilienfeld, 2017). Consequently, negative deviants, too, will first be posited as positive deviants. Because most self-styled revolutionaries indeed turn out to be cranks or charlatans, the scientific community overcorrects and initially treats genuine revolutionaries the same way. Well-known examples in the history of science include Einstein, who at first could not get a proper academic position (cf. Noble, 2010). More recent examples are also readily available. Due to their opposition to the conventional wisdom that stress causes ulcers, the medical community initially treated the future Nobel laureates Robin Warren and Barry Marshall as heretics (cf. Pincock, 2005). To convince the highly skeptical and dismissive physicians that bacteria cause stomach ulcers, they resorted to drastic measures. Marshall infected himself in 1984 with Helicobacter pylori and promptly developed severe gastritis. A course of antibiotics cured him, providing the first direct indication of this causal relationship (see Charitos et al., 2021). The bacteria had been observed in the stomach before, but Warren and Marshall were the first to posit the discordant hypothesis that they cause ulcers.
Besides QRPs, researchers can invent a plethora of other deviant practices. Normative methodologies will curtail a virtually infinite number of deviants, including potentially fruitful ones, that far exceeds the limited number of QRPs. Preregistration of Marshall’s self-inflicted gastric infection with Helicobacter pylori, for instance, would be unthinkable (cf. Charitos et al., 2021). No one would have dreamed beforehand of simply leaving contaminated Petri dishes while on vacation, as Fleming did (cf. Rudd, 2017). No preregistration would have anticipated residual dishwashing soap as a factor in the nonreplications of G-protein experiments (cf. Firestein, 2012). Experimenter gender (cf. Georgiou et al., 2022) and pet ownership (cf. Panksepp, 2005) do not figure in preregistrations of animal and human stress experiments to date. Neglecting moderators (e.g., in nonreplications) suggested by existing hypotheses regarding eye-movement influences on emotion and memory could potentially block useful applications like EMDR therapy (cf. Phaf, 2023). Countering deviant practices risks “throwing the baby out with the bathwater.”
Almost every positive development, not just in science, has its drawbacks (see Appendix F). Selection between competing hypotheses always involves trade-offs between positive and negative attributes. The negative aspects of positive deviants are often glossed over retrospectively, when they are fitted into an idealized conception of science. However, sometimes they are acknowledged as being heavily outweighed by their positive outcomes. Medications provide a good example of this. Even the most healing and health-promoting drugs have some negative side effects. Selection plays an important role in science evolution, but it should focus on a slow, meticulous selection of the fittest, to promote success, rather than on the premature deselection of the apparently unfit, to prevent failure. Pre-emptive normative methodologies curb negative deviants rapidly, although not completely, but this comes at the cost of also suppressing positive deviants, which take longer to mature. Approximately half of published research papers are not cited even once and thus will sink into oblivion (cf. Shiffrin et al., 2018). Receiving a very high number of citations in the long run is one of the hallmarks of positive deviance, implying that negative deviants will eventually become irrelevant and need not be deselected early on. Retrospectively, the latter are rarely corrected explicitly in the literature; they just fade away.
Relaxing the Norms
All the formal rules of ‘how to keep from fooling ourselves’ have been followed. And yet no progress could possibly be made, because it was not being sought: explanationless theories can do no more than entrench existing, bad explanations. David Deutsch (2011, p. 320)
Normative methodologies are unable to differentiate between positive and negative deviants either beforehand (i.e., through preregistration) or afterward (i.e., through replication). The narrow limitation to data (i.e., statistical testing) and the eschewing of pre- and post-experimental theorizing impede rather than strengthen the tinkering necessary for the successful evolution of psychology. Many reformers argued that data should be shielded from theory to prevent biased data processing (Munafò et al., 2017). In their quest for humility, Hoekstra and Vazire (2021) even suggested that researchers should remain agnostic about which interpretation is most likely to be valid for their data. The reformers oppose “hypothesizing after the results are known” (HARKing; Kerr, 1998), thereby conflating insubstantive statistical hypotheses with substantive theoretical hypotheses, as demonstrated by their many failures to replicate statistically in the absence of any theoretical elaboration. Firestein (2015) argues that failure is the basis for successful science. The reformers focus on data failures, which they attempt to prevent at the outset. Pure data, however, can never fail, only hypotheses can fail, and exclusively in competition with other hypotheses. Furthermore, by prohibiting HARKing, researchers are denied the chance to learn from failure. As theory is greatly underdetermined, and certainly not dictated, by empirical data, imaginative, substantive insights are invariably required (i.e., theory-laden guesses; cf. Deutsch, 2011). Banning HARKing means that hypotheses can only be rejected but not replaced by more useful ones, resulting in “a race to the bottom.”
Null-hypothesis significance testing, which creates the illusion of fast, explanationless data analysis, has been devastatingly criticized (cf. Cumming, 2014; Davidson, 2018) but stubbornly persists in normative methodologies. One may question whether a probabilistic approach is at all appropriate for the radically uncertain research landscape (cf. Keynes, 1937; Spiegelhalter & Riesch, 2011; Volz & Gigerenzer, 2012). Nearly all methodological norms, such as prescribing preregistration and prohibiting HARKing, are predicated on statistical testing. By renouncing significance testing, dynamic adjustments of hypotheses during research, “fishing expeditions,” and HARKing are no longer questionable, but constitute fruitful research strategies that promote much greater flexibility (i.e., tinkering) and higher productivity than, for instance, exhaustive preregistration allows. Replication efforts also too often rely on statistical “true” or “false” decisions regarding experimental effects (see Appendix C), rather than sound theoretical judgment (cf. Shiffrin et al., 2021; Trafimow, 2019). There is no argument here against conducting replication attempts per se, insofar as they are possible in principle. However, their results should only be published in scientific journals if they explicitly contribute to our understanding of the phenomenon. Explanationless replication attempts can at best maintain the status quo, but they will never produce positive deviants. Pure data reports add to the deluge of meaningless research publications (cf. McPhetres et al., 2021), and are better suited for publicly accessible data repositories. Guaranteed publication of preregistered research or replication studies is beyond the pale if they concern merely a-theoretical reports, leaving the reader to guess their meaning. Thus, null and positive results alike should be published only when embedded in theory, not in null-theory papers.
Innovative hypotheses that act as mutations in scientific evolution inevitably carry a high level of uncertainty and may initially suffer from an exaggerated degree of deviance, without which they might not have emerged at all. The extreme initial variability is a key factor in the Proteus phenomenon, for instance, where incompatible, or even seemingly contradictory, research findings are published, mostly within short time intervals but sometimes also after longer periods (Ioannidis & Trikalinos, 2005). Likewise, during the process of replication, the occasionally inflated effect sizes of new findings tend to wear down (cf. Lehrer, 2010). Successful science is driven by exploratory research, but there may also be limited room for purported confirmatory research (cf. Wagenmakers et al., 2012), if the distinction can be made at all (cf. Rubin & Donkin, 2022). One should question what is confirmed in narrowly data-limited, explanationless, confirmatory research. Theory-free confirmatory data, significantly rejecting that “nothing is there,” allow for an infinity of interpretations of “what is there,” and are thus effectively meaningless. The alternative confirmation of a well-specified, isolated, substantive hypothesis, not competing with other hypotheses, does not play an important role in fundamental science, and should probably be relegated to applied science, where simple yes-or-no decisions are often sought (cf. Gigerenzer, 1987). It may be justified when the costs of potential failures are high, such as in drug research where patients' lives could be at stake (cf. Shiffrin et al., 2018).
So far, the unprecedented science-engineering project in psychology does not seem to have led, at least ostensibly, to appreciable advances in psychological theorizing or practical (e.g., societal or clinical) applications. Reformers would be hard-pressed to name concrete innovations emerging from their efforts. One may even ask how many useful ideas have not survived the premature deselection of deviants by the normalizing of research processes. The most pernicious methodological proposal is probably “blinding to theory” (cf. Munafò et al., 2017). This leads to a drowning of the rare positively deviant ideas in a sea of meaningless, narrowly data-limited, research papers, and a “freezing” of psychology’s evolution. Comparison with pre-reform progress remains challenging (see Appendix A), but if an innovation decline can be observed, the normative methodologies should be considered falsified or have effectively lost the competition with evolutionary conceptions of science. If they are indeed damaging to the exceptional and mostly unconventional approaches that have propelled successful science, they should be relaxed or even abandoned altogether.
The Primacy of Theory
Whether you can observe a thing or not depends on the theory which you use. It is theory which decides what can be observed. Albert Einstein (cited in Fullbrook, 2012, p. 20)
Popper used to begin his lecture course on the philosophy of science by asking the students simply to ‘observe’. Then he would wait in silence for one of them to ask what they were supposed to observe. This was his way of demonstrating one of many flaws in the empiricism that is still part of common sense today. So he would explain to them that scientific observation is impossible without pre-existing knowledge about what to look at, what to look for, how to look, and how to interpret what one sees. And he would explain that, therefore, theory has to come first. It has to be conjectured, not derived. David Deutsch (2011, p. 403)
Psychological science should not remain “a tiny, frozen island of explanation in an ocean of incomprehensibility” (Deutsch, 2011, p. 446). Only through radical innovation, which may involve breaking with current methodological norms, can it escape local fitness maxima and achieve higher levels of usefulness, without ever reaching a nonexistent global maximum. To promote productive science, the susceptibility to, rather than the prevalence of, deviant findings and ideas must be enhanced. Positive deviants rarely, if ever, arise in a theoretical void, and will emerge more readily if the literature is filled with publications that articulate their theoretical hypotheses. Today, the most impressive spin-off from psychological research, the “AI revolution,” is due to theoretical work, specifically the computational modeling of learning and memory, and is only loosely built on experimental results (cf. LeCun et al., 2015). These Large Language Models, which may not be artificially intelligent but have almost supra-human learning capacities, could well revolutionize our society. In turn, these learning systems can expedite successful science by enabling automated semantic analyses that can discover latent hypotheses in the overwhelming volume of publications and ultimately suggest disruptive innovations (cf. Sourati & Evans, 2023; Weeber et al., 2001).
Instead of pushing for an apparent homogenization of data (e.g., through replication), scientific journals can better encourage well-grounded theoretical elaborations of all unexpected results and purported nonreplications. HARKing is highly desirable as long as the post-hoc nature of conferring meaning to results is acknowledged. The evolution of psychological science will remain haphazard (“evolution is a tinkerer”; Jacob, 1977), but emphasizing competition between theoretical hypotheses will help stem the publication deluge of meaningless data reports and improve the coherence of research literature (cf. van Zomeren, 2024). Ongoing competition, through experimentation or other means, is essential here to progressively separate the wheat from the chaff. Additionally, the current peer review process is biased against ideas that deviate from the norm as they are likely to clash with the beliefs of the majority of reviewers (cf. Perry et al., 2004). Novel ideas tend to be more heavily questioned by reviewers than less novel ideas (Johnson & Proudfoot, 2024). A shallow theoretical treatment is then a wise strategy for getting a paper accepted. Reviews of research papers and grant proposals should focus on the arguments for a hypothesis, and reviewers' convictions should be made explicit and subsequently discounted. Reviewer disagreement (i.e., norm-breaking) may even indicate the potential for positive deviance. To maximize openness to positive deviants, the data primacy prevailing in most experimental psychology journals (cf. McPhetres et al., 2021) must be replaced by a theory primacy (cf. Deutsch, 2011; Phaf, 2020). A research journal’s promotion of explicit theory building could be highlighted by open-theory badging.
This forward-looking perspective extends the concept of positive deviance to the development of a successful psychological science. The recent reform movement and the ensuing normative methodologies are criticized for solely focusing on preventing negative deviants and unintentionally hindering positive deviants. Experimental psychology would be better served by transitioning from fast narrowly data-limited research methods to slower, theoretically motivated investigations. Positive deviants fit into an evolutionary conception in which science proceeds in an open-ended fashion branching out in an infinite number of directions. Empirical research can only pursue the paths that are currently most useful when negotiating this infinity, but it can never come to a definitive “truth.” Allowing positive deviants to surface in the competition between alternative hypotheses will foster the discovery and innovation process, which remains largely unpredictable. Psychological science should be open to anomalous, exceptional, but explicitly articulated and well-argued hypotheses, even if they initially challenge the normative beliefs of most. Variation and the ability to overcome the previously fittest hypotheses are driving forces behind the evolution of successful science.
Acknowledgments
I am deeply indebted to Gezinus Wolters for his valuable comments on previous versions of the manuscript. I also want to express my thanks to Maarten Derksen, Kathleen Slaney, David Trafimow, and two anonymous reviewers for their helpful feedback on the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
