Abstract
Readers of peer-reviewed research may assume that the reported statistical analyses supporting scientific claims have been closely scrutinized and surpass a high-quality threshold. However, widespread misunderstanding and misuse of statistical concepts and methods suggests that suboptimal or erroneous statistical practice is routinely overlooked during peer review in psychology. Here, we explore whether psychology journals could ameliorate some of the field’s statistical ailments by adopting specialized statistical review: a focused technical assessment, performed by statistical experts, that addresses the analysis and presentation of quantitative information and supplements regular peer review. We discuss evidence from a recent survey of journal editors suggesting that specialized statistical review may be unusual in psychology journals and is regarded by many editors as unnecessary. We contrast these views with those in the biomedical domain, where statistical review has been considered a partial preventive measure against the improper use of statistics since the late 1970s. We suggest that the current “credibility revolution” presents an opportune occasion for psychology journals to consider adopting specialized statistical review.
After moving to a system of having a statistician present at every meeting, none of the editorial team could imagine moving back to a system where they were not present.
Scientific claims in psychology often rely on a scaffold of statistical analyses that support inductive inferences from samples of data (Rosnow & Rosenthal, 1989). The appropriate selection, implementation, reporting, and interpretation of these analyses is necessary for the validity of the associated claims (Cook & Campbell, 1979; García-Pérez, 2012). Readers of the peer-reviewed literature may assume that reported statistical analyses have been closely scrutinized for quality. But serious concerns about the credibility of psychological research have been raised (Baker, 2016; Pashler & Wagenmakers, 2012), and the misunderstanding and misuse of statistical methods has been implicated as an important cause (Button et al., 2013; Gigerenzer, 2018; Munafò et al., 2017; Simmons, Nelson, & Simonsohn, 2011).
In this article, we explore whether psychology journals could ameliorate some of the field’s statistical ailments by adopting specialized statistical review: a focused technical assessment, performed by statistical experts, that addresses the analysis and presentation of quantitative information, supplementing regular peer review. In biomedicine, statistical review has been considered a partial preventive measure against the improper use of statistics since the late 1970s (Altman, 1982, 1994, 1998; Smith, 2005; Sox, 2009). In a recent survey, we found that 71 of 107 editors (66%) at leading biomedical journals reported that they routinely employed statistical review for 10% or more of submitted manuscripts, and 25 (23%) said they used statistical review for all manuscripts (Hardwicke & Goodman, 2019; also see George, 1985; Goodman, Altman, & George, 1998). By contrast, the survey responses from a sample of 39 psychology-journal editors, reported in this article, suggest that specialized statistical review is unusual in psychology journals and often regarded as unnecessary. We summarize evidence suggesting that statistical problems are commonplace in the published literature and discuss whether the apparent value of statistical review in biomedical journals could translate to psychology. We suggest that the current “credibility revolution” (Nelson, Simmons, & Simonsohn, 2018; Vazire, 2018) presents an opportune occasion for psychology-journal editors to consider adopting specialized statistical review.
Disclosures
Data, materials, and online resources
All data (https://osf.io/nquws/files/), survey materials (https://osf.io/tmah8/files/), and analysis scripts (https://osf.io/4zurk/files/) related to this study are publicly available on the Open Science Framework. To facilitate reproducibility, we wrote this manuscript by interleaving regular prose and analysis code, using knitr (Xie, 2018) and papaja (Aust & Barth, 2019), and have made the manuscript available in a software container (https://doi.org/10.24433/CO.8241121.v3) that re-creates the computational environment in which the original analyses were performed. Detailed methods and results for the survey of psychology editors are provided in the Supplemental Material (available online at http://journals.sagepub.com/doi/suppl/10.1177/2515245919858428).
Reporting
The survey data reported here represent the subsample of psychology journals included in a broader survey of statistical-reviewing policies at biomedical journals. The findings for biomedical journals will be reported elsewhere (Hardwicke & Goodman, 2019), and the findings for psychology journals are reported here for the first time. We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.
Ethical approval
This study was approved by the institutional review board of the Stanford University School of Medicine.
What Do Psychology-Journal Editors Think About Statistical Review? Results of a Survey
To gauge the current use of statistical review, we surveyed a sample of high-impact psychology journals (full methods and results are provided in the Supplemental Material available online). We received responses from editors (all but one an editor-in-chief) at 39 of 118 psychology journals representing 13 subfields (Fig. S1 in the Supplemental Material). We asked respondents about the frequency of statistical review in their journal, the nature of their statistical reviewers and how they are chosen, the procedures and outcomes of statistical review, their ability and willingness to use statistical review, and their perception of the value of statistical review.
An unexpected observation both complicated interpretation of the data and motivated this commentary: 17 (44%) respondents stated that no additional specialized statistical review was warranted and that regular peer reviewers are both capable of evaluating and expected to evaluate the statistical aspects of submitted manuscripts (see Results in the Supplemental Material). These views contrast starkly with those of biomedical editors and statisticians (Wasserstein & Lazar, 2016), who almost universally accept the notion that statistical errors or suboptimal analyses can go undetected by regular peer review, and that specialized and targeted statistical review is required (Hardwicke & Goodman, 2019).
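Because only 39 editors responded, proportions such as the 17 of 39 reported above carry considerable sampling uncertainty. The following is a minimal sketch of how that uncertainty could be quantified with a Wilson score interval. It is an illustration only and is not part of the survey's reported analysis; the function name `wilson_interval` is hypothetical, and the interval ignores self-selection and other nonsampling sources of error.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # 17 of 39 respondents saw no need for specialized statistical review
    low, high = wilson_interval(17, 39)
    print(f"observed = {17 / 39:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

With a sample this small the interval spans roughly 30 percentage points, which is one reason to treat the survey percentages as tentative.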
Does Psychology Need Statistical Review?
Researchers have highlighted a litany of statistical ailments that afflict the psychology literature, ranging from simple reporting errors to wholesale misunderstanding and misapplication of fundamental statistical concepts and techniques (see Table 1). One striking example is the pervasive problem of inadequate statistical power that persists in several domains of psychology. Many published psychology studies have such small sample sizes that statistical tests are unlikely to be sufficiently powered to detect plausible effects (e.g., Button et al., 2013; Cohen, 1962; Fraley & Vazire, 2014; Sedlmeier & Gigerenzer, 1989; Stanley, Carter, & Doucouliagos, 2018; Szucs & Ioannidis, 2017; Vankov, Bowers, & Munafò, 2014). Smaldino and McElreath (2016) examined 44 studies of statistical power in the social and behavioral sciences and found that the average power to detect small-size effects (d = 0.2) was very low (M = 0.24, assuming α = .05). Moreover, there has generally been no increase in power over time despite repeated calls to address the issue (Button et al., 2013; Cohen, 1962; Sedlmeier & Gigerenzer, 1989; but see Sassenberg & Ditrich, 2019). Because statistical power is a function of multiple factors, the problem may be less severe in domains (such as psychophysics) that commonly feature low intrasubject variability, within-subjects designs, and multiple measurement trials per subject (Rouder & Haaf, 2017). Inadequate statistical power, coupled with publication bias, can lead to inflated effect-size estimates and increase the likelihood of false negatives and false discoveries (Button et al., 2013; Fraley & Vazire, 2014; Ioannidis, 2005). Survey evidence and examination of articles’ Method sections suggest that many psychologists choose sample sizes on the basis of typical practice in their domains of research rather than formal power analysis (Sedlmeier & Gigerenzer, 1989; Vankov et al., 2014).
As these domain experts are also training the next generation of scientists and scrutinizing their colleagues’ work during the peer-review process, a self-reinforcing cycle of suboptimal practice may follow. Independent statistical review that focuses on issues like those listed in Table 1 (as occurs at biomedical journals; e.g., Cobo et al., 2007; Gore, Jones, & Thompson, 1992) could help to break such cycles.
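To make the power problem discussed above concrete, the following minimal sketch approximates the power of a two-sided, two-group comparison for a standardized effect of d = 0.2 at α = .05, and the per-group sample size needed to reach 80% power. It assumes a normal approximation, equal group sizes, and a between-subjects design; the function names `approx_power` and `n_for_power` are hypothetical, and a formal power analysis would use the noncentral t distribution and design-specific details.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d
    shift = d * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    print(f"power with 20 per group: {approx_power(0.2, 20):.2f}")
    print(f"n per group for 80% power: {n_for_power(0.2)}")
```

Under these assumptions, 20 participants per group yield power of roughly 0.10 for d = 0.2, whereas 80% power requires roughly 390 participants per group, which illustrates how far typical sample sizes can fall short when plausible effects are small.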
Table 1. Statistical Ailments in the Published Psychology (and Related) Literature, With References Providing Further Detail and Empirical Evidence
The pervasiveness of statistical ailments in the published literature suggests that peer review in psychology journals is not sufficient to identify and minimize those problems. Quantitative training programs in psychology are typically slow to incorporate contemporary developments, avoid advanced topics, and provide only superficial treatment of fundamental statistical concepts (Aiken, West, & Millsap, 2008; Aiken, West, Sechrest, & Reno, 1990). Much quantitative training in psychological science neglects historical and philosophical foundations (Gigerenzer, 2004, 2018), proliferating confusion about core statistical concepts and facilitating widespread adoption of suboptimal practices (Wasserstein & Lazar, 2016). Statistical misconceptions are prevalent among instructors and deeply embedded in mainstream research-methods curricula (Brewer, 1985; Haller & Krauss, 2002; for a review, see Gigerenzer, 2018). Some research practices taught to undergraduates are now recognized as questionable (Bem, 2004; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012).
Why Is Statistical Review Used in Medicine?
Leading biomedical journals have been adopting statistical review and refining their policies since the 1970s (Altman, 1982, 1994, 1998; Smith, 2005). Most biomedical-journal editors in our survey (Hardwicke & Goodman, 2019) indicated that they believed statistical review provides substantial incremental value beyond regular peer review and results in important changes to manuscripts around 60% of the time, even though many biomedical articles have Ph.D.-level methodologists among the authors. This view is supported by empirical work evaluating leading medical journals, including The BMJ, The Lancet, and Annals of Internal Medicine, which has consistently indicated that statistical review can play an important role in improving manuscript quality (Gardner & Bond, 1990; Goodman, Berlin, Fletcher, & Fletcher, 1994; Gore et al., 1992; Prescott & Civil, 2013; Schor & Karten, 1966). In a 2017 Annals of Internal Medicine survey of 337 corresponding authors of research published between 2012 and 2016, 57% reported a moderate or large increase in their article’s overall quality as a result of the statistical editorial process; only 15% reported “no” impact, and only 2% reported a “negative” impact. In addition, 58% reported making considerable effort to respond to statistical comments, and 54% felt that such effort was “definitely” worthwhile (Stack et al., 2017).
To our knowledge, there has been only one randomized controlled trial designed to evaluate the effectiveness of statistical review (Cobo et al., 2007). That study, conducted at the biomedical journal Medicina Clinica, involved 115 articles, 16 of which were ultimately not published. The addition of statistical review to regular peer review led to small quality increases for all but 3 of 36 assessment criteria, resulting in overall modest but discernible improvements in manuscript quality. Although this improvement could have been due to simply adding a reviewer, it was the statistical aspects of the manuscripts that improved most.
Providing statistical guidelines for authors makes a journal’s expectations transparent and may help to improve statistical practice (Bailar & Mosteller, 1988; Smith, 2005). Many psychology journals indicate that authors should adhere to statistical-reporting guidelines, such as those from the American Psychological Association (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008; Wilkinson & APA Task Force on Statistical Inference, 1999) and the Psychonomic Society (2019).
The evidence for the effectiveness of statistical guidelines in biomedical journals is mixed (Dexter & Shafer, 2017). In psychology, the introduction of journal-specific statistical guidelines at the journal Psychological Science was associated with a number of modest improvements in statistical reporting (Giofrè, Cumming, Fresc, Boedker, & Tressoldi, 2017). Such guidelines are most effective if enforced by a reviewing editor (Dexter & Shafer, 2017) and may improve the efficiency, completeness, and standardization of statistical review, but they are unlikely to supplant expert statistical review (Altman, 1998).
In summary, there is a reasonable body of evidence to suggest that specialized statistical review in biomedicine has been effective in preventing many serious analytic and inferential errors from reaching the published literature. Could psychology journals improve the validity and reproducibility of their content by adopting a similar model?
How Would Statistical Review Work in Psychology?
In the biomedical domain, there is no single model for statistical review (Hardwicke & Goodman, 2019), and policies have evolved gradually (Altman, 1998; Cobo et al., 2007; Gardner & Bond, 1990; Goodman et al., 1994; Gore et al., 1992; Prescott & Civil, 2013; Schor & Karten, 1966; Smith, 2005). Drawing from that experience, we address four key logistical issues: Who should conduct statistical review, which manuscripts should undergo statistical review, at what stage should statistical review be performed, and how should statistical review be incorporated into the editorial process?
Who should conduct statistical review?
Psychology statistical reviewers do not necessarily need to be statisticians per se, but should have advanced (Ph.D.-level) quantitative training (Goodman et al., 1998; Hardwicke & Goodman, 2019). Ideally, they should understand the terminology, conventions, and practices of psychology research. The majority of psychology-journal editors responding to our survey reported that difficulty in finding appropriate reviewers affected their willingness to conduct statistical review (see Fig. S3 in the Supplemental Material). It is not clear whether this difficulty reflected a lack of potential reviewers or problems identifying them.
The number of statistical reviewers required for a journal will depend on the model of statistical review employed. Just over half of the biomedical journals we surveyed indicated that they typically relied on around two statistical experts on their internal editorial teams to conduct all of the statistical review (Hardwicke & Goodman, 2019), although the largest journals tended to have more. Just over a third relied on a pool of external reviewers, with a median size of 11 members. In our psychology survey, the majority of respondents who reported using statistical review indicated that their statistical reviewers were typically identified on an ad hoc basis (58%; see Fig. S4a in the Supplemental Material). Many relied on between 1 and 40 editorial-team members (Mdn = 20; 3 responses were missing), although it was unclear whether these individuals had specialized statistical expertise. Only one respondent indicated that there was a predesignated pool of external statistical reviewers, consisting of 25 members.
A starting point for psychology journals could be to recruit one statistical expert to serve on the editorial board or be retained as a regular consultant. If the expert is not someone who would see this as a professional service or a vehicle for career advancement, compensation might be required. Whereas about half of biomedical journals pay their statistical reviewers, only one respondent in our psychology survey did so (Fig. S4c in the Supplemental Material).
Which manuscripts should undergo statistical review?
Optimally, all likely-to-be-accepted manuscripts with relevant statistical content should undergo statistical review (George, 1985; Schor & Karten, 1966). In our psychology survey (see Fig. S2 in the Supplemental Material), 15 (38%) respondents indicated that statistical review was used for all relevant articles, and 15 (38%) respondents indicated that statistical review was rare (≤ 10% of articles). However, free-text comments indicated that at least 8 of the 15 who said all manuscripts received such review did not differentiate it from regular peer review; some of these editors indicated that peer reviewers had sufficient statistical training.
If all articles cannot be statistically reviewed, editors have to prioritize. Smith (2005) noted that it took 5 to 10 years for The BMJ to reach the point where all published articles with a statistical component were undergoing statistical review. The Annals of Internal Medicine had one statistical reviewer in the early 1980s and added a second in 1987. The team grew steadily over the ensuing 30 years to its current size of 10. The journal Psychological Science has recruited a pool of 6 statistical advisors who can be called upon by the journal’s editors at their discretion (Association for Psychological Science, 2016).
Targeting manuscripts with complex methods for statistical review makes some sense, but a number of commentators in the biomedical domain have noted that routine statistical analyses tend to be the most problematic (Schor & Karten, 1966; Smith, 2005). Sophisticated analyses may be conducted by individuals with more statistical expertise (Schor & Karten, 1966). Many of the statistical ailments in the psychology literature relate to foundational issues, not advanced techniques. Consequently, the most impactful contribution of statistical review might come from evaluating what appear to be routine analyses.
At what stage should statistical review be performed?
An important question for journals is, at what stage of the publication process should manuscripts undergo statistical review? In our survey of biomedical journals, the majority of respondents indicated that statistical review was either solicited at the same time as regular peer review (36%) or after regular peer review and before a provisional acceptance decision (27%). In our psychology survey, although the majority of respondents (71%) indicated that statistical review was solicited at the same time as regular peer review (Fig. S5c), many did not differentiate between regular and statistical review. The model will ultimately be journal-specific, dependent on the journal’s capacity for statistical review.
How should statistical review be incorporated into the editorial process?
How editors should incorporate the input of statistical reviewers is an important issue, particularly for journals unused to such review. Smith (2005) described the slow process of mutual education that had to occur at The BMJ:
We worried that the gulf between medical editors and statisticians with no knowledge of medical research would be unbridgeable. . . . In the early days we made the mistake of thinking that statistics was a much more exact science than clinical research and that we had to go along with exactly what the statisticians advised. Eventually we learnt that there was room for negotiation over what was acceptable . . . recognizing the inevitable trade-offs between statistical purity [and] what can actually be done in clinical research. . . . (p. 2)
Smith’s observations illustrate that effective statistical review requires not only the addition of a statistical reviewer, but also “cross-cultural” education and communication, which takes time. The reviewers need to understand and absorb the values of the research community they are serving, and that community and its editors need to understand how the changes requested by such reviewers improve the validity of the research. External statistical reviewers who are not part of the journal can make unrealistic requests, which must be adjudicated or modified by an internal editor. Statistical experts directly incorporated into the editorial process absorb journal and disciplinary norms and are also able to educate editors.
Statistical Review, Open Science, and Metaresearch
Psychological science is in the midst of a credibility revolution (Nelson et al., 2018; Vazire, 2018), and this is an opportune time for journal editors to consider adoption of statistical review. There is growing awareness that the credibility of scientific claims depends on transparent reporting (Klein et al., 2018; Munafò et al., 2017). Statistical review is likely to be most effective when reviewers have access to all of the raw research artifacts (materials, data, analysis scripts, and preregistered protocols when relevant), which enable a fully informed assessment. Having access to data, and ideally analysis scripts, enables verification of analytic reproducibility (Hardwicke et al., 2018; Sakaluk, Williams, & Biernat, 2014) and assessment of analytic robustness (LeBel, McCarthy, Earp, Elson, & Vanpaemel, 2018; Localio et al., 2018; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016), and it can facilitate detection of fraud (Simonsohn, 2013; Smith, 2005). Research materials can convey statistically relevant information about data collection, and availability of survey instruments can help reviewers raise questions about psychometric issues (McPherson & Mohr, 2005). Preregistration of study protocols (Nosek, Ebersole, DeHaven, & Mellor, 2018) could facilitate identification of questionable research practices such as p-hacking and HARKing (i.e.,
Statistical review might be enhanced by the use of computer algorithms to automatically screen for and flag potential errors in submitted manuscripts. The free software statcheck (http://statcheck.io/), for example, can automatically extract certain statistical outcomes reported in American Psychological Association style and check the internal consistency of p values, test statistics, and degrees of freedom (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016). Semiautomated tools like this require little resource investment and could reduce the burden of statistical review. One psychology editor who responded to our survey already required authors to submit statcheck reports with their manuscripts.
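The kind of internal-consistency check described above can be sketched in a few lines. The example below is not statcheck's implementation or API; it is a simplified illustration, restricted to t tests, that recomputes a two-sided p value from a reported t statistic and degrees of freedom and asks whether the reported p value is compatible with it once rounding of both numbers is allowed for. The function names (`two_sided_p`, `is_consistent`) and the tolerance rule are assumptions made for this sketch, and the p value is obtained by simple numerical integration rather than a statistics library.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t distribution, computed on the log scale."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability mass within [-|t|, |t|]
    return max(0.0, 1 - central_area)


def is_consistent(t, df, reported_p, p_decimals=2, t_decimals=2):
    """Check a reported p value against the range implied by a rounded t."""
    half_t = 0.5 * 10 ** -t_decimals
    # The true t could lie anywhere within its rounding interval
    p_high = two_sided_p(max(abs(t) - half_t, 0.0), df)
    p_low = two_sided_p(abs(t) + half_t, df)
    # Allow for rounding of the reported p value as well
    half_p = 0.5 * 10 ** -p_decimals
    return p_low - half_p <= reported_p <= p_high + half_p


if __name__ == "__main__":
    print(is_consistent(2.20, 28, 0.04))  # reported as t(28) = 2.20, p = .04
    print(is_consistent(2.20, 28, 0.01))  # same test reported with p = .01
```

A flagged inconsistency does not by itself show which number is wrong; it only marks a result that a statistical reviewer may wish to query.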
Increasing input from quantitative experts before a study begins could be an especially impactful approach to improving the quality of statistical scaffolding. A growing number of psychology journals, such as Nature Human Behaviour and Collabra: Psychology, are adopting the Registered Report format, which involves peer review of study protocols before the study has begun (Chambers, 2013; Hardwicke & Ioannidis, 2018). The contribution of expert statistical reviewers in the early stages of a research project could well be the most effective and efficient use of their time.
Finally, psychologists not only are driving the development of new reform initiatives, but also are conducting empirical investigations to evaluate the effectiveness of these initiatives in order to iteratively improve upon them (e.g., Hardwicke & Ioannidis, 2018; Hardwicke et al., 2018; Kidwell et al., 2016; Nuijten et al., 2017). These exercises in metaresearch (Hardwicke et al., 2019; Ioannidis, Fanelli, Dunne, & Goodman, 2015) should be extended to statistical review. A series of prospectively registered randomized controlled trials designed to evaluate various models of statistical review would be a valuable tool for gathering evidence relevant to this issue.
Conclusion
In this article, we have advocated that psychology journals consider adopting specialized statistical review to complement regular peer review. We have been partly informed by the results of a survey of psychology-journal editors; however, given the small number of respondents, likelihood of self-selection bias, and reliance on self-report, only tentative inferences can be drawn from these data. Our arguments are mainly based on the apparent benefits of statistical review in the biomedical domain and the documented statistical problems pervading the psychology research literature. We contend that there is sufficient evidence to support pilot testing expert statistical review in psychology journals, with concomitant monitoring and evaluation.
Statistical review will not cure all of psychology’s statistical ailments, just as it is no panacea in biomedicine. The most effective antidote is likely to involve efforts to improve statistical competence among psychology researchers (Aiken et al., 2008), and to promote more open science, which would enable more effective postpublication review. This will require nontrivial reforms in training curricula and normative structures surrounding design, analysis, and inference. If psychology is to break free of problematic statistical rituals (Salsburg, 1985) and make better use of the analysis toolbox (Gigerenzer, 2014, 2018), it will require an infusion of fresh thinking from well-trained quantitative experts at all stages of the teaching, research, funding, and publication pipeline.
Supplemental Material
Supplemental material, Hardwicke_Open_Practices_Disclosure_Rev for Should Psychology Journals Adopt Specialized Statistical Review? by Tom E. Hardwicke, Michael C. Frank, Simine Vazire and Steven N. Goodman in Advances in Methods and Practices in Psychological Science
Supplemental material, Hardwicke_Rev_Supplemental_Material for Should Psychology Journals Adopt Specialized Statistical Review? by Tom E. Hardwicke, Michael C. Frank, Simine Vazire and Steven N. Goodman in Advances in Methods and Practices in Psychological Science
Footnotes
Acknowledgements
We thank Lisa Ann Yu for assistance collecting journals’ contact details and Daniele Fanelli for discussions about the survey design. We are grateful to all the respondents for taking the time to complete the survey.
Action Editor
D. Stephen Lindsay served as action editor for this article.
Author Contributions
T. E. Hardwicke and S. N. Goodman designed the survey. T. E. Hardwicke conducted the survey and analyzed the survey data. T. E. Hardwicke, M. C. Frank, S. Vazire, and S. N. Goodman wrote the manuscript.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This work was enabled in part by a general support grant to the Meta-Research Innovation Center at Stanford (METRICS) from the John and Laura Arnold Foundation. The Meta-Research Innovation Center Berlin (METRIC-B) is supported by a grant from the Einstein Foundation and Stiftung Charité.
Open Practices
Open Materials: https://osf.io/tmah8/files/
Preregistration: no
All data and materials, including analysis scripts, have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/nquws/files/, https://osf.io/tmah8/files/, and https://osf.io/4zurk/files/, respectively. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/2515245919858428. This article has received badges for Open Data and Open Materials.
