Abstract
More and more psychological researchers have come to appreciate the perils of common but poorly justified research practices and are rethinking commonly held standards for evaluating research. As this methodological reform expresses itself in psychological research, peer reviewers of such work must also adapt their practices to remain relevant. Reviewers of journal submissions wield considerable power to promote methodological reform, and thereby contribute to the advancement of a more robust psychological literature. We describe concrete practices that reviewers can use to encourage transparency, intellectual humility, and more valid assessments of the methods and statistics reported in articles.
Psychological science is undergoing a “renaissance” (Nelson, Simmons, & Simonsohn, 2018) or “credibility revolution” (Vazire, 2018) in understanding of statistical inference, in standards for methodological rigor, and in expectations of what should be reported in scientific communications. These developments have come with a realization that previous standard practices, most notably the focus on multiple conceptual replications in a single research article, were not enough to ensure replicable and robust science. There is a growing call to raise the field’s standards (Vazire, 2018), and this in turn will require access to more details of studies’ methods, analyses, and data than was previously typically provided—information that is still often omitted from reports.
Our aim in this article is to provide recommendations for reviewers to promote transparency, statistical rigor, and intellectual humility in research publications. Well-informed peer reviewers help journal editors make better decisions not only about whether a piece of research should be published, but also about how the work is reported if it is published. Reviewers can influence reporting practices by requesting the transparency necessary for all readers to assess the quality of the evidence and the validity of conclusions (Morey et al., 2016; Vazire, 2017). Our advice applies particularly to quantitative research in psychology, but is also relevant to research in other fields of science, especially those that use inferential statistics.
This article grew out of a workshop, “How to Promote Transparency and Replicability as a Reviewer,” at the 2017 meeting of the Society for the Improvement of Psychological Science. Workshop participants (including this article’s authors) read existing advice on reviewing provided for the occasion by 22 journal editors (see Lindsay, 2017, and Lindsay, Giner-Sorolla, & Sun, 2017), Roediger’s (2007) “Twelve Tips for Reviewers,” a chapter on reviewing by Tesser and Martin (2006), and an excerpt from Commitment to Research Transparency (Schönbrodt, Maier, Heene, & Zehetleitner, 2015). Workshop members then put together a set of new recommendations aimed at promoting transparency and replicability. In this article, we first explain some of the issues underlying our advice and then present our recommendations.
The New Approach to Statistical Inference and Reporting
Most empirical reports in psychology use null-hypothesis significance testing (NHST) as a metric of evidence. In NHST, inferential analyses such as t tests and analyses of variance yield p values, where a p value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true; results with p values below a conventional threshold (usually .05) are declared statistically significant.
NHST is accurate only in confirmatory research, in which the hypotheses to be tested and the method of testing are specified in advance of data collection. When researchers instead decide which tests to run, and which to report, after seeing the data, the nominal error rates of NHST no longer hold.
This sort of flexible, post hoc approach to NHST has been common practice in many areas of psychology (John, Loewenstein, & Prelec, 2012). Unfortunately, these practices make reported p values uninterpretable and inflate the rate of false-positive findings.
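To see why, consider "optional stopping": testing repeatedly as the data accumulate and stopping at the first significant result. The following minimal simulation sketch (our own illustration, not drawn from the sources cited here; it assumes only NumPy and SciPy) shows the resulting inflation of the false-positive rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def peeking_false_positive_rate(n_sims=5000, batches=5, batch_size=10, alpha=0.05):
    """Test after each batch of data and stop at the first p < alpha.

    The null hypothesis is true in every simulated data set, so every
    "significant" result is a false positive.
    """
    hits = 0
    for _ in range(n_sims):
        data = []
        for _ in range(batches):
            data.extend(rng.normal(0.0, 1.0, batch_size))  # true effect is zero
            if stats.ttest_1samp(data, 0.0).pvalue < alpha:
                hits += 1
                break
    return hits / n_sims

print(peeking_false_positive_rate())  # around .13 in this sketch, far above the nominal .05
```

Each individual test is nominally valid, yet the procedure as a whole more than doubles the advertised error rate; undisclosed flexibility in exclusions, transformations, and choice of outcome variables has analogous effects.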
It is good and proper for researchers to conduct exploratory research as well as hypothesis-testing research. Poking around in one's data, speculating about unexpected patterns, is a great way to generate ideas. For conducting such exploratory analyses, confidence intervals and estimates of effect size are useful tools (e.g., McIntosh, 2017). But NHST p values computed on exploratory results cannot be taken at face value: A hypothesis suggested by the data must be confirmed with new data before it can be treated as supported.
Vazire (2017) drew an analogy between readers of science articles and used-car shoppers: Transparent reporting puts readers in a better position to tell the difference between “lemons” and trustworthy findings. One powerful tool for promoting such transparency is preregistering the research plan (see Lindsay, Simons, & Lilienfeld, 2016; van ’t Veer & Giner-Sorolla, 2016). Preregistration makes clear which aspects of a study and its analyses were planned in advance of data collection. Openly sharing data and materials (e.g., tests, stimuli, programs), and explicitly declaring that methodological details have been completely reported (e.g., Simmons et al.’s, 2012, “21 word solution”), can also help readers to assess the evidence value of an empirical report.
To allow for correction of mistakes in reporting and for exploration of alternative analyses and explanations, transparency requires that researchers make their raw data available to other researchers, along with codebooks and analysis scripts. Despite protocols requiring such sharing for verification (e.g., Section 8.14 of the American Psychological Association's, 2017, ethical principles), the availability of data has often been poor (e.g., Wicherts, Borsboom, Kats, & Molenaar, 2006). Finally, authors can also advance transparency by providing more comprehensive descriptive statistics, such as data graphs that show the distribution of scores.

Making defensible claims in research reports also entails intellectual humility about the limitations of one's own perspective and findings (Samuelson et al., 2015). Scientific claims require a realistic perspective on the generalizability of one's own research and views. In moving from a standard that prioritizes novelty to one that emphasizes robustness of evidence, claims about the importance of any one study or series of studies should be limited, and replications should be encouraged. Researchers should also strive to be aware of the assumptions they bring to conducting and evaluating research—for example, their ideas about what constitutes a "standard" or "unusual" sample (see Henrich, Heine, & Norenzayan, 2010) or their preconceptions about research that has political implications (Duarte et al., 2015).
Over the past decade, some journals in psychology and other fields have adopted more open reporting requirements, such as those outlined in the Transparency and Openness Promotion (TOP) guidelines (Center for Open Science, n.d.-b; Nosek et al., 2015). More than 5,000 journals and organizations have become signatories of the TOP guidelines, and more than 850 journals have implemented the standards. However, many journals have not changed their policies, and editors and reviewers vary in implementing these reforms. Our aim with the following recommendations is to provide concrete guidelines showing how you, as a peer reviewer of empirical research articles, can encourage transparency, statistical rigor, and intellectual humility. We organize these guidelines roughly in the order in which they will come up as you work through a review. Appendix A gives a slightly reorganized outline of our advice that can be used as a checklist during the review process.
Preparing to Review: Know Your Stuff
To understand and communicate criticism of research you review, you need to have a solid grasp of the key statistical issues. Appendix B lists selected educational resources, and we discuss some of these issues in the next section. Although specific statistics applications vary across fields, you should sharpen your understanding of the following concepts that often are forgotten after postgraduate statistical training:
The logic of NHST: If you understand why the p value is not the probability that the null hypothesis is true, why a nonsignificant result is not by itself evidence of no effect, and why NHST's error rates hold only for analyses planned in advance, you will be well equipped to evaluate most statistical claims you encounter.
The need for a priori specification of hypothesis tests: In addition, it is important to know about methods used to control selective reporting, such as preregistering experiments; reporting all analyses, including those that might be labeled as exploratory and post hoc; providing methodological disclosure statements (Simmons et al., 2012); and openly sharing materials.
Assumptions underlying frequently used statistical tests in your research area: In particular, it is important to know when a given test is not robust to violations.
One source of inspiration is the American Psychological Association’s Journal Article Reporting Standards (JARS; Appelbaum et al., 2018). These guidelines list desirable features for reporting in all types of research articles, including those involving qualitative, meta-analytic, and mixed methods. Using JARS as a checklist, you can look for the methodological and statistical considerations that are particularly important to report in your area of research and carry out further reading to ensure that you understand their rationales.
Reading and Evaluating the Manuscript
Evaluate statistical logic and reporting
You might think that all editors of scientific journals in psychology are statistically savvy, but you would be wrong. Unfortunately, it is possible to become an eminent scholar and gatekeeper in psychology while keeping one's statistical knowledge focused on the skills that help get articles published, rather than on best statistical practices. Even if journals espouse improved statistical standards or refer authors to general guidelines, such as those in the Publication Manual of the American Psychological Association, those standards are not always enforced; your comments may be the only detailed check a manuscript's statistics receive.
Of course, editors and authors may privilege other goals, such as manuscript readability or word-count limits, above full statistical reporting. Your suggestions for increasing the amount of reporting should take into account what is possible at the journal, as specified in its submission guidelines (sometimes known as “Guide for Authors” or “Instructions for Authors”), which should be available on the journal’s Web site. Limitations caused by restricted word counts, for example, can be overcome by adding details in supplementary online materials (which many journals now offer) or on public repositories, such as the Open Science Framework (http://osf.io).
Beyond the journal’s standards, the issues you look for will depend on your own knowledge and preparation. Here are several frequently encountered issues:
Many psychology studies cannot obtain precise results because their sample sizes are not sufficient to provide accuracy in parameter estimation (AIPE; Maxwell, Kelley, & Rausch, 2008; see also Cumming, 2014). That fact has been known for decades (Cohen, 1962), but only recently has awareness of it become widespread. Accuracy allows inference to go beyond a merely directional finding, allowing comparison of the observed effect size with effect sizes for other known influences on the outcome and evaluation of the finding as a potential basis for real-world applications. Precision for planning (Cumming & Calin-Jageman, 2017), AIPE, and statistical power analysis can all help readers judge the sensitivity of methods, which has implications for interpreting both positive and null results. Rather than criticizing a study on the basis of your own intuition about what a "low N" is, ask the authors to justify their sample size in terms of the power or estimation precision it provides.
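As a reviewer, you can sanity-check such justifications yourself. A back-of-envelope sketch (our illustration; it assumes SciPy and statsmodels are available, and the effect size and targets are arbitrary) comparing the demands of power versus precision:

```python
from math import ceil
from scipy import stats
from statsmodels.stats.power import TTestIndPower

d = 0.40            # smallest standardized effect of interest (assumed)
alpha, power = 0.05, 0.80

# (a) Power: n per group for a two-sample t test to reach 80% power
n_power = TTestIndPower().solve_power(effect_size=d, alpha=alpha, power=power)
print(ceil(n_power))           # ~100 per group

# (b) Precision (AIPE, large-sample approximation): n per group so that the
# 95% CI for d has half-width w; SE(d) is roughly sqrt(2/n) for small d.
w = 0.10
z = stats.norm.ppf(1 - alpha / 2)
print(ceil(2 * (z / w) ** 2))  # ~769 per group
```

For a narrow interval such as this one, precision is far more demanding than conventional power, which is worth bearing in mind when judging sample-size justifications.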
Effect sizes, and related statistics such as confidence intervals, are important adjuncts to significance tests that help readers interpret data more fully, especially when samples are unusually large or small (Cumming, 2014; Howell, 2010). Even if effect sizes are reported in Results sections, check to see that the discussion of results takes into account their magnitude and precision, and that conclusions are not based only on whether p values fell below .05.
Power analysis tests the likelihood of rejecting the null hypothesis if the alternative hypothesis is true, and journals are increasingly requiring that such analyses be reported. Not all power analyses are equal, though. Post hoc power analyses, for instance, are uninformative, being merely a function of the obtained p value. An informative a priori power analysis must justify the effect size it assumes, for example, by reference to previous estimates or to the smallest effect that would matter theoretically or practically.
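The circularity of post hoc ("observed") power is easy to demonstrate: for a two-sided z test, it is a deterministic function of the p value alone. A short sketch (our illustration, assuming SciPy):

```python
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    """Power of a two-sided z test computed at the observed effect size."""
    z_obs = norm.ppf(1 - p / 2)       # |z| implied by the two-sided p value
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in (0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 2))   # .50, .25, .10
# p = .05 always maps to observed power of about .50, whatever the design
# or sample size: reporting it adds nothing beyond the p value itself.
```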
Descriptive statistics, such as cell sample sizes, means, and standard deviations, let readers check the plausibility of the inferential statistics and are essential for later meta-analysis; ask that they be reported for every measure and condition.
Basic statistical errors are surprisingly common in published research (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016). Being roughly familiar with the formulas for degrees of freedom in commonly used statistical tests (e.g., Howell, 2010) can help you detect discrepancies between reported subject numbers and the actual numbers tested. There are also tools for checking whether the figures after the decimal point in a reported mean are impossible to obtain given the reported sample size and response scale (e.g., the GRIM test; Brown & Heathers, 2017), and for checking whether reported test statistics, degrees of freedom, and p values are mutually consistent (e.g., statcheck; see Nuijten et al., 2016).
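A check of the first kind—whether a reported mean is even possible—can be scripted in a few lines. A minimal GRIM-style sketch (our illustration; see Brown & Heathers, 2017, for the full procedure, which also handles multi-item scales):

```python
def grim_consistent(mean, n, decimals=2):
    """Can `mean`, reported to `decimals` places, arise from n integer scores?"""
    nearest_total = round(mean * n)        # closest achievable integer sum
    return abs(nearest_total / n - mean) < 0.5 * 10 ** -decimals

print(grim_consistent(5.19, 28))  # False: no integer total / 28 rounds to 5.19
print(grim_consistent(5.18, 28))  # True: 145 / 28 = 5.1786, which rounds to 5.18
```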
Assess any preregistrations
To increase the appearance of confidence in results, it has been a common practice in psychology to report the outcome of exploratory analyses as though they had been planned a priori (John et al., 2012). Preregistration involves posting a time-stamped record of method and analysis plans online prior to data collection. It is intended to make analytic flexibility transparent, helping reviewers better evaluate the research. A common misconception is that a preregistration is meant to restrict the analyses that are performed; actually, preregistration does allow additional post hoc analyses, but the purpose of preregistration is to make sure that post hoc analyses are clearly labeled as such (e.g., van ’t Veer & Giner-Sorolla, 2016).
If a preregistered plan for the research is available, it is important to assess the level of completeness and detail in that plan compared with the procedures reported in the article. Some “preregistrations” are so brief and vague that they do little to identify when post hoc liberties have been taken, providing only the illusion of transparency. Norms for assessing the quality of preregistrations are still in development (for one protocol, see Veldkamp, 2017). If researchers deviated substantially from their preregistered analyses, even for good reasons (e.g., the data failed to meet assumptions of the proposed test), you can ask them to also report the outcome of the preregistered analyses (e.g., in an appendix) for full transparency.
If the research under review was not preregistered, it may be difficult to tell which analyses were planned in advance and which were data dependent, but some clues may lead you to suspect post hoc analysis. For example, data exclusion rules or transformations might be reported only in the Results sections and without any explicit rationale, or may vary from one study to the next without justification. The concern here is that the researchers may have (not necessarily intentionally) made analytic decisions to produce significant results that would not be replicated if alternative reasonable analytic specifications were used or if a new data set were analyzed. That does not mean that those results have no value, but they should be viewed with skepticism pending direct replication.
You can ask researchers to address concerns about post hoc flexibility in your review. The strongest reassurance would come from a direct, preregistered replication. However, you can also ask the authors to indicate which analyses, if any, were exploratory or to adopt a more stringent standard for statistical significance (e.g., p < .005; Benjamin et al., 2018).
Check data and materials
If the authors submitted data, materials, or analysis code as part of the review process, or if they provided a link to a preregistration document detailing their data-collection and analysis plans, you should determine whether these resources are in a usable form. If the data and materials are not available or usable, let the editor know and ask if there is a way to obtain them. When they are available, we encourage you to examine them for completeness and accuracy. Variables in the data set should clearly correspond to the variables reported in the text. Materials should allow a third party to rerun the study, and should map clearly onto the conditions, variables, and reporting. Running analyses with available data is usually beyond the call of a reviewer’s duty, but might be worth the effort if it is helpful for checking apparent errors or identifying strong alternatives to the authors’ conclusions.
Go beyond “p < .05 per study”
For a long time, in many areas of psychology, reviewers judged whether a study supported a hypothesis by whether its key test was significant at p < .05, and judged whether a multistudy manuscript merited publication by whether each of its studies "worked" in this sense.
The distribution of p values across a set of studies, however, is itself informative. When a tested effect is real and studies are adequately powered, most significant p values should fall well below .05; a string of p values only just under .05 is improbable and suggests selective reporting or undisclosed analytic flexibility (see Simonsohn, Nelson, & Simmons, 2014, on p-curve analysis).
So, be wary of multiple studies, each with the key p value falling only a little below .05; taken together, such results can be weaker evidence than the individual "significant" labels suggest (Schimmack, 2012).
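A quick simulation (our illustration, assuming NumPy and SciPy) shows what significant p values look like when an effect is real and power is moderate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 10,000 two-group experiments with a true effect of d = 0.5, n = 50 per group
a = rng.normal(0.5, 1.0, (10_000, 50))
b = rng.normal(0.0, 1.0, (10_000, 50))
ps = stats.ttest_ind(a, b, axis=1).pvalue

sig = ps[ps < .05]
print(round(len(sig) / len(ps), 2))   # ~.70 (the power of each study)
print(round(np.mean(sig < .01), 2))   # most significant results have p < .01
print(round(np.mean(sig > .04), 2))   # only a sliver land just below .05
```

If several reported studies each land in that narrow .04–.05 sliver, the set is unlikely to be a complete and unbiased record.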
Inzlicht (2015) gave an account of a lab that was encouraged to report all studies it had run to test a hypothesis, instead of just the significant ones, precisely because a manuscript it had submitted showed a pattern of p values clustered just below .05.
Reviewers should also place less emphasis on the p value of each individual study and more on the aggregate evidence across studies, for example, the size and precision of the pooled effect estimate.
Aggregate evidence, however, becomes unreliable if only significant studies are reported. To mitigate publication bias, you can ask for an internal meta-analysis of all relevant studies conducted by the research team, which may include studies that were not included in the original report. But, by the same token, you should have realistic expectations about what a fully reported set of tests of a true hypothesis looks like (Lakens & Etz, 2017). Even if the proposition is strongly supported, this set can sometimes include nonsignificant results here and there.
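The core computation of such an internal meta-analysis is straightforward. A bare-bones fixed-effect, inverse-variance sketch (our illustration; the effect sizes and standard errors below are invented):

```python
import numpy as np

d  = np.array([0.45, 0.12, 0.30, -0.05])   # per-study effect sizes (hypothetical)
se = np.array([0.20, 0.18, 0.22, 0.25])    # their standard errors (hypothetical)

w = 1 / se**2                              # inverse-variance weights
d_pooled = (w * d).sum() / w.sum()
se_pooled = np.sqrt(1 / w.sum())
ci = (d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled)
print(round(d_pooled, 2), [round(x, 2) for x in ci])   # 0.22 [0.02, 0.42]
```

Here three of the four studies are nonsignificant on their own, yet the pooled estimate excludes zero—exactly the kind of mixed pattern that Lakens and Etz (2017) describe as ordinary for a true effect under realistic power.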
Also, these considerations should not stop you from recommending publication of methodologically strong single-study manuscripts. One high-powered study can be more informative than several underpowered studies (Schimmack, 2012).
Evaluate measurement validity
Reviewers should make sure that the constructs discussed in a manuscript were indeed the constructs that were measured in the project. Ideally, an assessment should be sensitive to the differences that the researchers intended to measure (Borsboom & Mellenbergh, 2004). The interpretation of findings based on improperly validated measures is suspect at best and meaningless at worst. Accessible discussions of these issues can be found in Flake, Pek, and Hehman (2017) and Fried and Flake (2018). Questions relevant to the validity of measures include the following:
Have the authors reported scale reliabilities computed from their data? Indicators of internal consistency, such as Cronbach's alpha, are important to include but are commonly misreported as indicators of validity (Flake et al., 2017). In particular, a high alpha does not speak clearly to whether constituent items represent a single dimension or multiple dimensions. Factor analysis is needed to assess whether item intercorrelations match the intended structure, one aspect of valid measurement. (A computational sketch of alpha follows this list.)
Did the authors use previously validated measures? Check for reporting of, or references to, validation studies of the measures, including tests for construct, convergent, and divergent validity.
Did the authors use measures as originally developed and validated, or have they modified the original scales? Are any modifications well justified and fully reported? Modifying scales without reporting the full details can complicate replication studies, and making modifications without assessing the validity of the resulting scales can lead to uncertainty in measurement.
Did the authors report findings based on single-item measures? Single-item measures may not adequately capture the intended constructs. They require special consideration and validation (see Flake et al., 2017).
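As a minimal illustration of the alpha computation mentioned above (our sketch; the data are simulated, not from any real scale):

```python
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items array of scores on one scale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + rng.normal(scale=0.8, size=(200, 5))  # 5 noisy indicators
print(round(cronbach_alpha(items), 2))                 # ~.89 for this simulated scale
```

Note that an equally high alpha can arise from items tapping two correlated dimensions, which is why factor analysis, not alpha, speaks to dimensionality.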
If you find that answers to any of these questions are unclear, it is important to request the missing information in your review. Authors should be encouraged to address weaknesses in measurement validity in the Discussion section of their manuscript, where they can describe specifically how uncertainty in the measures used may affect the interpretation of the results and the generalizability of the study.
Evaluate sensitivity as well as validity
Measurement concerns are part of a larger issue that is becoming more important with increased understanding of methodology: sensitivity. Traditionally, psychology reviewers have been keen to point out alternative explanations for a significant, or "positive," result, such as confounds or demand characteristics.
In contrast, psychology reviewers are often less attuned to problems that might compromise the interpretation of nonsignificant findings, such as small sample size, weak manipulations, poor measurement reliability, restricted range, and ceiling or floor effects. Such flaws can reduce a method's sensitivity: its ability to detect effects that are actually present.
Low sensitivity raises the likelihood that a significant result is a false positive, especially when the finding is unlikely (Ioannidis, 2005, 2008; Zöllner & Pritchard, 2007). For example, if a finding is only 10% likely to be true and statistical power is low (50%), then 47% of significant results will be false positives.
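The 47% figure follows from Bayes' rule applied to those numbers: with prior probability pi = .10 that the hypothesis is true, power 1 - beta = .50, and alpha = .05, the expected proportion of significant results that are false positives is

```latex
\[
\frac{\alpha(1-\pi)}{\alpha(1-\pi) + (1-\beta)\,\pi}
  \;=\; \frac{.05 \times .90}{.05 \times .90 \;+\; .50 \times .10}
  \;=\; \frac{.045}{.095}
  \;\approx\; .47 .
\]
```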
Low-sensitivity methodology also sets a bad example. A lab that uses it is more likely than other labs to waste effort on false-negative findings, and its findings are less likely to be replicated. And in a climate of low-sensitivity methodology, selective reporting can be justified more readily: If a study did not work, it is easy to say that the methods must have been bad, rather than to take the results as evidence against the hypothesis (LeBel & Peters, 2011). Finally, many inferential statistical tests lose their robustness to violations of data assumptions when sensitivity is low (e.g., because of a small sample size).
In experimental research, a particularly relevant sensitivity issue is manipulation validity. It is common for researchers to take a shortcut and assume that an effect of an independent variable on a dependent variable is sufficient proof that a manipulation is valid. But this assumption conflates the effect being tested (does change in the independent variable relate to change in the dependent variable?) with the validity of the manipulation (does the manipulation effectively change the independent variable?). Especially when results are null, either in original research or in a subsequent replication, showing that the manipulation is valid in the sampled population can help rule out manipulation failure as a prosaic explanation.
Ideally, a manipulation will be validated on a criterion variable that directly measures the independent variable. For example, if thoughts about power are being manipulated to be more accessible, then power words in a decision task should be responded to more quickly in the experimental than in the control condition. This test of the manipulation might be done in the same study that tests the main hypothesis, as a manipulation check. If there are concerns about subjects’ awareness of the manipulation, though, the testing can be done on a separate sample (Kidd, 1976). Although manipulation checks have previously been criticized as unnecessary (Sigall & Mills, 1998), such critiques were based on their inability to shed light on positive results. With an increased emphasis on publishing and evaluating null results, testing manipulations has become more important.
Know how to evaluate null claims
Nonsignificant p values are often misinterpreted as evidence that an effect is absent. By itself, a nonsignificant result cannot distinguish between a true null effect and a study too insensitive to detect a real one. When authors wish to claim support for the null hypothesis, ask them to use a method suited to that claim, such as equivalence testing or Bayes factors (a sketch of an equivalence test appears below).
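A minimal sketch of the two one-sided tests (TOST) equivalence procedure (our illustration, assuming NumPy and SciPy; the equivalence bound and the data are hypothetical):

```python
import numpy as np
from scipy import stats

def tost_one_sample(x, bound):
    """Equivalence test: is the population mean inside (-bound, +bound)?"""
    x = np.asarray(x, dtype=float)
    se = x.std(ddof=1) / np.sqrt(len(x))
    df = len(x) - 1
    p_lower = 1 - stats.t.cdf((x.mean() + bound) / se, df)  # H0: mean <= -bound
    p_upper = stats.t.cdf((x.mean() - bound) / se, df)      # H0: mean >= +bound
    return max(p_lower, p_upper)   # equivalence is supported if this is < .05

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)               # data consistent with a null effect
print(round(tost_one_sample(x, bound=0.3), 4))  # very small p: mean lies within +/-0.3
```

Lakens (2017) provides an accessible treatment of equivalence testing for psychologists.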
The general problem of drawing misguided inferences from nonsignificant p values is compounded when a study's sensitivity is low or unreported (see the previous section).
Moreover, the difference between significant and nonsignificant is often itself not statistically significant (Gelman & Stern, 2006; Nieuwenhuis, Forstmann, & Wagenmakers, 2011). Be especially wary if authors interpret a significant effect in one condition or experiment versus a nonsignificant effect in another as informative without reporting a test of the interaction between condition or experiment and effect. Similarly, when one correlation or regression coefficient is significant, another is not, and the authors claim that the first coefficient is significantly larger than the second, you can ask for appropriate statistical comparisons to support this claim (Clogg, Petkova, & Haritou, 1995; Steiger, 1980). These nonexhaustive examples illustrate the need for reviewers to be vigilant about appropriate interpretations of nonsignificant results.
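As one concrete example, a difference between two independent correlations can be tested with Fisher's r-to-z transformation. A short sketch (our illustration; Steiger, 1980, covers the dependent-correlations case):

```python
import numpy as np
from scipy.stats import norm

def compare_independent_rs(r1, n1, r2, n2):
    """Two-sided test of r1 != r2 for correlations from independent samples."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return 2 * norm.sf(abs(z1 - z2) / se)

print(round(compare_independent_rs(0.30, 45, 0.12, 45), 2))   # ~.39
# r = .30 is significant at n = 45 and r = .12 is not, yet the difference
# between them is nowhere near significant.
```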
Assess constraints on generality
Researchers have always been expected to describe limitations of their research in the Discussion section, but such statements are often pallid, incomplete, and drowned out by louder claims of the importance of the findings. Simons, Shoda, and Lindsay (2017) proposed a stronger and more structured alternative: a "constraints on generality" statement that explicitly identifies the populations, materials, and contexts to which the authors believe their findings apply. You can ask authors to include such a statement and to justify the boundaries they draw.
Writing the Review
Address replicability
An important question to ask yourself when reviewing is, “How confident am I that a direct replication of this study would yield a similar pattern of findings?” Replicability is not the only characteristic of good science—the best work is also interesting, informative, and relevant—but it is a fundamental starting point. We recommend that you cite in your reviews specific reasons why you have (or lack) confidence in the replicability of the work. For example, you may cite statistical robustness, open reporting, and methodological sensitivity as reasons for your confidence in the reported findings.
If replicability is in question, you might suggest that the authors be invited to conduct a preregistered direct replication, perhaps with increased statistical power or other improvements, but designed to replicate the same study as exactly as possible. This invitation may include a no-fault clause making it clear that the new study will be evaluated independently of what the results show, as long as the overall case for the hypothesis is presented reasonably. This approach assumes that similar data can be obtained without tremendous burden (e.g., the methods are not intensive, a convenience sample can be used). If not, you can insist that conclusions be calibrated to the strength of the data. Similarly, openly exploratory work may still be worth publishing if the discussion of results and limitations is appropriate, if the findings are theoretically informed and have potential to generate new hypotheses, and if the data and materials are publicly available (e.g., McIntosh, 2017).
Communicate your own limits
When you are not familiar with a methodology or statistical test used in a manuscript, it is important to communicate this to the editor, at the same time recognizing that your perspective on other issues may still be valuable. Acknowledging your limits is part of the practice of intellectual humility, and it helps editors recognize whether they have the expertise they need on board. This may lead them to seek out the opinion of an expert in the topic.
Take the right tone
When we asked 22 editors what they would say to reviewers, the most frequent advice was to keep a constructive, respectful tone (see Lindsay et al., 2017). When reviewing with attention to transparency and replicability, it can be tempting to frame departures from best practices as dishonesty or cheating. Indeed, making accusations can be psychologically rewarding (Hoffman, 2014). Not surprisingly, researchers tend to respond defensively when terms like questionable research practices and p-hacking are applied to their work. You can usually raise the same substantive concerns by focusing on the reporting rather than on the researchers: Describe the information you need and explain why it matters for evaluating the evidence.
Promote transparency
If the authors of a manuscript have not followed open-science practices that give reviewers access to materials, analysis code, and data, you may include in your review arguments for making such materials available in subsequent revisions. Your arguments may be directed to the editor as much as to the authors. For example, if the journal endorses the American Psychological Association's ethical standards for publishing, you could ask for a statement of full disclosure of measures, manipulations, and exclusions, because those standards prohibit "omitting troublesome observations from reports to present a more convincing story" (American Psychological Association, 2010, p. 12). To support full disclosure, you could also invoke the American Statistical Association's guideline that proper inference requires full reporting and transparency (Wasserstein & Lazar, 2016).
If the authors did provide data, materials, or analysis code, or if they preregistered their research, report in your review what depth of scrutiny you gave to these additional materials. Note any obstacles or limitations you encountered; for example, you might have been unable to check the analysis code because you are not familiar with the programming language used. It is not necessarily your job to make sure those resources are usable and correct. However, reporting the depth of your own efforts will help the editor fulfill his or her obligation.
Some journals offer special recognition in the form of badges granted to articles that meet criteria for transparent processes (e.g., an open-data badge, a preregistration badge, and an open-materials badge; see Blohowiak et al., 2018). If the journal for which you are reviewing offers such badges, consider mentioning that fact, with the aim of encouraging the authors to share more information and improve the review process. If the authors have already applied for one or more badges, keep in mind that most journals rely on authors’ declarations that the archived documents are adequate. Authors and readers might benefit from your input if you check badge-supporting material for usefulness and completeness.
Think about signing reviews
Finally, you may also consider breaking the usual anonymity of peer review, signing your reviews to promote transparency and openness on your side of the process. There are good arguments for either signing or not signing all reviews (e.g., Peters & Ceci, 1982, and accompanying peer commentary; Tennant et al., 2017). We recommend adopting a general policy about whether you will or will not sign all reviews, taking into consideration your career stage (see the next paragraph). Without a general policy, you may be tempted to associate yourself with only the reviews that make a favorable impression (e.g., positive feedback) while avoiding accountability by not signing reviews that make a less favorable impression (e.g., critical feedback). If you do sign, we recommend that you state explicitly that this is a general policy for you, after giving your name.
Signed reviews can have tangible benefits for authors, providing context for suggestions and a sense of fairness in critique, and they give reviewers exposure, credit, and accountability. But signing also carries risk, especially if you are not yet permanently employed. Some authors may seek retribution if they feel their submissions have been inappropriately criticized. Reviewers with more job security and seniority, however defined, have less to lose by signing. These concerns are also relevant when deciding whether to accept requests to review for journals that have adopted open review practices, such as unblinded review, publication of reviews alongside the final article, or direct interaction between authors and reviewers during the review process (see Ross-Hellauer, 2017; Walker & Rocha da Silva, 2015).
Special Cases
Replication studies
The new approach to methods includes a growing willingness to publish reports on close replications of previous research, which previously might have been rejected because they lacked novelty. Main concerns for reviewers are somewhat different for a replication study than for primary research. You do not need to evaluate the theoretical rationale, and your analysis of methods should focus on how closely the replication followed the original, and whether any changes in method were necessary or justified. Brandt et al. (2014) have provided detailed guidance on what makes a replication strong. In brief, just as in the case of original studies, reviewers should give more credence to replications that were preregistered, had adequate power, used methods shown to be sensitive (e.g., manipulations and measures were validated in the new context), and are reported with detailed method sections, open data, and analysis scripts. Given that most journals will publish replication results even if null, it is especially important to reduce the risk that a failure to replicate was due to insensitive methods.
If the authors bill their study as a close (or "direct") replication, check the method against the original study and ask whether any differences—in sample, materials, or procedure—are justified and unlikely to change the effect.
In reviewing replications, you may have to assess claims about the new state of evidence, taking into account both the original and the replication studies. Gelman (2016) suggested using a time-reversal heuristic to assess the evidence in a replication and the original study: If the replication result had been published first, would it have seemed more compelling than the original result? Just as no single study can determine whether an effect exists, neither can any replication. So, do not be too concerned with judging replications as “successful” or “failed.” Instead, think meta-analytically, across the individual studies. Does the replication reinforce or change your beliefs about the effect (or does it do neither)? In any event, it is important to treat positive and negative results in a replication evenhandedly. Although failing to replicate a well-known effect may be more newsworthy than successfully replicating it, both types of evidence need to be reported for science to progress.
Some editors may ask you to judge how important it was to replicate the effect in the first place, just as they would ask you to judge the importance of any novel research. In this case, weigh the strength of existing evidence and the original research’s impact on scholarship and society. If the effect has been closely replicated numerous times, has little theoretical or societal value, or has been largely ignored in the academic literature and press, then the replication may be judged as relatively unimportant (Brandt et al., 2014).
Registered Reports
More and more journals are inviting Registered Reports (RRs; see Center for Open Science, n.d.-a) as a special form of preregistered article. Researchers submit a detailed proposal of a study to a journal for peer review before collecting the data. After data are collected, they submit the complete manuscript reporting results, and the manuscript will be accepted in principle regardless of results if the approved proposal has been followed faithfully. RRs are quite new, but their adoption appears to be increasing rapidly (see Nosek & Lindsay, 2018). Anecdotal reports indicate that reviewers find being involved with RRs gratifying: Reviewers can help researchers avoid mistakes in the first place, rather than just point out mistakes after they are made.
Peer review of RRs will potentially involve you at two stages. In Stage 1, you will be asked to evaluate the importance and quality of the proposed study prior to data collection. At this stage, evaluate the proposal as you would a normal introduction and Method section, and consider whether the analysis plan makes sense as the complete basis for a Results section. As is true with replications, the possibility of null results means that sensitivity of the methodology is especially important.
After data are collected and analyzed according to the plan, the editor may ask you to assess the report at Stage 2. The manuscript will now have Results and Discussion sections based on the data. At this stage, evaluate whether the research conformed to the plan, whether any changes from the proposal were well justified, and whether other conditions for validity were met (e.g., whether floor and ceiling effects were avoided, the manipulation passed manipulation checks, and the study is accurately and clearly reported). If the answer to these questions is yes, then the manuscript should ultimately be accepted, although revisions might be required to improve readability or to modify the conclusions.
Conclusion
Serving as a peer reviewer provides opportunities to learn about your academic field, to become known and respected (or at least known to and respected by editors), and, most important, to shift norms and shape the future of the field. As best practices in research evolve, so too will best practices in peer review. To contribute to psychology’s renaissance (Nelson et al., 2018) and credibility revolution (Vazire, 2018), peer reviewers should promote the good practices of transparency, validity, robustness, and intellectual humility. We hope that these concrete guidelines can help peer reviewers at all career stages provide more effective reviews, and thereby improve the trustworthiness of the published literature and scientific progress as a whole.
Appendix A: Outline of Advice for Promoting Robustness and Transparency When Reviewing Psychology Manuscripts Reporting Quantitative Empirical Research
Appendix B: Resources on Robustness and Transparency in Psychological Research
This appendix is a list of resources intended to be a useful starting point for reviewers seeking to improve their understanding of the methodological and statistical concepts underlying psychology’s credibility revolution. We recognize that there are many more references and resources available; we do not claim that this list is comprehensive or that the resources included represent the “gold standard” among all possible resources.
Action Editor
Alexa Tullett served as action editor for this article.
Author Contributions
The authors are listed in alphabetical order. All the authors contributed to the generation of the ideas presented in this article and to an initial collaborative draft. Further refinement of this draft proceeded with the input of all the authors, but with D. S. Lindsay and R. Giner-Sorolla doing most of the rewriting.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
