Abstract
According to research lore, the second peer reviewer (Reviewer 2) is believed to rate research manuscripts more harshly than other reviewers. The purpose of this study was to empirically investigate this common belief. We measured word count, positive phrases, negative phrases, question marks, and use of the word “please” in 2546 open peer reviews of 794 manuscripts published in the British Medical Journal. After adjustment, there were no significant differences between Reviewer 2 and other reviewers in word count (630 vs 606, respectively, P = .16), negative phrases (8.7 vs 8.4, P = .29), positive phrases (4.2 vs 4.1, P = .10), question marks (4.8 vs 4.6, P = .26), or uses of “please” (1.0 vs 1.0, P = .86). In this study, Reviewer 2 provided reviews of similar sentiment to those of other reviewers, suggesting that popular beliefs surrounding Reviewer 2 may be unfounded.
According to research lore, the second peer reviewer (Reviewer 2) is believed to rate research manuscripts more harshly than other reviewers, yet this belief has rarely been tested empirically.
This is the first empirical analysis comparing Reviewer 2 to other reviewers in biomedical research.
Contrary to popular belief, Reviewer 2 may not rate research manuscripts more harshly.
Introduction
The editorial and peer review process can be challenging for investigators submitting research manuscripts to journals. Although peer review is considered fundamental to the dissemination of scientific research, the process is flawed and susceptible to bias.1,2 For example, compared with a double-blinded process, single-blinded peer review (in which reviewers know who the authors are but the authors do not know who the reviewers are) has been associated with preferential publication of studies from high-profile authors and institutions, which may make it more difficult for less experienced investigators to publish their work.3
One aspect of the peer review process that has attracted academic and popular attention is the level of criticism offered by specific reviewers. In particular, some researchers believe that the second peer reviewer of a submitted manuscript (fondly referred to as “Reviewer 2”) rates the manuscript more harshly than other reviewers, a belief reflected in the Facebook group “Reviewer 2 Must Be Stopped!”, which has over 76,000 members as of this writing. Although this perception appears widespread, empirical evidence to support or refute it is scant. Current beliefs about Reviewer 2 may therefore rest largely on individual anecdotes, which are prone to confirmation bias reinforced by existing lore about an imperfect peer review system.
Methods
We analyzed 2546 initial open peer reviews of 794 research manuscripts published in the British Medical Journal (BMJ) from 2015 to 2020, each evaluated by 2 to 5 peer reviewers. We focused on reviews from each manuscript's first decision because not all manuscripts had subsequent rounds of review. In BMJ decision letters, Reviewer 2 is the second reviewer to return comments, irrespective of the order in which reviewers were asked to evaluate the manuscript.
For each review, we tallied the word count, negative phrases, positive phrases, question marks, and uses of the word “please,” under the assumption that harsher reviews would be longer and contain more negative phrases, questions, and requests (eg, “please” conduct a given analysis) but fewer positive phrases. We manually classified phrases as negative or positive based on a blinded assessment of an automatically generated list of phrases commonly used in the sample (Supplementary Table 1).
To test for differences between Reviewer 2 and other reviewers, we estimated a separate linear regression for each of the 5 outcomes (word count, negative phrases, positive phrases, question marks, and uses of “please”), with a binary indicator for Reviewer 2 as the key independent variable. We first estimated an unadjusted review-level model, which simply compared mean outcomes between reviews by Reviewer 2 and reviews by other reviewers. Next, we estimated an adjusted model with manuscript-level fixed effects (primary analysis). Because the fixed effects absorb everything common to a given manuscript, this within-manuscript analysis compared the sentiment of Reviewer 2's review with that of other reviewers' reviews of the same manuscript.
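The per-review tallies described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the phrase lists are hypothetical placeholders (the study's lists are in Supplementary Table 1), words are assumed to be whitespace-separated tokens, and phrases (including “please”) are assumed to be matched case-insensitively on word boundaries.

```python
import re

# Hypothetical placeholder phrase lists; the study's actual lists are in
# its Supplementary Table 1 and are not reproduced here.
NEGATIVE_PHRASES = ["unclear", "not convinced", "major concern"]
POSITIVE_PHRASES = ["well written", "interesting", "important"]


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of a phrase on word boundaries,
    so that 'please' is not counted inside 'pleased'."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def tally_review(text):
    """Return the five per-review outcomes for one review's text."""
    return {
        # Assumption: words are whitespace-separated tokens.
        "words": len(text.split()),
        "negative": sum(count_phrase(text, p) for p in NEGATIVE_PHRASES),
        "positive": sum(count_phrase(text, p) for p in POSITIVE_PHRASES),
        "question_marks": text.count("?"),
        "please": count_phrase(text, "please"),
    }


example = ("The paper is well written. However, the methods are unclear. "
           "Please clarify the sample. Why was this outcome chosen?")
print(tally_review(example))
# {'words': 19, 'negative': 1, 'positive': 1, 'question_marks': 1, 'please': 1}
```

In this sketch, if one listed phrase is contained in another, a single occurrence of the longer phrase is counted under both entries, so a real phrase list would need to avoid such overlaps or handle them explicitly.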
After estimating each adjusted model, we calculated adjusted mean outcomes (eg, the adjusted word count of reports by Reviewer 2 vs other reviewers) using the marginal standardization form of predictive margins.4
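For a linear model, the manuscript fixed-effects estimate and the marginally standardized means can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' implementation (the text does not specify their software): the function name `within_manuscript_fit` and the toy data are hypothetical, the slope is obtained by demeaning within manuscript (the within transformation, which yields the same coefficient as including a dummy variable for every manuscript), and standard errors and P values are not computed.

```python
from collections import defaultdict


def within_manuscript_fit(manuscript_ids, is_reviewer2, outcome):
    """Hypothetical helper: regress one outcome on a Reviewer 2 indicator
    with manuscript fixed effects, then compute marginally standardized
    (adjusted) means. Returns (beta, {0: other reviewers, 1: Reviewer 2})."""
    # Per-manuscript means of the outcome and of the Reviewer 2 indicator.
    sum_y, sum_x, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for m, x, y in zip(manuscript_ids, is_reviewer2, outcome):
        sum_y[m] += y
        sum_x[m] += x
        count[m] += 1
    mean_y = {m: sum_y[m] / count[m] for m in count}
    mean_x = {m: sum_x[m] / count[m] for m in count}

    # Within transformation: the OLS slope on demeaned data equals the
    # coefficient from a regression with one dummy per manuscript.
    num = den = 0.0
    for m, x, y in zip(manuscript_ids, is_reviewer2, outcome):
        dx = x - mean_x[m]
        num += dx * (y - mean_y[m])
        den += dx * dx
    if den == 0:
        raise ValueError("no within-manuscript variation in the indicator")
    beta = num / den

    # Recover each manuscript's fixed effect (intercept).
    alpha = {m: mean_y[m] - beta * mean_x[m] for m in count}

    # Marginal standardization: predict every review as if written by
    # Reviewer 2 (d=1) and by another reviewer (d=0), then average.
    n = len(outcome)
    adjusted = {d: sum(alpha[m] + beta * d for m in manuscript_ids) / n
                for d in (0, 1)}
    return beta, adjusted


# Toy data (invented for illustration): 2 manuscripts, 5 reviews.
manuscripts = ["A", "A", "B", "B", "B"]
reviewer2 = [0, 1, 0, 1, 0]
words = [500, 540, 700, 730, 680]
beta, adjusted = within_manuscript_fit(manuscripts, reviewer2, words)
print(round(beta, 6), round(adjusted[0], 6), round(adjusted[1], 6))
# 40.0 614.0 654.0
```

Because the model is linear, the two standardized means differ by exactly the Reviewer 2 coefficient. In the toy data, Reviewer 2 writes 40 more words than the average of the other reviewers of each manuscript, so the coefficient is 40 and the adjusted means are 614 and 654 words.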
Results
The average peer review contained 612 words, 8 negative phrases, 4 positive phrases, and 5 question marks, and used the word “please” once. Of the 794 manuscripts, 249 (31.4%) had 2 reviewers, 244 (30.7%) had 3, 189 (23.8%) had 4, and 112 (14.1%) had 5.
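As a consistency check, the distribution of reviewers per manuscript reproduces the total of 2546 reviews analyzed:

$$
2(249) + 3(244) + 4(189) + 5(112) = 498 + 732 + 756 + 560 = 2546 \text{ reviews}, \qquad \frac{2546}{794} \approx 3.21 \text{ reviewers per manuscript.}
$$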
For each of the 5 outcomes, there were no significant differences between Reviewer 2 and other reviewers, either before or after regression adjustment (Figure 1; Supplementary Table 2). The adjusted word count was 630 for Reviewer 2 and 606 for other reviewers (P = .16). The adjusted number of negative phrases was 8.7 for Reviewer 2 and 8.4 for other reviewers (P = .29), and the adjusted number of positive phrases was 4.2 and 4.1, respectively (P = .10). The adjusted number of question marks was 4.8 for Reviewer 2 and 4.6 for other reviewers (P = .26). Finally, the adjusted number of uses of the word “please” was 1.0 for both Reviewer 2 and other reviewers (P = .80).

Figure 1. Comparison of reviews performed by Reviewer 2 and other reviewers. Note: Adjusted averages, with error bars representing 95% confidence intervals.
Discussion
In a text analysis of open peer reviews of published medical research manuscripts, we found that, contrary to common belief, the sentiment of reviews by Reviewer 2 did not differ significantly from that of reviews by other reviewers. These findings are consistent with a study in political science that focused on reviewer recommendations rather than text analysis.5 Our study has several limitations. It considered only accepted manuscripts at a single journal, which may produce greater concordance across reviews, and that journal has an open review policy, a practice covering only a small fraction of articles submitted to scientific journals.6 However, open review policies have been shown not to affect review quality or publication recommendations, suggesting that our results may hold even when reviews are not public.7,8 In the BMJ, Reviewer 2 is simply the second reviewer to return an evaluation; in other journals, this may not be the case, so our findings may generalize only to journals that use a similar review process. Our study was also limited by a manual determination of the words and phrases used to gauge whether a review was negative or positive. Alternative approaches would have been to manually classify every review as positive or negative, or to manually classify a subset of reviews and then train a machine learning algorithm to predict review sentiment. Finally, the origin of the Reviewer 2 lore is unclear; it may simply reflect authors' general frustration with the peer review process rather than a problem with any specific reviewer. Reviewer 2, unfortunately, seems to have borne the brunt of this frustration, though our findings suggest this reputation is unwarranted.
Supplemental Material
Supplemental material, sj-docx-1-inq-10.1177_00469580221090393 for An Empirical Assessment of Reviewer 2 by Christopher Worsham, Jaemin Woo, André Zimerman, Charles F. Bray and Anupam B. Jena in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Jena reports receiving (in the last 36 months): consulting fees unrelated to this work from Bioverativ, Merck/Sharp/Dohme, Janssen, Edwards Life Sciences, Novartis, Amgen, Eisai, Otsuka Pharmaceuticals, Vertex Pharmaceuticals, Celgene, Sanofi Aventis, Precision Health Economics, and Analysis Group; income unrelated to this work from hosting the podcast Freakonomics, M.D., and from book rights from Doubleday Books. Dr. Worsham reports receiving (in the last 36 months): consulting fees unrelated to this work from Analysis Group, FVC Health, Chronius, NuvoAir, and Alosa Health; income unrelated to this work from book rights from Doubleday Books.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
