| Biomedical clinical research | Moderate | Studies with significant or positive results were more likely to be published than those with non-significant or negative results, thereby confirming findings from a previous Health Technology Assessment (HTA) report. There was convincing evidence that outcome reporting bias exists and has an impact on the pooled summary in systematic reviews. | Song et al.14 |
| Psychology | Moderately high | The extreme view of the “file drawer problem” is that journals are filled with only about the 5% of the studies that show type-I errors, while the file drawers are filled with the 95% of the studies that show non-significant results. Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed. | Rosenthal15 |
| Human genetics | Moderately high | Here, we have evaluated by meta-analysis 370 studies addressing 36 genetic associations for various outcomes of disease. We show that significant between-study heterogeneity (diversity) is frequent, and that the results of the first study correlate only modestly with subsequent research on the same association. The first study often suggests a stronger genetic effect than is found by subsequent studies. Both bias and genuine population diversity might explain why early association studies tend to overestimate the disease protection or predisposition conferred by a genetic polymorphism. We conclude that a systematic meta-analytic approach may assist in estimating population-wide effects of genetic risk factors in human disease. | Ioannidis et al.16 |
| Human genetics | Moderate | Maximal between-study variances were more likely to be recorded early in the 44 eligible meta-analyses of genetic associations than in the 37 meta-analyses of health-care interventions (p = 0.013). At the time of the first heterogeneity assessment, the most favorable-ever result in support of a specific association was more likely to appear than the least favorable-ever result (22 vs. 10, p = 0.017); the opposite was seen at the second heterogeneity assessment (15 vs. 5, p = 0.031). Such a sequence of extreme opposite results was not seen in the clinical trials meta-analyses. The estimated between-study variance decreased over time in genetic association studies (p = 0.010), but not in clinical trials (p = 0.30). In contrast to prospective trials, a rapid early sequence of extreme, opposite results is frequent in retrospective hypothesis-generating molecular research. | Ioannidis and Trikalinos17 |
| Psychology | Minimal | Some scientists attribute the “decline effect” to statistical self-correction of initially exaggerated outcomes, also known as “regression to the mean.” No one can be sure of this interpretation, or even test it, because we do not generally have access to “negative results”: experimental outcomes that were not noteworthy or consistent enough to pass peer review and be published. | Schooler18 |
| Genetic epidemiology | Moderate | Newly discovered true (non-null) associations often have inflated effects compared with the true effect sizes. The main reasons for this inflation are as follows: First, theoretical considerations prove that when true discovery is claimed based on crossing a threshold of statistical significance and the discovery study is underpowered, the observed effects are expected to be inflated. Second, flexible analyses coupled with selective reporting may inflate the published discovered effects. Third, effects may be inflated at the stage of interpretation due to diverse conflicts of interest. Fourth, discovered effects are not always inflated, and under some circumstances may be deflated, for example, in the setting of late discovery of associations in sequentially accumulated overpowered evidence, in some types of misclassification from measurement error, and in conflicts causing reverse biases. | Ioannidis19 |
| Biomedical clinical research | Moderately high | One hundred two trials with 122 published journal articles and 3736 outcomes were identified. Overall, 50% of efficacy and 65% of harm outcomes per trial were incompletely reported. Statistically significant outcomes had higher odds of being fully reported compared with non-significant outcomes for both efficacy (pooled odds ratio, 2.4; 95% confidence interval [CI], 1.4–4.0) and harm (pooled odds ratio, 4.7; 95% CI, 1.8–12.0) data. In comparing published articles with protocols, 62% of trials had at least 1 primary outcome that was changed, introduced, or omitted. Eighty-six percent of survey responders (42/49) denied the existence of unreported outcomes despite clear evidence to the contrary. The reporting of trial outcomes is not only frequently incomplete but also biased and inconsistent with protocols. Published articles, as well as reviews that incorporate them, may therefore be unreliable and overestimate the benefits of an intervention. To ensure transparency, planned trials should be registered, and protocols should be made publicly available prior to trial completion. | Chan et al.20 |
| Biomedical clinical research | Moderately high | Forty-eight trials were identified with 68 publications and 1402 outcomes. The median number of participants per trial was 299, and 44% of the trials were published in general medical journals. A median of 31% (10th–90th percentile range 5–67%) of outcomes measured to assess the efficacy of an intervention (efficacy outcomes) and 59% (0–100%) of those measured to assess the harm of an intervention (harm outcomes) per trial were incompletely reported. Statistically significant efficacy outcomes had higher odds than non-significant efficacy outcomes of being fully reported (odds ratio 2.7; 95% confidence interval 1.5–5.0). Primary outcomes differed between protocols and publications for 40% of the trials. Selective reporting of outcomes frequently occurs in publications of high-quality government-funded trials. | Chan et al.21 |
| Biomedical clinical research | Moderately high | Results of 519 trials with 553 publications and 10,557 outcomes were identified. Survey responders (response rate 69%) provided information on unreported outcomes but were often unreliable: for 32% of those who denied the existence of such outcomes there was evidence to the contrary in their publications. On average, over 20% of the outcomes measured in a parallel group trial were incompletely reported. Within a trial, such outcomes had higher odds of being statistically non-significant compared with fully reported outcomes (odds ratio 2.0 (95% confidence interval 1.6 to 2.7) for efficacy outcomes; 1.9 (1.1 to 3.5) for harm outcomes). The reasons most commonly reported for omitting efficacy outcomes were also examined. Incomplete reporting of outcomes within published articles of randomized trials is common and is associated with statistical non-significance. The medical literature therefore represents a selective and biased subset of study outcomes, and trial protocols should be made publicly available. | Chan and Altman22 |
| Medical/pharmacological researchers | Moderate | The frequency with which scientists fabricate and falsify data or commit other forms of scientific misconduct is a matter of controversy. Many surveys have asked scientists directly whether they have committed or know of a colleague who committed research misconduct, but their results appeared difficult to compare and synthesize. This is the first meta-analysis of these surveys. To standardize outcomes, the number of respondents who recalled at least one incident of misconduct was calculated for each question, and the analysis was limited to behaviors that distort scientific knowledge: fabrication, falsification, “cooking” of data, and so on. Meta-regression showed that self-report surveys, surveys using the words “falsification” or “fabrication,” and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than by others. Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct. | Fanelli23 |
| Psychology | Moderately high | Cases of clear scientific misconduct have received significant media attention recently, but less flagrantly questionable research practices may be more prevalent and, ultimately, more damaging to the academic enterprise. Using an anonymous elicitation format supplemented by incentives for honest reporting, we surveyed over 2000 psychologists about their involvement in questionable research practices. The impact of truth-telling incentives on self-admissions of questionable research practices was positive, and this impact was greater for practices that respondents judged to be less defensible. Combining three different estimation methods, we found that the percentage of respondents who have engaged in questionable practices was surprisingly high. This finding suggests that some questionable practices may constitute the prevailing research norm. | John et al.24 |
| Biomedical and life-science research | Moderate | A detailed review of all 2047 biomedical and life-science research articles indexed by PubMed as retracted on May 3, 2012 revealed that only 21.3% of retractions were attributable to error. In contrast, 67.4% of retractions were attributable to misconduct, including fraud or suspected fraud (43.4%), duplicate publication (14.2%), and plagiarism (9.8%). Incomplete, uninformative, or misleading retraction announcements have led to a previous underestimation of the role of fraud in the ongoing retraction epidemic. The percentage of scientific articles retracted because of fraud has increased almost 10-fold since 1975. Retractions exhibit distinctive temporal and geographic patterns that may reveal underlying causes. | Fang et al.25 |
| Biomedical clinical research | Moderate | There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. | Ioannidis26 |
| Psychology | Moderate | Humans make decisions with uncertainty. Evaluations (scientific and non-scientific) are subject to unconscious heuristics and biases. The author shows that our decisions about how to design experiments and how to interpret the results are subject to bias, sometimes with serious consequences. For the person who believes that professional scientists can somehow easily “control” bias away, Rosenthal presents an important counterargument. The problems documented by the author have been taught previously in many disciplines; however, these teachings are not studied today. These teachings may have gone out of fashion, but the problems have most certainly not gone away. | Rosenthal27 |
| Animal behavior research | Minimal | The authors reviewed several hundred published articles from 1970 to 2010 in five leading animal behavior journals and found that two methods for minimizing or eliminating observer bias (blind analysis of data and inter-rater reliability assessment) were rarely reported (<10% of articles reviewed). In contrast, a journal focusing on human infant behavior research was far more rigorous in incorporating methods to avoid bias (>80% of articles reviewed). The lack of reported attempts to minimize bias in animal behavior studies suggests that, at best, many researchers view blind analyses of data or inter-rater reliability as unimportant components of research or, if carried out, unnecessary to report in a manuscript. At worst, it suggests that some published behavioral research may be unreliable. The authors acknowledge constraints imposed by fieldwork and data-collection issues that sometimes make blind data comparisons or inter-rater reliability assessments difficult or unfeasible. However, given that research ethicists often emphasize the fundamental importance of trust and transparency in science, they urge authors, reviewers, and editors of manuscripts to ensure that at least one of these two methods of reducing and reporting observer bias is used. | Burghardt et al.28 |
| Physical sciences, biological sciences, social sciences | Moderate (depending on the science tested) | Fanelli analyzed 2434 articles published in all disciplines that declared to have tested a hypothesis. It was determined how many articles reported “positive” (full or partial) or “negative” support for the tested hypothesis. If the hierarchy hypothesis is correct, then researchers in “softer” sciences should have fewer constraints on their conscious and unconscious biases, and therefore report more positive outcomes. Results confirmed the predictions at all levels considered: discipline, domain, and methodology broadly defined. Controlling for observed differences between pure and applied disciplines, and between articles testing one or several hypotheses, the odds of reporting a positive result were around five times higher among articles in the disciplines of psychology and psychiatry and economics and business compared to space science, 2.3 times higher in the domain of social sciences compared to the physical sciences, and 3.4 times higher in studies applying behavioral and social methodologies on people compared to physical and chemical studies on nonbiological material. In all comparisons, biological studies had intermediate values. These results suggest that the nature of hypotheses tested and the logical and methodological rigor employed to test them vary systematically across disciplines and fields, depending on the complexity of the subject matter and possibly other factors (e.g. a field’s level of historical and/or intellectual development). On the other hand, these results support the scientific status of the social sciences against claims that they are completely subjective, by showing that, when they adopt a scientific approach to discovery, they differ from the natural sciences only by a matter of degree. | Fanelli29 |
| Crystallography | Minimal | There have been a number of high-profile academic fraud cases in China recently, underscoring the problems of an academic-evaluation system that places disproportionate emphasis on publications. Chinese universities often award cash prizes, housing benefits, or other perks on the basis of high-profile publications, and the pressure to publish seems to be growing. The journal Acta Crystallographica Section E has retracted 70 published crystal structures that it alleges are fabrications by researchers at Jinggangshan University in Jiangxi province. Further retractions, the editors say, are likely. | Qiu30 |
| Physical sciences, biological sciences, social sciences | Minimal | How does publication pressure in modern-day universities affect the intrinsic and extrinsic rewards in science? Using a worldwide survey among demographers in developed and developing countries, the authors show that the large majority perceive the publication pressure as high, but more so in Anglo-Saxon countries and to a lesser extent in Western Europe. However, scholars see both the pros (upward mobility) and cons (excessive publication and uncitedness, neglect of policy issues, etc.) of the so-called publish-or-perish culture. By measuring behavior in terms of reading and publishing, and perceived extrinsic rewards and stated intrinsic rewards of practicing science, it turns out that publication pressure negatively affects the orientation of demographers towards policy and knowledge sharing. There are no signs that the pressure affects reading and publishing outside the core discipline. | van Dalen and Henkens31 |
| Genetics | Moderate | There is increasing concern that the genetic literature may be distorted by various biases, such as publication bias, which may lead to a misleading impression of the strength of evidence for a putative gene–disease association. Meta-analysis is one means by which a more accurate estimate of the strength of evidence for such an association may be obtained, as well as offering a means by which potential biases may be identified. The authors present evidence that the location where a study is conducted is associated with the degree to which it represents an over-estimate of the true effect size, as subsequently estimated using meta-analytical techniques. The results indicate that studies published in North America may represent a relative over-estimate of the true effect size, compared to those published in Europe or elsewhere. | Munafò et al.32 |
| Industrial relations research | Minimal | This article develops and applies several meta-analytic techniques to investigate the presence of publication bias in industrial relations research, specifically in the union-productivity effects literature. Publication bias arises when statistically insignificant results are suppressed or when results satisfying prior expectations are given preference. Like most fields, industrial relations research is vulnerable to publication bias. Unlike in other fields such as economics, however, there is no evidence of publication bias in the union-productivity literature as a whole. There are, nonetheless, pockets of publication selection, as well as negative autoregression, confirming the controversial nature of this area of research. Meta-regression analysis reveals evidence of publication bias (or selection) among U.S. studies. | Doucouliagos et al.33 |
| Physical sciences, biological sciences, social sciences | Moderate | Concerns that the growing competition for funding and citations might distort science are frequently discussed but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavors negative results not only distorts the scientific literature directly but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analyzed over 4600 articles published in all disciplines between 1990 and 2007, measuring the frequency of articles that, having declared to have “tested” a hypothesis, reported a positive support for it. The overall frequency of positive supports has grown by over 22% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States had published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom). Methodological artifacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing. | Fanelli34 |
| Physical sciences, biological sciences, social sciences | Moderate | The growing competition and “publish or perish” culture in academia might conflict with the objectivity and integrity of research, because it forces scientists to produce “publishable” results at all costs. Papers are less likely to be published and to be cited if they report “negative” results (results that fail to support the tested hypothesis). Therefore, if publication pressures increase scientific bias, the frequency of “positive” results in the literature should be higher in the more competitive and “productive” academic environments. This study verified this hypothesis by measuring the frequency of positive results in a large random sample of papers with a corresponding author based in the United States. Across all disciplines, papers were more likely to support a tested hypothesis if their corresponding authors were working in states that, according to NSF data, produced more academic papers per capita. The size of this effect increased when controlling for each state’s per capita R&D expenditure and for study characteristics that previous research showed to correlate with the frequency of positive results, including discipline and methodology. Although the confounding effect of institutions’ prestige could not be excluded (researchers in the more productive universities could be the most clever and successful in their experiments), these results support the hypothesis that competitive academic environments increase not only scientists’ productivity but also their bias. The same phenomenon might be observed in other countries where academic competition and pressures to publish are high. | Fanelli35 |
| Psychology | Minimal | In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤0.05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. | Simmons et al.36 |
| Psychology | Minimal | Empirical replication has long been considered the final arbiter of phenomena in science, but replication is undermined when there is evidence for publication bias. Evidence for publication bias in a set of experiments can be found when the observed number of rejections of the null hypothesis exceeds the expected number of rejections. Application of this test reveals evidence of publication bias in two prominent investigations from experimental psychology that have purported to reveal evidence of extrasensory perception and to indicate severe limitations of the scientific method. The presence of publication bias suggests that those investigations cannot be taken as proper scientific studies of such phenomena, because critical data are not available to the field. Publication bias could partly be avoided if experimental psychologists started using Bayesian data analysis techniques. | Francis37 |
| Psychology | Minimal | The article examines a sample of 91 recent meta-analyses published in American Psychological Association and Association for Psychological Science journals and the methods used in these analyses to identify and control for publication bias. Of the 91 studies analyzed, 64 (70%) made some effort to analyze publication bias, and 26 of these (41%) reported finding evidence of bias. Approaches to controlling publication bias were heterogeneous among studies. Of these studies, 57 (63%) attempted to find unpublished studies to control for publication bias. Nonetheless, those studies that included unpublished studies were just as likely to find evidence for publication bias as those that did not. Authors of meta-analyses themselves were overrepresented in the unpublished studies acquired, as compared with published studies, suggesting that searches for unpublished studies may increase rather than decrease some sources of bias. A subset of 48 meta-analyses for which study sample sizes and effect sizes were available was further analyzed with a conservative and newly developed tandem procedure for assessing publication bias. Results indicated that publication bias was worrisome in about 25% of meta-analyses. | Ferguson and Brannick38 |
| Psychology | Minimal | A recent set of articles in Perspectives on Psychological Science discussed inflated correlations between brain measures and behavioral criteria when measurement points (voxels) are deliberately selected to maximize criterion correlations (the target article was Vul et al., 2009). However, closer inspection reveals that this problem is only a special symptom of a broader methodological problem that characterizes all paradigmatic research, not just neuroscience. Researchers not only select voxels to inflate effect size, they also select stimuli, task settings, favorable boundary conditions, dependent variables and independent variables, treatment levels, moderators, mediators, and multiple parameter settings in such a way that empirical phenomena become maximally visible and stable. In general, paradigms can be understood as conventional setups for producing idealized, inflated effects. Although the feasibility of representative designs is restricted, a viable remedy lies in a reorientation of paradigmatic research from the visibility of strong effect sizes to genuine validity and scientific scrutiny. | Fiedler39 |
| Psychology | Minimal | In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers’ expectations. | Bakker and Wicherts40 |
| Health services research | Moderate | Health services research often involves multilevel data, with participants nested in families, clinicians, case managers, community care programs, HMOs, or countries. Longitudinal studies are also common, and studies may be both multilevel and longitudinal. Flexible and efficient analytic methods for data that are multilevel, longitudinal, or both are increasingly accessible to researchers. However, the implications of these methods for research design have not yet been well explored. As a result, researchers remain quite uninformed about crucial design decisions. | Raudenbush et al.41 |
| Physical sciences, biological sciences, social sciences | Moderate | The use of multilevel modeling to investigate organizational phenomena is rapidly increasing. Unfortunately, little advice is readily available for organizational researchers attempting to determine statistical power when using multilevel models or when determining sample sizes for each level that will maximize statistical power. This often leads to bias in studies. This article presents an introduction to statistical power in multilevel models. The unique factors influencing power in multilevel models and calculations for estimating power for simple fixed effects, variance components, and cross-level interactions are presented. | Scherbaum and Ferreter42 |
| Psychology | Moderate | Multilevel modeling (MLM) is a key method used by social and personality psychologists in their research. The rationale for MLM is important to understand, especially as applied across scientific fields. Bias can be introduced into studies in a variety of ways. MLM can help researchers avoid unnecessary bias by illustrating how the method can and should be applied in research. | Nezlek43 |
| Biomedical clinical research | Moderate | Manuscript selection bias is the selective publication of manuscripts based on study characteristics other than quality indicators. One reason may be a perceived editorial bias against research from the less-developed world. We aimed to compare the methodological quality and statistical appeal of trials from countries with different development status and to determine their association with journal impact factors and language of publication. Four hundred records of clinical trials were explored by country economic status from 1993 to 2003. Country income had an inverse linear association with the presence of randomization (χ² for trend = 5.6, p = 0.02) and a direct association with the use of blinding (χ² for trend = 6.9, p = 0.008), although in low-income countries the probability of blinding increased from 36% in 1993 to 46% in 2003. In 1993, the results of 68% of high-income trials and 64.7% of other groups were statistically significant; but in 2003, they were 66% and 82%, respectively. Study sample size and income were the only significant predictors of journal impact factor. The impact of country development on manuscript selection bias is considerable and may be increasing over time. It seems that one reason may be more stringent implementation of the guidelines for improving the reporting quality of trials on developing-world researchers. Another reason may be the presumptions of researchers from the developing world about editorial bias against their nationality. | Yousefi-Nooraie et al.44 |
| Biomedical clinical research | Moderate | Many authors believe that there are biases in scientific publications. Editorial biases include publication bias, which refers to those situations where the results influence the editor’s decision, and editorial bias, which refers to those situations where factors related to authors or their environment influence the decision. This article analyzes these editorial biases. One bias is where mainly articles with positive results are accepted, as opposed to those with negative results. Another is latent bias, where positive results are published before those with negative results. In order to examine editorial bias, this article analyzes the influence of where the article originated: the country or continent, the academic center of origin, membership of cooperative groups, and the native language of the authors. The article also analyzes biases in the editorial process in the publication of funded clinical trials. Editorial biases exist. Authors, when submitting their manuscript, should analyze different journals and decide where their article will receive adequate treatment. | Matías-Guiu and García-Ramos45 |
| Medical research | Moderate | Reviewers increasingly are asked to review manuscripts from outside their own country, but whether they are more likely to recommend acceptance of such manuscripts is not known. This study assessed whether US and non-US reviewers evaluate manuscripts differently, depending on whether the manuscripts are submitted from outside the United States or from the United States, via a retrospective analysis of all original submissions received by Gastroenterology in 1995 and 1996. Reviewers ranked manuscripts in four decision categories: accept, provisionally accept, reject with resubmission, or reject. The percentage of non-US manuscripts placed in each decision category by US (n = 2355) and non-US reviewers (n = 1297) was nearly identical (p = 0.31). However, US reviewers recommended acceptance of articles submitted by US authors more often than did non-US reviewers (p = 0.001). Non-US reviewers ranked US articles slightly more favorably than non-US articles (p = 0.09), while US reviewers ranked US articles much more favorably (p = 0.001). Reviewers from the United States and outside the United States evaluate non-US papers similarly and evaluate papers submitted by US authors more favorably, with US reviewers having a significant preference for US papers. | Link46 |
| Medical research |
Moderate |
Two hundred and nine manuscripts were reviewed. Commercial funding was not found to be associated with a positive study outcome (p = 0.668). Studies with a positive outcome were no more likely to be published than were those with a negative outcome (p = 0.410). Studies with a negative outcome were of higher quality (p = 0.003) and included larger sample sizes (p = 0.05). Commercially funded (p = 0.027) and US-based (p = 0.020) studies were more likely to be published, even though those studies were not associated with higher quality, larger sample sizes, or lower levels of evidence (p = 0.24–0.79). Commercially funded studies submitted for review were not more likely to conclude with a positive outcome than were nonfunded studies, and studies with a positive outcome were no more likely to be published than were studies with a negative outcome. These findings contradict those of most previous analyses of published (rather than submitted) research. Commercial funding and the country of origin predict publication following peer review beyond what would be expected on the basis of study quality.
Studies with a negative outcome, although seemingly superior in quality, fared no better than studies with a positive outcome in the peer-review process; this may result in inflation of apparent treatment effects when the published literature is subjected to meta-analysis.
|
Lynch
47
|
| Research in animal behavior |
Negligible |
Confirmation bias is a tendency of people to interpret information in a way that confirms their expectations. A long-recognized phenomenon in human psychology, confirmation bias can distort the results of a study and thus reduce its reliability. While confirmation bias can be avoided by conducting studies blind to treatment groups, this practice is not always used. The authors conducted a meta-analysis, using studies on nestmate recognition in ants, to compare the outcomes of studies that were conducted blind with those that were not. Nestmate recognition studies typically perform intra- and inter-colony aggression assays, with the a priori expectation that there should be little or no aggression among nestmates. Aggressive interactions between ants can include subtle behaviors such as mandible flaring and recoil, which can be hard to quantify, making these types of assays prone to confirmation bias. The survey revealed that only 29% of the sample of 79 studies were conducted blind. Studies were more likely to report aggression among nestmates if they were conducted blind (73%) than if they were not (21%). Moreover, it was found that the effect size between nestmate and non-nestmate treatment means is significantly lower in experiments conducted blind than in those in which colony identity is known (1.38 vs. 2.76). The implications of confirmation bias for research that attempts to obtain quantitative syntheses of data from different studies are discussed. |
van Wilgenburg and Elgar
48
|
| Physics |
Negligible |
The beliefs of physicists can bias their results toward their expectations in a number of ways. The authors survey a variety of historical cases of expectation bias in observations, experiments, and calculations.
|
Jeng
49
|
| Physical sciences, biological sciences, social sciences |
Minimal |
The authors failed to observe any “US effect” in genetic research, thus contradicting observations made by an independent, smaller study in genetic epidemiology. This suggests that the prevalence of this and similar patterns needs to be assessed field by field.
However, previous observations suggest that a US propensity to report positive and statistically significant results is not limited to particular methodologies and may cut across many disciplines.
|
Fanelli
50
|
| Biomedical clinical research |
Moderate |
Statistical tests of heterogeneity and bias, in particular publication bias, are very popular in meta-analyses. These tests use statistical approaches whose limitations are often not recognized. Moreover, it is often implied with inappropriate confidence that these tests can provide reliable answers to questions that are not of a statistical nature. Statistical heterogeneity is only a correlate of clinical and pragmatic heterogeneity, and the correlation may sometimes be weak. Similarly, statistical signals may hint at bias, but seen in isolation they cannot fully prove or disprove bias in general, let alone specific causes of bias such as publication bias. Both false-positive and false-negative signals of heterogeneity and bias can be common, and their prevalence may be anticipated on the basis of some rational considerations. The author discusses the major common challenges and flaws that emerge in using and interpreting statistical tests of heterogeneity and bias in meta-analyses, and the misinterpretations that can occur at the level of statistical inference, clinical/pragmatic inference, and specific cause attribution. |
Ioannidis
51
|
| Physical sciences, biological sciences, social sciences |
Negligible |
By integrating and translating the current methodological and statistical work into a practical guide, the authors of this text provide readers with a state-of-the-art introduction to the various approaches to meta-analysis and inform them about various sources of bias. |
Lipsey and Wilson
52
|
| Physical sciences, biological sciences, social sciences |
Negligible |
The metafor package provides functions for conducting meta-analyses in R. The package includes functions for fitting the meta-analytic fixed- and random-effects models and allows for the inclusion of moderator variables (study-level covariates) in these models. Meta-regression analyses with continuous and categorical moderators can be conducted in this way. Functions for the Mantel–Haenszel method and Peto’s one-step method for meta-analyses of 2 × 2 table data are also available. Finally, the package provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing model fit, obtaining case diagnostics, and testing for publication bias.
|
Viechtbauer
53
|
| Biomedical clinical research |
Moderate |
Clinicians commonly misinterpret systematic review abstracts: a recent study showed many arrived at incorrect conclusions, and only 62% correctly identified the direction of the main effect. Interpreting numerical results requires statistical knowledge that many clinicians lack. To ensure correct interpretation, abstracts should give the direction and size of effects both in words and numerically. Because systematic reviews are important and widely used summaries of primary research, the authors examined a sample of systematic review abstracts to assess the nature and extent of any deficiencies in reporting. For 42% of abstracts of systematic reviews, the direction of the main effect either could not be determined or needed to be inferred. Statistical uncertainty was also poorly reported: 24% of abstracts reported neither a confidence interval nor a p-value. Because many readers can only, or will only, read a systematic review’s abstract, clear presentation of the main results is vital. Although abstracts should present estimates of effect and confidence intervals, interpretation of the results should not require statistical knowledge. Given the high level of innumeracy among journal readers, the main results should be presented in both words and numbers. Although replication in a wider sample of journals is desirable, the apparent poor quality of systematic review abstracts deserves attention from authors, reviewers, and journal editors. |
Beller et al.
54,55
|
| Biomedical clinical research |
Moderate |
Outcome reporting bias (ORB) in randomized trials has been identified as a threat to the validity of systematic reviews. Previous work highlighting this problem is limited to considering a single primary review outcome. The aim of this study was to assess ORB across all efficacy outcomes in the Cochrane systematic reviews of cystic fibrosis. Systematic reviews of interventions for cystic fibrosis published in The Cochrane Library by the Cochrane Cystic Fibrosis and Genetic Disorders Group before 2010 were assessed for discrepancies in outcomes between review protocol and full review. ORB in eligible trials was also assessed for all efficacy review outcomes. Two authors independently classified each outcome using a nine-point classification system developed by the Outcome Reporting Bias in Trials study. These classifications were used to inform the assessment of the risk of bias for selective outcome reporting for each trial. Forty-six Cochrane cystic fibrosis systematic reviews were included. The median number of primary outcomes, number of trials, and participants per trial in the reviews were 3 (IQR 2, 3), 4 (IQR 2, 8), and 21 (IQR 14, 41), respectively. Eighteen reviews (39%, 18/46) had a discrepancy in outcomes between protocol and full review. Thirty-seven reviews were eligible to be included in the ORB assessment. When considering review primary outcomes and all review outcomes, ORB was suspected in at least one trial in 86% and 100% of reviews, respectively. Assessment of ORB within a systematic review of a single primary outcome underestimates the risk of ORB in comparison with the assessment of multiple primary and secondary outcomes. ORB in trials is highly prevalent within systematic reviews of cystic fibrosis when assessed across all outcomes. This could be reduced by the development of a core outcome set for trials and systematic reviews in cystic fibrosis. |
Dwan et al.
56
|
| Biomedical clinical research |
Moderate |
Outcome reporting bias (ORB) has been identified as a threat to the validity of evidence-based medicine. Trial outcomes with statistically significant results are more likely to be published than non-significant outcomes. The ORBIT study investigated ORB in Cochrane reviews but considered only primary outcomes and included only two Cystic Fibrosis and Genetic Disorders (CFGD) reviews. The prevalence and impact of ORB in CFGD reviews is unknown. Eighty-two reviews published prior to 2010 were included; 21 identified no randomized controlled trials. Sixty-one reviews were considered further, containing 405 included trials and 21 trials excluded due to “no relevant outcome data.” Between protocol and review, three reviews upgraded secondary outcomes to primary, 15 downgraded primary outcomes to secondary, two reviews included outcomes that were not in the protocol, and two reviews excluded outcomes that were originally in the protocol. Nine reviews removed secondary outcomes between protocol and review, and six reviews added secondary outcomes. Sixteen review protocols did not distinguish between primary and secondary outcomes. The assessment of ORB is ongoing at present. ORB is a problem in all areas of research. A core set of outcomes for genetic disorders will be an important step in reducing ORB and standardizing outcome measures in these clinical areas. |
Dwan et al.
57
|
| Biomedical clinical research |
Minimal |
Cochrane abstracts are the most frequently accessed and used part of a Cochrane review. In randomized trials, evidence shows that authors do not always report the primary outcome in the abstract and are more likely to report a clinically or statistically significant outcome. This may also be the case in abstracts of systematic reviews. To assess whether reporting of outcomes is consistent between the full text and abstract of Cochrane reviews, the authors included all new reviews published in Issue 4, 2009 of The Cochrane Database of Systematic Reviews, where the primary outcome(s) were clearly stated in the full text and a meta-analysis had been conducted (n = 64); they excluded nonintervention reviews. The median number of primary outcomes per review was two (range 1–10). Only 44 (69%) reviews reported all primary outcomes from the text in the abstract. Twelve (19%) reported only some of the primary outcomes in the abstract, compared with the full text, and eight (13%) failed to report any primary outcomes in the abstract. Of the 56 (88%) reviews that reported one or more primary outcomes in the abstract, only four (7%) stated this was a primary outcome and only eight (14%) reported the relative and absolute effect size and 95% confidence interval or p-value. In 33 reviews (59%), there was no absolute effect size given; in 11 (20%), the result was only stated as “significant” or “not significant”; in three (5%), only an NNT was given; and in one (2%), only the relative effect size was stated, with no p-value or confidence interval. The preliminary findings suggest evidence of incomplete and selective reporting in abstracts of Cochrane reviews.
|
Hopewell and Beller
58
|
| Biomedical clinical research |
Minimal |
The purpose of prespecifying primary outcomes in the systematic review process is to define the most clinically relevant outcomes and to protect against bias. Adding, omitting, or changing review outcomes once the protocol is published can result in bias. The objective was to investigate the discrepancies between primary outcomes listed in protocols and the subsequent reviews published in The Cochrane Library, and to identify non-publication of review protocols and ill-defined primary outcomes. The authors examined new systematic reviews published between Issue 4, 2006 and Issue 2, 2007 of The Cochrane Library. For each review, discrepancies between the primary outcome(s) listed in the review protocol and the review itself were identified by a statistician, and lead review authors were contacted to provide reasons for the discrepancies. A total of 297 reviews were in the study cohort. For the primary outcome measures, 49 (16%) reviews disagreed with the primary outcome(s) specified in the protocol: nine included at least one new primary outcome not specified in the protocol, five excluded at least one primary outcome specified in the protocol, and 35 upgraded or downgraded a protocol outcome to a primary or secondary review outcome, respectively. A further 24 (8%) reviews had no protocol registered on The Cochrane Library, while 14 (5%) reviews could not be assessed for outcome reporting bias due to poor primary outcome definition. Discrepancies between primary outcomes specified in the review and protocol are common. The most common reasons for these discrepancies were: (1) recommendation by editors/peer reviewers; and (2) recognition of the importance of the outcome before/after reading the results of the included trials. The seriousness of bias arising from the reasons provided by the review authors for such discrepancies, non-registration of review protocols, and ill-defined outcome definition will be discussed. |
Kirkham et al.
59
|
| Biomedical clinical research |
Minimal |
Adding, omitting, or changing outcomes after a systematic review protocol is published can result in bias because it increases the potential for unacknowledged or post hoc revisions of the planned analyses. The main objective of this study was to look for discrepancies between primary outcomes listed in protocols and in the subsequent completed reviews published on the Cochrane Library. A secondary objective was to quantify the risk of bias in a set of meta-analyses where discrepancies between outcome specifications in protocols and reviews were found. New reviews from three consecutive issues of the Cochrane Library were assessed. For each review, the primary outcome(s) listed in the review protocol and the review itself were identified and review authors were contacted to provide reasons for any discrepancies. Over a fifth (64/288, 22%) of protocol/review pairings were found to contain a discrepancy in at least one outcome measure, of which 48 (75%) were attributable to changes in the primary outcome measure. Where lead authors could recall a reason for the discrepancy in the primary outcome, there was found to be potential bias in nearly a third (8/28, 29%) of these reviews, with changes being made after knowledge of the results from individual trials. Only 4 (6%) of the 64 reviews with an outcome discrepancy described the reason for the change in the review, with no acknowledgment of the change in any of the eight reviews containing potentially biased discrepancies. Outcomes that were promoted in the review were more likely to be significant than if there was no discrepancy (relative risk 1.66, 95% CI (1.10, 2.49), p = 0.02). In a review, making changes after seeing the results for included studies can lead to biased and misleading interpretation if the importance of the outcome (primary or secondary) is changed on the basis of those results. 
The assessment showed that reasons for discrepancies with the protocol are not reported in the review, demonstrating an under-recognition of the problem. Complete transparency in the reporting of changes in outcome specification is vital; systematic reviewers should ensure that any legitimate changes to outcome specification are reported with reasons in the review.
|
Kirkham et al.
60
|
| Biomedical clinical research |
Minimal |
Recent studies have shown that final reports do not always correspond to what authors originally set out to study in protocols. This phenomenon, which has only been assessed in primary studies, can affect the quality of reporting and can generate selective reporting of results. The objective was to assess the agreement between the outcomes stated in protocols and those reported in systematic reviews. Original protocols and the subsequent full reports of Cochrane systematic reviews (hereafter called “Cochrane protocol review pairs,” CPRPs) published in Issues 2, 3, and 4 of 2005 and Issue 1 of 2006 were first identified, and a random sample of 186 CPRPs was then drawn from them. For each CPRP, agreement between protocols and final reports was assessed. A CPRP was classified as “in agreement” if the outcomes were present in both protocols and reviews or when the reviewers reported that an outcome was not available in primary studies. Agreement between classification of primary/secondary outcomes was also assessed. A preliminary analysis based on 19 CPRPs has shown that four (21%) did not provide the protocol; in five (33%), disagreement between the protocol and the final report was observed; and in two (13%), primary and secondary outcomes were switched. Results based on a larger sample will be provided at the Colloquium. One of the stated advantages of Cochrane systematic reviews is the commitment of authors to prepare and publish a protocol before a systematic review is carried out. It is therefore important for The Cochrane Collaboration to assess whether this is happening in practice, which this study set out to do. |
Parmelli et al.
61
|
| Biomedical clinical research |
Minimal |
The protocol is considered an important element when a study is conducted. First, it makes a study replicable by independent researchers; second, it provides the researchers with a guide for the conduct of the study; and finally, it protects from biases by prespecifying the hypotheses of the research. Evidence suggests that there are discrepancies between protocols and final studies in terms of outcomes in the primary literature. The objective was to evaluate whether discrepancies in outcomes exist between Cochrane systematic reviews (SRs) and their protocols. This analysis is based on a sample of 60 SRs. Adverse event (AE) outcomes were reported in 39 SRs (65%). Disagreement with protocols was observed in 28 SRs (47%) for effectiveness outcomes and in 9 SRs (23%) for AE outcomes. Eleven SRs (18%) included at least one new outcome, and 17% excluded at least one outcome. In the 18 SRs (30%) in which a change in type of outcome was observed, 6 (10%) upgraded at least one outcome, 9 (15%) downgraded at least one outcome, and 3 (5%) did both. Similar results were observed for AE outcomes. The median number of outcomes was 9 per protocol and 11 per SR (range 2–45). The authors are currently analyzing the association between type of change and statistical significance, and results based on a larger sample size will be presented. The preliminary results show that discrepancies between protocols and SRs are not uncommon. Most discrepancies are due to changes in the typologies of outcomes: many outcomes that were primary in the protocols became secondary, or undefined, in the SRs.
|
Parmelli et al.
62
|
| Biomedical clinical research |
Minimal |
Publication of research protocols minimizes bias by explicitly stating a priori hypotheses and methods without prior knowledge of results. The authors conducted a retrospective comparative study to assess the extent to which the content of published Cochrane reviews had changed compared with their previously published protocols and to assess any potential impact these changes may have had in introducing bias to the study. They identified previously published protocols for new Cochrane reviews appearing in The Cochrane Library, 2000, Issue 3. The texts of published protocols and completed reviews were compared. Two raters independently identified changes to the different sections of the protocol and classified the changes as none, minor, or major. Of the 66 new Cochrane reviews, they identified a previously published protocol for 47 reviews. Of these, 43 reviews had at least one section that had undergone a major change compared with the most recently published protocol. The greatest variation between protocols and reviews was in the methods section, in which 68% of reviews (n = 32) had undergone a major change. Changes made in other sections that may have resulted in the introduction of bias included narrowing of objectives, addition of comparisons or new outcome measures, broadening of criteria for the types of study design included, and narrowing of types of participants included. Research protocols, even if published, are likely to remain, at least to some extent, iterative documents. They found that a large number of changes were made to Cochrane reviews, some of which could be prone to influence by prior knowledge of results. Even if many of the changes between protocol and review improve the overall study, the reasons for making them should be clearly identified and documented within the final review.
|
Silagy et al.
63,64
|
| Genomics |
Moderate |
Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction. Using a set of curated, publicly available breast cancer microarray experiments, the authors demonstrated that results from existing gene signatures that rely on normalizing test data may be irreproducible when the patient population changes in composition or size. As an alternative, they examined the use of gene signatures that rely on ranks from the data and showed why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms. |
Patil et al.
65
|
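The heterogeneity statistics that recur throughout the table above (Cochran's Q, between-study variance, and random-effects pooling, discussed in the Ioannidis and Viechtbauer entries) are simple to compute. The following is an illustrative Python sketch of the standard DerSimonian–Laird approach, not the method of any study in the table; the function name and example values are hypothetical.

```python
import math

def random_effects_meta(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects model.

    Returns the pooled estimate, its standard error, Cochran's Q,
    the between-study variance tau^2, and I^2 (as a fraction of Q).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird estimate of tau^2, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # I^2: proportion of total variability attributable to heterogeneity
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # random-effects weights incorporate the between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, tau2, i2
```

As several entries in the table caution, a large Q or I² is only a statistical correlate of clinical heterogeneity, and these statistics should not be over-interpreted in isolation.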
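The metafor entry above mentions functions for tests of publication bias; a widely used one is Egger's regression test for funnel plot asymmetry. The sketch below is a pure-Python illustration of the underlying idea (regress standardized effects on precision and examine the intercept), not metafor's implementation; the function name and data are hypothetical.

```python
import math

def egger_test(effects, variances):
    """Illustrative Egger-style regression: standardized effects vs. precision.

    An intercept far from zero suggests funnel plot asymmetry, one possible
    (but, as Ioannidis notes, not conclusive) signal of publication bias.
    """
    n = len(effects)
    z = [y / math.sqrt(v) for y, v in zip(effects, variances)]  # standardized effects
    prec = [1.0 / math.sqrt(v) for v in variances]              # precisions
    mx, my = sum(prec) / n, sum(z) / n
    sxx = sum((x - mx) ** 2 for x in prec)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prec, z))
    slope = sxy / sxx
    intercept = my - slope * mx
    # ordinary least-squares standard error of the intercept
    resid = [y - (intercept + slope * x) for x, y in zip(prec, z)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("inf")
    return intercept, se_int, t
```

With few studies the test has low power, so both false-positive and false-negative signals are common, consistent with the cautions raised in the table.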