The case for introducing pre-registered confirmatory pharmacological pre-clinical studies

Abstract

When evaluating the design of pre-clinical studies in the field of traumatic brain injury, we found substantial differences compared to phase III clinical trials, which in part may explain the difficulties in translating promising experimental drugs into approved treatments. By using network analysis, we also found cases where a large proportion of the studies evaluating a pre-clinical treatment was performed by inter-related researchers, which is potentially problematic. Subjecting all pre-clinical trials to the rigor of a phase III clinical trial is, however, likely not practically achievable. Instead, we repeat the call for a distinction to be made between exploratory and confirmatory pre-clinical studies.

Keywords

Experimental design, network analysis, systematic reviews, translational research, traumatic brain injury

Introduction

The frequent failures to translate promising pre-clinical research into approved drug treatments are a concern in medical research, and the field of traumatic brain injury (TBI) is no exception.¹ The reason for this is likely multi-factorial and suggested causes include patient heterogeneity, insufficient therapeutic time window, and inexact outcome measures.² Another potential reason, which so far has been given limited attention, is discrepancies in experimental design between pre-clinical studies and phase III clinical trials. To explore this subject, we randomly selected and evaluated 30 experimental studies, published in 2016, which evaluated pharmacological treatments for TBI. We also explored the use of author network analysis (ANA) as a tool to determine if a conclusion about potential therapeutic efficacy has been independently reached by different researcher constellations.

Network analysis

In order to perform ANAs, we first identified three treatments which had been evaluated extensively pre-clinically. The first is in clinical use for TBI-induced sub-arachnoid hemorrhage (Treatment A), the second failed to improve outcome in phase III (Treatment B), while the third has not yet been tested in phase III (Treatment C). An author network was then defined as the researchers who had co-authored any of these publications. Author lists for the identified publications were retrieved from PubMed using the R package RISmed, and unique authors were preliminarily identified by matching first initial and family name. This assignment was then manually verified, co-authorship was identified using R, and the data were visualized using Gephi (Figure 1).
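The network construction described above can be sketched in a few lines. The following is a minimal illustration in Python, not the R and Gephi workflow used for the figure; the `publications` records and all function names are hypothetical, and the matching on first initial and family name covers only the preliminary step, not the manual verification.

```python
from collections import defaultdict
from itertools import combinations

def author_key(first_name, family_name):
    """Preliminary identity: first initial plus family name, case-insensitive."""
    return f"{first_name.strip()[0].upper()} {family_name.strip().casefold()}"

def coauthor_edges(publications):
    """Return the set of unordered author pairs sharing at least one publication."""
    edges = set()
    for authors in publications.values():
        keys = sorted({author_key(first, family) for first, family in authors})
        edges.update(combinations(keys, 2))
    return edges

def publication_networks(publications):
    """Group publications that are linked through a chain of shared authors."""
    parent = {pub: pub for pub in publications}

    def find(pub):
        # Follow parent links to the representative, compressing the path.
        while parent[pub] != pub:
            parent[pub] = parent[parent[pub]]
            pub = parent[pub]
        return pub

    first_pub_of = {}
    for pub, authors in publications.items():
        for first, family in authors:
            key = author_key(first, family)
            if key in first_pub_of:
                parent[find(pub)] = find(first_pub_of[key])
            else:
                first_pub_of[key] = pub

    groups = defaultdict(list)
    for pub in publications:
        groups[find(pub)].append(pub)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical example data: publication id -> list of (first name, family name)
publications = {
    "pub1": [("Anna", "Berg"), ("Carl", "Dahl")],
    "pub2": [("A.", "Berg"), ("Eva", "Falk")],
    "pub3": [("Gustav", "Holm")],
}
edges = coauthor_edges(publications)
networks = publication_networks(publications)
fractions = [len(network) / len(publications) for network in networks]
```

Here `pub1` and `pub2` end up in the same network because "Anna Berg" and "A. Berg" share a key. This also shows why manual verification is needed: two different researchers with the same initial and family name would be merged by this step.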

Figure 1.

Author networks involved in pre-clinical studies of Treatment A (a and d), Treatment B (b and e), and Treatment C (c and f). In a–c, each node represents one author and co-authors are connected by lines. The percentage of all publications that a researcher is a co-author on determines the color. Note that one researcher is a co-author on more than 50% of the publications evaluating Treatment B (b and h) and has an NBC of 0.08 (i). In d–f, each node represents one publication, and publications with shared authors are connected by lines. Note that in both e and f, an independent group of authors explicitly raised concerns regarding the efficacy of the treatment. The data are quantified as the fraction of publications per network (g), fraction of publications per author (h), and the NBC for each author (i).

By letting one node in the network graph represent one author (Figure 1(a) to (c)), it is apparent that a substantial number of researchers, distributed among several independent networks, contributed to each field. The majority of researchers co-authored fewer than 10% of the publications in their field, with a notable exception in Treatment B, where one researcher co-authored more than half of the identified publications. The networks can also be visualized by letting one node represent one publication and then connecting publications with shared authors (Figure 1(d) to (f)). This visualization reveals that almost all networks evaluating Treatment A contributed a single publication, while several larger networks evaluated Treatment C. In the case of Treatment B, it is clear that a single network dominates the field (Figure 1(e)) and has contributed more than half of the publications. To further quantify the results, we calculated the normalized betweenness centrality (NBC) for each author. The NBC assesses the degree to which a node lies on the shortest paths between other nodes, and thus indicates whether a node (the author) is in a central position within the network.³ The NBC can range from 0 to 1, and the majority of authors had scores below 0.02 (Figure 1(i)). A clear exception is one author with a score of 0.08, indicating that this researcher is very influential within the field.
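To illustrate how such a score can be obtained, the sketch below computes betweenness centrality for an unweighted, undirected co-authorship graph using Brandes' algorithm and divides by the number of node pairs that exclude the node, (n − 1)(n − 2)/2. The normalization, the function name, and the toy graph are assumptions made for illustration; the exact settings used for Figure 1(i) are not stated in the text.

```python
from collections import deque

def normalized_betweenness(adjacency):
    """Normalized betweenness centrality for an unweighted, undirected graph.

    `adjacency` maps each node to an iterable of its neighbours.
    """
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in nodes}
        n_paths = {v: 0 for v in nodes}
        distance = {v: -1 for v in nodes}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        dependency = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Each unordered pair is counted twice (once from each endpoint), so
    # dividing by (n - 1)(n - 2) equals halving and then dividing by the
    # number of pairs, (n - 1)(n - 2) / 2.
    scale = (n - 1) * (n - 2) if n > 2 else 1
    return {v: score[v] / scale for v in nodes}

# Hypothetical toy network: author B links A and C, who never co-authored.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
nbc = normalized_betweenness(toy)
```

In the toy network, B lies on the only shortest path between A and C and receives the maximum score, while A and C score zero. In a real field with many disconnected networks, even a very central author will have a score far below 1, since most author pairs are not connected at all.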

Even though most of the research on Treatment B has been performed within a single network, its results are supported by several other independent author networks, indicating that the dominant network performed high-quality research. If a large fraction of the research is performed within a single network, this may, however, lead to homogeneity in the experimental settings. Conversely, the use of a large variety of experimental models, species, strains, subject ages and sexes, as well as dosing regimens, may potentially lead to results with a higher external validity and be a better approximation of the complex clinical setting.

In phase III clinical trials, the external validity of the results is ensured by including patients from multiple centers. This is typically not done in pre-clinical research, although exceptions exist, for example Operation Brain Trauma Therapy, which performs multi-center pre-clinical evaluations of experimental pharmacological treatments for TBI.⁴ In the absence of multi-center studies, the existence of publications from several independent author networks can be considered analogous.

Overall, we found ANA to be a promising tool when evaluating pre-clinical research before deciding whether to launch clinical trials or not. Further use of this tool is needed to determine its value, but it has at least three potential uses:

Detection of large networks which may indicate, for example, homogeneity in the choice of experimental methods.

Inclusion in systematic reviews to visualize the networks behind the included publications.

Meta-analyses may be performed with and without a dominant network to determine its influence on the final result.

Identification of authors with multiple prior publications so that they can be excluded from studies aimed at confirming whether a treatment works or not, since they can be perceived to have a conflict of interest.

When considering whether a treatment is extensively studied or not, the number of participating networks may be more important than the number of publications.
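The suggestion above of running a meta-analysis with and without a dominant network can be sketched as follows. This is a minimal illustration that assumes a fixed-effect, inverse-variance model; the study records, the field names (`effect`, `se`, `network`), and the numbers are hypothetical and do not come from any of the evaluated publications.

```python
from math import sqrt

def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / study["se"] ** 2 for study in studies]
    total = sum(weights)
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / total
    return estimate, sqrt(1.0 / total)

def with_and_without(studies, network):
    """Pooled estimate for all studies and for all studies outside `network`."""
    remaining = [s for s in studies if s["network"] != network]
    if not remaining:
        raise ValueError("no studies left after excluding the network")
    return pooled_effect(studies), pooled_effect(remaining)

# Hypothetical data: effect sizes with standard errors, labelled by network.
studies = [
    {"effect": 1.0, "se": 0.5, "network": "dominant"},
    {"effect": 1.0, "se": 0.5, "network": "dominant"},
    {"effect": 0.0, "se": 0.5, "network": "independent-1"},
    {"effect": 0.0, "se": 0.5, "network": "independent-2"},
]
(all_est, all_se), (rest_est, rest_se) = with_and_without(studies, "dominant")
```

A large shift between the two estimates would suggest that the overall conclusion rests mainly on one network. A random-effects model may be more appropriate when the studies differ in species, injury model, or dosing.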

Drug administration

Another potential difference between pre-clinical studies and phase III trials is the choice of administration route, where clinical trials often use i.v. injections. In the 30 randomly selected TBI publications, on the other hand, we found that 21 utilized i.p. injections and only five used i.v. injections. Substances injected i.p. are primarily absorbed into the hepatic portal vein,⁵ and are thus subjected to hepatic first-pass metabolism, but not gut-wall metabolism. This route of administration is therefore not directly comparable to either oral or intravenous administration, and its use potentially reduces the chance of successful clinical translation.

Subject characteristics

In order to achieve clinical translation, it is likely important to match the sex and age of the pre-clinical subjects to the patient population. Sex has been suggested to affect outcome following TBI, and age is well established as an important outcome predictor. Young males are over-represented among TBI patients, but older and female patients still make up a substantial portion, mainly due to fall accidents. While assessing whether this is reflected in the pre-clinical literature, we found that the actual age of the animals was only reported in 12 of 30 publications, despite this being one of the most important outcome predictors identified.⁶ By estimating the age using the reported weight and growth curves, we found that none of the studies utilized aged animals and only one included female animals (Figure 2(a)). The almost exclusive use of young male animals in pre-clinical studies thus represents a fairly clear discrepancy compared to phase III clinical trials.

Figure 2.

Characteristics of 30 randomly selected publications evaluating pharmacological treatments in experimental models of TBI. (a) The age of rodents included in pre-clinical studies and an assessment of the corresponding human age. (b) Reported group sizes for experiments performing some sort of tissue analysis or behavioral testing. Group sizes in behavioral tests, determined using a power calculation (PC), were significantly higher than those determined without one. (c) The number of evaluated outcome measures in individual publications. (d) The percentage of outcome measures found to be improved compared to the percentage of the abstract discussing improved outcome measures. The fraction of each abstract devoted to describing improved outcomes was significantly higher than the fraction of improved outcomes in the described study.

Study design and statistics

Other potential discrepancies between pre-clinical research and phase III clinical trials that have been previously described are the use of blinding, randomization, and power calculations to determine group sizes.⁷ We found a limited use of blinding (22 of 30) and randomization (20 of 30) as well as an almost complete absence of power calculations (3 of 30), consistent with previous reports.⁸ The utilized group sizes were also fairly small, often fewer than 10 animals (Figure 2(b)), which raises concern about the risks associated with underpowered studies.⁹ The group sizes used for behavioral testing were significantly larger in studies which used power calculations. Performing power calculations can thus influence experimental design, and their use in pre-clinical research is potentially an important step to reduce the discrepancy between pre-clinical and clinical studies.

To evaluate the impact of introducing power calculations, we estimated the group size required to assess tissue loss following controlled cortical impact (CCI) in rats. Based on studies published in 2016, we estimated that it is reasonable to expect a tissue loss of 10 mm³ with a standard deviation of 2 mm³. In order to use Student's t-test to detect a 20% reduction in tissue loss with α = 0.05 and β = 0.2, a power calculation reveals that a group size of 17 subjects is required. This is considerably more than what most of the evaluated studies used (Figure 2(b)), even though the conventional values for α and β were used.¹⁰ It has, however, been suggested that the final test of a hypothesis should use stricter criteria, for example α = 0.01.¹¹ Given the high cost of clinical trials and the current lack of effective treatments that specifically target secondary injury mechanisms following TBI, it is highly desirable to avoid both type I and type II errors in pre-clinical research. Furthermore, the need for rapid treatment initiation following TBI means that some clinical trials are performed with exemption from informed consent. For ethical reasons, these types of studies should only be performed with treatments which have a reasonably high chance of success. Achieving this with the stricter criteria of α = 0.01 and β = 0.05 results in required group sizes of 38 subjects, far more than what is currently used.
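The group sizes quoted above can be approximated with a short calculation. The sketch below is not the software used for the original calculation, which is not specified; it assumes a two-sided test with equal group sizes and uses the normal approximation with Guenther's small-sample correction instead of an exact noncentral t computation. The function name `group_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def group_size(mean, sd, relative_reduction, alpha, beta):
    """Approximate subjects per group for a two-sided, two-sample t-test.

    Normal approximation plus Guenther's correction term (z_alpha^2 / 4).
    """
    # Standardized effect size: absolute difference divided by the SD.
    effect_size = (mean * relative_reduction) / sd
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 + z_alpha ** 2 / 4
    return ceil(n)

# Tissue loss of 10 mm^3 with SD 2 mm^3, 20% reduction to be detected.
conventional = group_size(10, 2, 0.20, alpha=0.05, beta=0.20)  # 17 per group
strict = group_size(10, 2, 0.20, alpha=0.01, beta=0.05)        # 38 per group
```

With these inputs the standardized effect size is 1.0, and the approximation agrees with the group sizes of 17 and 38 given in the text. A one-sided test or a different expected standard deviation would change the result, so the inputs should be justified from pilot or published data.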

In addition to the limited use of power calculations, we were also unable to find a single study which reported a pre-determined strategy for the statistical analysis, which may lead to bias if there are several ways to perform the analysis.¹² In the statistical evaluation of pre-clinical research data, there are typically several choices to make, for example between parametric and non-parametric tests, whether to adjust for pre-injury results or not, as well as the selection of a post hoc test. In most cases, there is no consensus on best practice and the methodology differs between publications.

Outcome measures

Similar to the absence of a pre-determined statistical analysis strategy, we were unable to find any publications that declared a pre-determined primary outcome measure. To assess to what extent this is a potential problem, we determined the number of outcome measures utilized in each publication and found that a substantial number of publications used more than 10 outcome measures (Figure 2(c)). The use of a pre-determined primary outcome measure is mandatory for phase III clinical trials, since a post hoc choice of primary outcome may be biased towards improved outcomes. To determine if this type of bias was present in the evaluated pre-clinical studies, we determined if the phrasing of the abstract reflected the fraction of the evaluated outcomes which were improved. This was done by classifying the words in the abstract as either neutral (describing previous results, methods, and effects of the trauma itself), negative (describing outcomes that were unaffected or impaired), or positive (describing improved outcomes). We then compared the amount of positive and negative phrasing to determine the percentage of positive words and compared this to the percentage of outcomes that were improved (Figure 2(d)). The result clearly shows that there is a bias towards highlighting the outcome measures that were improved. The lack of a pre-determined primary outcome measure thus not only creates a risk of bias; at least in the way the findings are summarized in abstracts, there is an actual selection bias towards improved outcomes.
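The comparison described above reduces to two percentages per publication. The sketch below shows one way to compute them, assuming that neutral words are excluded from the denominator, which is our reading of the description. The word and outcome counts are hypothetical, and the classification of words is a manual step that the code does not attempt.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole

def abstract_bias(n_positive_words, n_negative_words, n_improved, n_outcomes):
    """Difference between positive phrasing and improved outcomes, in percentage points."""
    positive_phrasing = percentage(n_positive_words, n_positive_words + n_negative_words)
    improved_outcomes = percentage(n_improved, n_outcomes)
    return positive_phrasing, improved_outcomes, positive_phrasing - improved_outcomes

# Hypothetical publication: 90 positive and 10 negative words in the abstract,
# 4 of 10 evaluated outcome measures improved.
phrasing, improved, difference = abstract_bias(90, 10, 4, 10)
```

A positive difference means that the abstract devotes more space to improved outcomes than their share of all evaluated outcomes would suggest.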

Conclusions

To conclude, there are several aspects in which the experimental design used in preclinical research differs from phase III clinical trials. Several of these may increase the risk of bias, and in terms of how the results are summarized in publication abstracts, there was actual bias towards highlighting positive results.

Guidelines such as ARRIVE and STAIR strive to mitigate these problems¹³,¹⁴ by introducing a single set of requirements for all pre-clinical studies. This is potentially problematic since the requirements needed to mimic a phase III clinical trial are very high. In particular, the need to pre-determine certain aspects of the experimental design may stifle investigations which are more exploratory in nature.

A suggested solution is to distinguish between exploratory and confirmatory pre-clinical studies.¹⁵ The proposed strategy is to use exploratory studies to establish molecular mechanism, pharmacokinetics, dose-response curves, therapeutic time-window, optimal dosing regimens and suitable outcome measures. Once a promising drug candidate and dosing regimen have been identified, the work can continue with confirmatory studies. There are several characteristics of confirmatory studies which we consider crucial:

Pre-registration of primary and secondary outcome measures.

The primary outcome measure is based on functional/behavioral evaluation, rather than an assessment of tissue characteristics.

Pre-registration of the method for statistical evaluation.

Group sizes determined using power calculations.

Pre-registration of inclusion and exclusion criteria, as well as the reporting of any excluded subjects.

Performed by researchers without prior publications evaluating the treatment, to avoid a conflict of interest.

The administration route is directly translatable to the clinic.

The age and sex of the subjects correspond to the clinical situation.

In particular, the pre-registration of trial design is crucial not only to avoid bias within a study, but also to estimate the extent of publication bias of entire studies or exclusion of non-improved outcome measures. The introduction of pre-registration for clinical trials resulted in a drop in the fraction of studies which reported positive results,¹⁶ indicating the importance of this measure. Pre-registration can be done, for example, by using www.preclinicaltrials.eu.¹⁷ While pre-registration requires very limited resources, the use of power calculations is, however, likely to cause a substantial increase in the group sizes. This may be difficult to handle for an individual research group, but there are some potential approaches which may mitigate this problem:

Multi-center studies are performed to spread the efforts across several research groups. This would, however, require a solution to the problem of inconsistent results between laboratories, especially when performing behavioral testing.¹⁸

Some research groups specialize in performing confirmatory studies.

Resources are directed towards the development of automated methods, which may decrease the cost per subject and improve consistency between laboratories.

Resources are directed towards the development of methods with decreased variability to enable smaller group sizes.

Adaptive designs are introduced to allow studies to be terminated early if the treatment has no chance to reach significance (futility) or is obviously beneficial.

The introduction of this type of confirmatory study could thus potentially help bridge the gap in experimental design that currently exists between pre-clinical research and phase III clinical trials.

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Mattson Foundation via the Swedish Brain Foundation (PS2015-0060), the Swedish Armed Forces (R&D), and the Stockholm County Council (ALF 561589).

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iD

Anders Hånell

References

1. Maas, Marmarou, Murray, et al. Prognosis and clinical trial design in traumatic brain injury: the IMPACT study. J Neurotrauma 2007; 24: 232–238.

2. Bullock, Merchant, Choi, et al. Outcome measures for clinical trials in neurotrauma. Neurosurg Focus 2002; 13: ECP1.

3. Opsahl, Agneessens, Skvoretz. Node centrality in weighted networks: generalizing degree and shortest paths. Social Networks 2010; 32: 245–251.

4. Kochanek, Bramlett, Shear, et al. Synthesis of findings, current investigations, and future directions: operation brain trauma therapy. J Neurotrauma 2016; 33: 606–614.

5. Turner, Brabb, Pekow, et al. Administration of substances to laboratory animals: routes of administration and factors to consider. J Am Assoc Lab Anim Sci 2011; 50: 600–613.

6. Hukkelhoven, Steyerberg, Rampen, et al. Patient age and outcome following severe traumatic brain injury: an analysis of 5600 patients. J Neurosurg 2003; 99: 666–673.

7. Bragge, Synnot, Maas, et al. A State-of-the-science overview of randomized controlled trials evaluating acute management of moderate-to-severe traumatic brain injury. J Neurotrauma 2016; 33: 1461–1478.

8. Macleod, Lawson McLean, Kyriakopoulou, et al. Risk of bias in reports of in vivo research: a focus for improvement. PLoS Biol 2015; 13: e1002273.

9. Button, Ioannidis, Mokrysz, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 2013; 14: 365–376.

10. Schulz, Grimes. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005; 365: 1348–1353.

11. Mogil, Macleod. No publication without confirmation. Nature 2017; 542: 409–411.

12. Head, Holman, Lanfear, et al. The extent and consequences of p-hacking in science. PLoS Biol 2015; 13: e1002106.

13. Kilkenny, Browne, Cuthill, et al. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 2010; 8: e1000412.

14. Stroke Therapy Academic Industry. Recommendations for standards regarding preclinical neuroprotective and restorative drug development. Stroke 1999; 30: 2752–2758.

15. Kimmelman, Mogil, Dirnagl. Distinguishing between exploratory and confirmatory preclinical research will improve translation. PLoS Biol 2014; 12: e1001863.

16. Kaplan, Irvin. Likelihood of null effects of large NHLBI clinical trials has increased over time. PLoS One 2015; 10: e0132382.

17. Jansen of Lorkeers, Doevendans, Chamuleau. All preclinical trials should be registered in advance in an online registry. Eur J Clin Invest 2014; 44: 891–892.

18. Crabbe, Wahlsten, Dudek. Genetics of mouse behavior: interactions with laboratory environment. Science 1999; 284: 1670–1672.