Randomized controlled trials (RCTs) have been widely recognized as the gold standard for inferring treatment effects in clinical research. Recently, there has been growing interest in enhancing and complementing the results of an RCT by integrating real-world evidence from observational studies. The unit information prior (UIP) is a newly proposed technique that can effectively borrow information from multiple historical datasets. We extend this generic approach to synthesize non-randomized evidence into a current RCT. The UIP requires only the summary statistics published from observational studies, which eases implementation; it also has clear interpretations and can alleviate potential bias in the real-world evidence via weighting schemes. Extensive numerical experiments show that the UIP can improve the statistical efficiency in estimating the treatment effect for various types of outcome variables. The practical potential of our UIP approach is further illustrated with a real trial of hydroxychloroquine for treating COVID-19 patients.
Randomized controlled trials (RCTs) are widely recognized as the gold standard for evaluating the efficacy of a treatment in clinical studies. However, the rigorous experimental settings and inclusion criteria restrict the generalizability of trial results to routine clinical practice.1 RCTs can also be costly and slow to deliver findings when a large number of subjects must be followed for a long period of time. To complement and enhance the evidence in an RCT, the synthesis of information from ubiquitous real-world evidence or observational studies into therapeutic development has recently received growing attention in the medical literature.2–4 Motivated by the above concerns, a variety of statistical methods have been proposed to integrate non-randomized and randomized evidence, most of which are based on the Bayesian framework due to its prominent flexibility in combining multi-source information. When only aggregate data are available for both RCTs and observational studies, the real-world evidence is typically quantified using meta-analysis techniques,5–8 such as through the prior distribution of the treatment effect parameter in a network meta-analysis model, or through a multi-level Bayesian hierarchical model. The aforementioned methods are typically designed for aggregating the published summary statistics from all studies, including the treatment effect estimates with the corresponding standard errors, and thus they cannot be directly applied to the scenario where individual patient data are available for a current RCT.
For cases where investigators have access to patient-level data from both randomized and non-randomized studies, many approaches have been proposed for evidence synthesis following the idea of propensity scores.9 Lin et al.10,11 leveraged the observational samples to augment a disproportionate RCT into a 1:1 randomized trial, and then discounted the borrowed samples via propensity score-based priors. Wang et al.12 proposed a Bayesian nonparametric approach to jointly model the outcomes and the corresponding propensity scores from different types of studies. Liu et al.13 introduced a stratum-specific meta-analytic predictive prior to incorporate the observational evidence based on propensity scores. A three-stage framework was proposed to validate observational data with the RCT as an anchor and to integrate the statistical evidence from both sources.14 Although individual patient data allow flexible and effective synthesis of information, in practice they are difficult to obtain for all observational studies, as typically only summary statistics are available in the published literature.
We hence focus on the situation where the investigator needs to infer the treatment effect from the patient-level data of a current RCT, while additional summary statistics can be obtained from publications of related observational studies to enhance the RCT results. We assume that the causal estimands, such as the hazard ratio or odds ratio, are the same between the RCT and the observational studies. This scenario is particularly common amid the current COVID-19 pandemic, because the urgency of health care has prompted numerous observational analyses before the report of a well-conducted RCT is released.15,16 Some attempts have been made in the literature to handle the synthesis of mixed data that consist of individual patient data and aggregate information, such as the simulated treatment comparison,17 matching-adjusted indirect comparison,18 Bayesian aggregation19 and Bayesian meta-analysis with individual data.20,21 However, those approaches are either only applicable to some specific types of outcomes,20,21 not pertinent to our scenario,17–19 or unable to account for the potential bias of observational evidence caused by heterogeneity and confounding effects.20 Recently, Jin and Yin22 proposed the unit information prior (UIP) to borrow evidence from historical studies. This approach has several salient advantages for addressing information synthesis in our scenario. First, the UIP only requires summary statistics of the historical trials, which are commonly available in the published literature. Second, the UIP can account for the heterogeneity between the current and historical datasets through an appropriate weighting scheme, which helps to discount biased information.5,6 Furthermore, it automatically determines the amount of information for integration from different historical datasets. Finally, this method is generic and can be readily modified to accommodate various types of outcomes.
However, Jin and Yin22 focused on the development of UIP for binary and continuous data from historical RCTs rather than observational studies. It is thus of great interest to explore the application of this method to incorporate the historical observational evidence into a current RCT with the availability of individual patient data involving various covariates and different types of outcomes. Particularly, we construct the UIP for the causal parameter of treatment effect using the point estimators from the observational studies with their standard errors, which are invariant to the types of outcome variables. We further evaluate the performance of UIP on continuous, binary, and time-to-event outcomes under different configurations.
The remainder of this article is organized as follows. Section 2 introduces how the real-world evidence can be incorporated using the UIP in the estimation of the parameter of interest in an RCT. We present the settings and results of comprehensive numerical experiments in Section 3, where we find that UIP can effectively improve the statistical efficiency of estimation. In Section 4, we re-analyze a randomized trial of hydroxychloroquine for treating COVID-19 by synthesizing the evidence from several observational studies. We conclude with a brief discussion in Section 5.
Methodology
Problem formulation
Let X = (x_1, …, x_n)^T denote the covariate matrix for the n enrolled patients in a current RCT. We assume that the RCT involves only one treatment and one control arm, and accordingly let A = (A_1, …, A_n) be the binary allocation indicators with A_i ∈ {0, 1} for i = 1, …, n. The ith patient is assigned to the treatment arm if A_i = 1 and to the control arm otherwise. The allocation variable A_i is generated from randomization. The outcome variables Y = (Y_1, …, Y_n) are then collected based on the treatment assignment. Our methodology is general enough to consider different types of outcomes, including continuous, binary, and survival data. For the survival outcome, Y_i refers to the observed follow-up time under an independent right-censoring assumption. We additionally observe the event indicators δ = (δ_1, …, δ_n) with δ_i ∈ {0, 1} for the survival outcome, where δ_i = 1 if the event occurs and δ_i = 0 otherwise, indicating that Y_i is a censored observation. We thus have the individual patient data D = {X, A, Y, δ} for the current RCT, where δ is fixed as a vector of 1s (i.e. no censoring) under continuous and binary outcomes for convenience.
Given the observed RCT data D = {X, A, Y, δ}, we consider a general modeling structure that can cover different types of outcomes,

Y_i ~ f(θA_i + μ + x_i^T β),    (1)

which denotes the probabilistic model for treatment inference with θ, μ, and β as parameters for the linear predictor. The function f is specified according to the type of outcome Y_i. For the continuous outcome, we adopt a linear model,

Y_i = θA_i + μ + x_i^T β + ε_i,

where ε_i ~ N(0, σ²) and σ² is the unknown variance. For the binary outcome, logistic regression is considered with

logit{Pr(Y_i = 1)} = θA_i + μ + x_i^T β.

As for the survival outcome, we use the Cox proportional hazards model with the hazard function defined as

h(t | A_i, x_i) = h_0(t) exp(θA_i + x_i^T β),

where h_0(t) is the unknown baseline hazard function, into which the intercept is absorbed. The parameter θ or its transformation is related to the treatment effect of interest: θ represents the additive treatment effect for the continuous outcome, while exp(θ) is the odds ratio (OR) for the binary outcome and the hazard ratio (HR) for the survival outcome. The parameter μ corresponds to the intercept under the linear and logistic models, and β represents the coefficient vector for covariate effects. Our major goal is to make a comprehensive estimation of θ by synthesizing D and the aggregate evidence from observational studies. Before the formal integration analysis is carried out, we should collect all the necessary aggregate data from relevant observational studies.
Aggregate data from observational studies
Suppose that we have K historical observational studies that are closely related to the current RCT. There are several fundamental criteria for selecting appropriate observational studies to be included in our synthesis analysis. First, all observational studies should conceptually consider the same clinical objective and intervention as the current RCT, and numerically adopt the same parameter to measure the treatment effect. For example, both the observational studies and the current RCT use the HR for survival data to measure the treatment effect. Second, the study design of the observational data should be similar to that of the RCT to control the heterogeneity among studies; for example, the inclusion criteria should be similar so that participants have a comparable range of ages and symptoms. Finally, we require that all covariates be balanced for the estimation of the treatment effect in these observational studies to reduce the bias caused by confounding effects. Typically, this refers to analysis based on propensity scores, such as through matching23 or weighting24 schemes, which have been widely adopted for analyzing observational data in medical research.25,26 All information for checking these criteria can be readily found in the abstracts of medical publications.
Let θ_k denote the parameter of treatment effect in the kth observational study for k = 1, …, K, which is the counterpart of θ in the current RCT. We only require three summary statistics for θ_k from each observational study. First, we should obtain the maximum likelihood estimator (MLE) θ̂_k for θ_k and the corresponding standard error s_k. For binary and survival outcomes, θ̂_k is obtained by taking the logarithm of the estimated OR and HR, respectively. The standard error can be derived from the confidence interval (CI) if it is not directly provided in the published literature. Assume that the 95% CI is (L_k, U_k) for the kth study; we can then calculate the standard error by s_k = {g(U_k) − g(L_k)}/(2 × 1.96), where g(·) is a transformation function based on the type of outcome. For example, we have g(x) = log(x) if the CI corresponds to the OR or HR. Sometimes, the published medical papers may provide multiple estimators for θ_k based on different covariate-balancing approaches,27 and we only adopt the results from the primary analysis for integration. Second, we need the corresponding sample size n_k used in the primary analysis, which should be distinguished from the original sample size N_k in the observational study. For instance, we should specify n_k as the sample size of the matched data if the primary analysis uses a matching method to balance covariates. We use D_k = {θ̂_k, s_k, n_k} to denote the summary information in the primary analysis of the kth observational study. After collecting D_1, …, D_K, we can construct the UIP and then conduct Bayesian inference for θ in the current RCT.
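As an illustration, the conversion from a published point estimate with its 95% CI to the summary pair of a log-scale estimate and standard error can be sketched as follows; the numbers in the example are hypothetical, not taken from any cited study.

```python
import math

def summary_from_ci(est, lo, hi, log_scale=True):
    """Recover (theta_hat_k, s_k) from a published point estimate and 95% CI.

    For OR/HR outcomes the transformation g is the natural log; for an
    additive treatment difference g is the identity.
    """
    g = math.log if log_scale else (lambda x: x)
    theta_hat = g(est)
    # a 95% CI spans 2 * 1.96 standard errors on the transformed scale
    s = (g(hi) - g(lo)) / (2 * 1.96)
    return theta_hat, s

# e.g. a hypothetical study reporting HR = 0.80 with 95% CI (0.60, 1.07)
theta_hat, s = summary_from_ci(0.80, 0.60, 1.07)
```

The same helper covers all three outcome types by switching the transformation, which is what makes the summary-statistics route outcome-invariant.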
Unit information prior
Let D_k* denote the underlying individual patient data corresponding to D_k, and let L(θ_k; D_k*) refer to the likelihood function for θ_k. Following Jin and Yin,22 we first introduce the unit information for θ_k,

i_k(θ_k) = −(1/n_k) ∂² log L(θ_k; D_k*)/∂θ_k²,

which represents the observed Fisher information at a unit sample level. The UIP for θ in the current RCT is then defined as

θ | w, M ~ N( Σ_{k=1}^K w_k θ̂_k, [M Σ_{k=1}^K w_k i_k(θ̂_k)]^{−1} ),

where i_k(θ̂_k) refers to the unit information evaluated at θ̂_k, and the weights w = (w_1, …, w_K) follow a Dirichlet prior with pre-specified hyper-parameters γ_k = min{n_k/n, 1}, so that a study with a larger sample size would be assigned a higher prior weight. We tend to assign a smaller weight in the Dirichlet prior to a dataset whose sample size is smaller than that of the RCT; otherwise, the prior weight is bounded by 1. Intuitively, the UIP first incorporates the unit-level information from each observational study through the weight parameters w_k, and then figures out the total amount of information to be synthesized through the parameter M. Closely related to the effective sample size in the prior,28 M is bounded by the sample size n of the RCT as well as the total sample size Σ_{k=1}^K n_k of the matched data in the observational studies, because the UIP should not dominate the current RCT. Both M and the w_k's are adaptively determined by the commensurability between the RCT and the observational studies.
The only remaining task is to specify i_k(θ̂_k) for the kth observational study. Jin and Yin22 suggested performing a case-by-case derivation to obtain the explicit form of i_k(·) based on the type of outcome and the corresponding model. By contrast, we provide a more general alternative to determine the unit information. Following the asymptotic normality of the MLE, we have

√n_k (θ̂_k − θ_k) →d N(0, i_k(θ_k)^{−1}),

where i_k(θ_k) is the expected unit Fisher information under the distribution that generates the independent individual samples in D_k*. The above property shows that the asymptotic variance of θ̂_k is i_k(θ_k)^{−1}/n_k, which motivates us to define the unit information using the empirical standard error of θ̂_k,

i_k(θ̂_k) = 1/(n_k s_k²).
Finally, we adopt the non-informative prior (NIP) for the other parameters specified in model (1) and then perform Bayesian inference.
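The unit information and the resulting normal form of the prior conditional on the weights and the borrowing size can be sketched as below. This is only a schematic of the conditional form following Jin and Yin's construction (the priors on w and M themselves are not shown), and all numeric inputs are hypothetical.

```python
import numpy as np

def unit_information(n, s):
    """Unit information i_k(theta_hat_k) = 1 / (n_k * s_k^2):
    per-unit Fisher information recovered from the reported
    standard error s_k and analysis sample size n_k."""
    n, s = np.asarray(n, float), np.asarray(s, float)
    return 1.0 / (n * s**2)

def uip_normal(theta_hat, n, s, w, M):
    """Mean and variance of the UIP for theta, given weights w and size M.

    Conditional on (w, M), the prior is normal with mean
    sum_k w_k * theta_hat_k and variance
    {M * sum_k w_k * i_k(theta_hat_k)}^{-1}; in the full model w and M
    carry their own priors and are updated by the data.
    """
    theta_hat, w = np.asarray(theta_hat, float), np.asarray(w, float)
    i = unit_information(n, s)
    mean = float(np.sum(w * theta_hat))
    var = 1.0 / (M * float(np.sum(w * i)))
    return mean, var

# two hypothetical identical studies, equal weights, M = 100 borrowed units
mean, var = uip_normal([0.5, 0.5], [100, 100], [0.2, 0.2], [0.5, 0.5], 100.0)
```

Note how a larger M tightens the prior: the borrowed precision scales linearly in the number of units synthesized.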
There are several advantages to incorporating real-world evidence using the UIP. First, we only need the summary statistics D_k to construct the prior, without access to the individual data D_k*. Second, the specification of the UIP is independent of the outcome model, which greatly facilitates its broad applicability. Most importantly, the UIP accounts for the potential bias in each observational study through a weighting scheme. Different from RCTs, the estimator θ̂_k obtained from an observational study cannot avoid the bias that originates from the non-randomized treatment allocation, although the bias can be reduced by some post-hoc balancing techniques. Furthermore, bias may also originate from the heterogeneity between the observational studies and the current RCT. Therefore, properly discounting the observational evidence plays an essential role in the synthesis to adjust for the possible over-estimation or under-estimation of the treatment effect.5–7 In the UIP, study k effectively contributes with the standard error s_k/√w_k, where the factor 1/√w_k with w_k < 1 inflates the standard error to alleviate the underlying unamendable bias. A larger weight w_k leads to a smaller effective standard error and thus more information borrowing through the UIP.
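The discounting effect of a weight on a study's reported standard error can be sketched in one helper; this is an illustration of the inflation factor described above, with hypothetical numbers.

```python
import math

def effective_se(s_k, w_k):
    """Effective standard error of study k after discounting: s_k / sqrt(w_k).

    A weight w_k < 1 inflates the reported standard error s_k, shrinking
    the study's precision contribution to the UIP; w_k = 1 leaves it intact.
    """
    return s_k / math.sqrt(w_k)

full = effective_se(0.20, 1.00)     # undiscounted
quarter = effective_se(0.20, 0.25)  # weight 0.25 doubles the standard error
```

The quadratic relationship (halving the precision weight multiplies the variance by 4) is what makes small weights an effective guard against biased studies.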
Simulation
In the simulation, we evaluate the performance of the aforementioned UIP in estimating the treatment effect for the current RCT with respect to different types of outcomes. Individual patient data are first simulated for both the RCT and observational studies, and the propensity score analysis is performed on the observational data to generate the summary statistics for further synthesis.
Continuous outcome
For continuous outcomes, we fix the sample size of the current RCT at n, and generate the covariates x_i from a multivariate normal distribution with correlation coefficient ρ and covariate dimension p, and the allocation indicator A_i by randomization, for the ith sample. The first two covariates (x_{i1}, x_{i2}) are then dichotomized to 1 if they are non-negative and to 0 otherwise. Given the covariates X and the assignment indicators A, we generate the outcome variables of the RCT through a linear model,

Y_i = θA_i + μ + x_i^T β + ε_i,

where ε_i ~ N(0, σ²), and θ, μ, and β are the treatment effect, intercept, and covariate coefficients, respectively.
For the evidence from observational studies, we consider K = 3 historical observational studies and fix the sample size N_k for the kth study, k = 1, 2, 3. It is worth emphasizing that N_k is the original sample size of the kth observational study, whereas n_k in the UIP is the sample size after matching on the propensity scores to balance the covariates, so n_k ≤ N_k. Similarly, we first generate the covariates from a normal distribution,

x_i ~ N(μ_x 1_p, Σ_x),

where μ_x and σ_x² are the covariate mean and variance, respectively, and ρ_k denotes the correlation coefficient. The first two covariates (x_{i1}, x_{i2}) are then dichotomized to 1 if their values are larger than μ_x and to 0 otherwise. For the observational studies, the treatment indicator variables depend on covariates or confounders, and they are generated from the following model:

A_i ~ Bernoulli{e_k(x_i)},  logit{e_k(x_i)} = x_i^T η_k,

where e_k(x_i) = Pr(A_i = 1 | x_i) is the propensity score with the coefficient vector η_k. After collecting the covariate matrix and the allocation indicators, we generate the continuous outcome for the kth observational study via

Y_i = θ_k A_i + x_i^T β_k + ε_i,

where ε_i ~ N(0, σ_k²), and θ_k and β_k are the parameters of the treatment effect and covariate coefficients, respectively.
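The confounded observational data-generating step above can be sketched as follows. All numeric values in the example call are illustrative placeholders, not the paper's settings; the point is that the covariates drive both the treatment assignment (through the propensity model) and the outcome, so the arms are not exchangeable.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_obs_study(N, theta, beta, gamma):
    """Sketch of one observational dataset with confounded allocation.

    Covariates feed both the logistic propensity score model (coefficients
    gamma) and the linear outcome model (coefficients beta), so a naive
    comparison of the two arms is biased for the treatment effect theta.
    """
    p = len(beta)
    X = rng.normal(0.0, 1.0, size=(N, p))
    # treatment depends on covariates: non-randomized allocation
    e = 1.0 / (1.0 + np.exp(-(X @ gamma)))
    A = rng.binomial(1, e)
    # continuous outcome from a linear model with treatment effect theta
    Y = theta * A + X @ beta + rng.normal(0.0, 1.0, size=N)
    return X, A, Y, e

X, A, Y, e = simulate_obs_study(N=500, theta=1.0,
                                beta=np.array([0.5, -0.5]),
                                gamma=np.array([0.8, 0.8]))
```

Propensity score matching or weighting applied to such data is what produces the adjusted summary statistics used by the UIP.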
Binary outcome
Regarding binary outcomes, the covariates and treatment indicators of the RCT and the three observational studies are generated following the same procedure as that for continuous outcomes. Under a logistic model,

logit{Pr(Y_i = 1)} = θA_i + μ + x_i^T β,

with the coefficients θ, μ, and β, the outcome of the RCT is then simulated as Y_i ~ Bernoulli(p_i), where p_i = Pr(Y_i = 1). For the three observational studies, we simulate Y_i for the kth study, k = 1, 2, 3, as follows:

Y_i ~ Bernoulli(p_i),  logit(p_i) = θ_k A_i + x_i^T β_k,

where the treatment indicator A_i is confounded with the covariates due to non-randomization.
Survival outcome
For survival data, we generate the covariates and treatment indicators of the RCT and the historical data in the same manner. Following the Cox proportional hazards model, we assume a Weibull baseline hazard function h_0(t) = κ t^{κ−1}, where the values κ = 0.5, 1, and 2 of the shape parameter correspond to decreasing, constant, and increasing hazards, respectively. We generate the underlying event time T_i for the RCT by the inverse transform method,

T_i = {−log U_i / exp(θA_i + x_i^T β)}^{1/κ},  U_i ~ Uniform(0, 1),

where the coefficient vector β is kept the same as before. The censoring time C_i is generated with a constant parameter c, which is determined to achieve a censoring rate of 10%. We set the observed time Y_i = min(T_i, C_i) and the censoring indicator δ_i = I(T_i ≤ C_i), where I(·) denotes the indicator function.
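The inverse-transform step for a Cox model with Weibull baseline hazard can be sketched as follows. The scale parameter `lam` is an illustrative assumption (a unit scale), not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def weibull_cox_times(eta, kappa, lam=1.0):
    """Event times under a Cox model with Weibull baseline hazard.

    With baseline cumulative hazard H0(t) = lam * t**kappa (kappa < 1,
    = 1, > 1 give decreasing, constant, increasing hazards), inverse
    transform sampling gives T = (-log U / (lam * exp(eta)))**(1/kappa)
    for U ~ Uniform(0, 1), where eta = theta*A + x'beta is the linear
    predictor.
    """
    eta = np.asarray(eta, float)
    u = rng.uniform(size=eta.shape)
    return (-np.log(u) / (lam * np.exp(eta))) ** (1.0 / kappa)

# with kappa = 1, lam = 1 and zero linear predictor this is Exponential(1)
t_const = weibull_cox_times(np.zeros(1000), kappa=1.0)
```

Censoring would then be applied on top of these latent times, with the observed time being the minimum of the event and censoring times.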
The survival outcome in the kth observational study is similarly simulated from the proportional hazards model,

h_k(t | A_i, x_i) = κ_k t^{κ_k−1} exp(θ_k A_i + x_i^T β_k),

where the parameter κ_k determines the shape of the baseline hazard function. The censoring time C_i is generated in the same way with a constant parameter c_k, which is chosen to control the censoring rate. The observed time is Y_i = min(T_i, C_i) and the censoring indicator is δ_i = I(T_i ≤ C_i). For simplicity, we set κ_k = κ and specify c_k to ensure a censoring rate of 10% for k = 1, 2, 3.
Parameter settings for observational data generation
We consider five different scenarios for the parameters θ_k, μ_x, σ_x, and ρ_k in the observational studies. The first three scenarios are as follows:

Scenario 1: The observational studies share the same treatment effect and covariate distribution as the RCT; that is, θ_k = θ for all k, with the covariate parameters matching their RCT counterparts, for continuous outcomes as well as for both the binary and survival outcomes.

Scenario 2: The treatment effects θ_k in the observational studies deviate slightly from θ in the RCT, while the covariate distributions remain the same as in Scenario 1, for continuous outcomes as well as for both the binary and survival outcomes.

Scenario 3: The treatment effects θ_k are similar to θ, but the covariate means and variances (μ_x, σ_x) differ slightly from those of the RCT population.
Scenario 1 considers the ideal situation when the observational data share the same treatment effect and patient population as the RCT data except that the participants are not randomized during the allocation in the observational studies. Scenario 2 makes the treatment effects in the observational studies slightly different from that in the RCT. In Scenario 3, we simulate more realistic cases where the observational data are collected from related studies with similar treatment effects but slightly different patient populations in terms of covariate means and variances. Furthermore, we consider two scenarios for continuous and binary outcomes to demonstrate how our method performs in some extreme situations.
Scenario 4: The treatment effects θ_k differ from one another and deviate from θ, for both continuous and binary outcomes. Values of the other parameters are kept the same as those in Scenario 3.

Scenario 5: The treatment effects θ_k are substantially different from θ, for both continuous and binary outcomes. Values of the other parameters are kept the same as those in Scenario 3.
Scenario 4 considers the case where the observational data are not strictly selected, because the treatments adopted in the observational studies are not identical and the θ_k's differ from each other as well as deviating from the RCT treatment effect θ. Scenario 5 mimics the situation where the drugs investigated in the observational studies may not even be related to the one in the RCT, because the treatment effects are substantially off the target. Finally, we fix the coefficient vectors η_k of the propensity scores separately for each observational study.
Figure 1 shows the distributions of propensity scores for three observational datasets under different generation scenarios. A wider distributional gap or separation between the treatment and control groups indicates a stronger confounding effect, which means the covariates in the corresponding dataset are more difficult to balance.
Figure 1. Distributions of propensity scores under different scenarios for each observational dataset.
Aggregate data generation
Through the above data generation procedure, we obtain the patient-level datasets D_k* for k = 1, 2, 3. We then obtain the aggregate data D_k summarized from D_k* by (i) fitting a logistic model to estimate the propensity scores; (ii) using nearest neighbor matching (Match)23 or inverse probability weighting (IPW)24 to balance the observational data; and (iii) conducting inference on the balanced dataset. In the second step, we implement Match and IPW using the MatchIt29 and WeightIt30 packages, respectively. For the kth aggregate dataset D_k, the adjusted sample size n_k is the number of samples after propensity score matching, whereas we take n_k = N_k for IPW. Furthermore, the MLEs θ̂_k for the different types of outcomes are obtained using the functions lm(), glm() and coxph() in the R language.31 Finally, we adopt the robust sandwich standard error to calculate s_k following the instructions in MatchIt and WeightIt.
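The weighting step in (ii) can be illustrated with the standard inverse probability weight formulas; this is a generic sketch of IPW weights from fitted propensity scores (the paper's exact estimand and package settings may differ).

```python
import numpy as np

def ipw_weights(A, e, estimand="ATT"):
    """Inverse probability weights from estimated propensity scores e.

    ATT weights: treated units get weight 1, controls get e / (1 - e),
    so the weighted control arm mimics the covariate distribution of
    the treated arm.  ATE weights: 1/e for treated, 1/(1 - e) for controls.
    """
    A = np.asarray(A, float)
    e = np.asarray(e, float)
    if estimand == "ATT":
        return A + (1 - A) * e / (1 - e)
    return A / e + (1 - A) / (1 - e)

# three hypothetical subjects: one treated, two controls
w_att = ipw_weights([1, 0, 0], [0.5, 0.25, 0.75])
w_ate = ipw_weights([1, 0], [0.5, 0.5], estimand="ATE")
```

A weighted regression of the outcome on treatment with these weights (plus a robust sandwich standard error, as in WeightIt's guidance) then yields the adjusted estimate and its standard error.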
Bayesian inference
After collecting the RCT data and the aggregate information from the observational data, we conduct Bayesian inference using the models in Section 2.1. Bayesian linear regression and logistic regression models are fitted to the continuous and binary outcomes, respectively. As for the survival outcome, we use the piecewise exponential proportional hazards model, where the baseline hazard function is modeled as a piecewise constant function with partition points 0 = a_0 < a_1 < ⋯ < a_J of the time axis. We set the number of intervals to be J, and the partition points are chosen based on the corresponding quantiles (j/J, j = 1, …, J − 1) of the observed event times. Within each interval (a_{j−1}, a_j], we assume a constant hazard λ_j, which is assigned an NIP. Under the linear model, the priors for θ, μ, β, and σ² are set to be non-informative, and we adopt the same priors for the corresponding parameters in the logistic and Cox models. For the treatment effect parameter θ, we compare the UIP with the NIP.
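The construction of the piecewise exponential baseline can be sketched as below: cut points from quantiles of the observed event times, and a piecewise-constant cumulative hazard. The number of intervals `J` is left as an argument, since its value is study-specific.

```python
import numpy as np

def pwe_partition(event_times, J):
    """Partition points for a piecewise exponential baseline hazard.

    Cut points a_1 < ... < a_{J-1} are the j/J quantiles of the observed
    event times, giving J intervals, each with its own constant hazard.
    """
    qs = np.arange(1, J) / J
    return np.quantile(np.asarray(event_times, float), qs)

def pwe_cum_hazard(t, cuts, lams):
    """Cumulative baseline hazard H0(t) under piecewise-constant hazards lams."""
    edges = np.concatenate(([0.0], np.asarray(cuts, float), [np.inf]))
    H = 0.0
    for j, lam in enumerate(lams):
        lo, hi = edges[j], edges[j + 1]
        # exposure of (0, t] that falls inside interval j
        H += lam * max(0.0, min(t, hi) - lo)
    return H
```

In the Bayesian model each interval hazard would receive its own non-informative prior and the likelihood contribution follows from this cumulative hazard.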
All Bayesian models are implemented in PyMC3,32 and 5,000 posterior samples are drawn for inference using Markov chain Monte Carlo with 1,000 burn-in iterations. We simulate 200 repetitions for each configuration of data generation. Based on the repetitions, we evaluate the UIP and NIP in terms of the empirical bias, root mean squared error (RMSE), and the width and coverage of the credible interval (CrI) for the parameter θ. The CrI is constructed based on the 2.5% and 97.5% quantiles of the posterior samples.
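The per-repetition evaluation metrics can be sketched as a small helper; bias and RMSE would then be averaged over the 200 repetitions, and the example input is a synthetic draw vector, not actual posterior output.

```python
import numpy as np

def evaluate(posterior_draws, truth):
    """Summaries for one repetition: bias, 95% CrI width, and coverage.

    The equal-tailed CrI uses the 2.5% and 97.5% posterior quantiles,
    matching the construction described in the simulation.
    """
    draws = np.asarray(posterior_draws, float)
    est = draws.mean()
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return {"bias": est - truth,
            "width": hi - lo,
            "covered": bool(lo <= truth <= hi)}

# synthetic example: an evenly spread "posterior" over [0, 1], truth 0.5
res = evaluate(np.linspace(0.0, 1.0, 10001), truth=0.5)
```

Across repetitions, RMSE is the square root of the mean squared bias, and coverage is the fraction of repetitions whose CrI contains the truth.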
Results
Table 1 presents the simulation results for the continuous outcome. Because the NIP does not require observational data, its results are the same across different scenarios. For the UIP, we observe that it can effectively improve the precision of estimating θ in terms of RMSE under various scenarios and both balancing approaches. In addition, the UIP reduces the bias under Scenario 1, where the observational data share the same treatment effect with the RCT, and only incurs a minor increase in bias under Scenarios 2 and 3, where there exist heterogeneous treatment effects and covariates between the RCT and the observational studies. For the CrI, the UIP yields shorter widths and slightly larger coverage probabilities compared with the NIP for the first three scenarios, which is due to the more precise estimation of the treatment effect after synthesizing the observational evidence. Furthermore, the UIP tends to borrow more information from D_1 and D_2, as demonstrated by the larger values of w_1 and w_2 in Scenarios 2 and 3. This is due to the empirical fact that the first two datasets can more accurately estimate θ_1 and θ_2, which are closer to θ of the RCT, after balancing their relatively less confounded covariates under Scenario 3. The values of the w_k's are more distinctive when the deviations of the θ_k's from θ vary substantially under Scenario 4. The total amount of borrowed information M is slightly above half of the RCT sample size for the UIP approaches when the θ_k's are close to θ under Scenarios 1 to 3. A drastic decrease can be observed for M under Scenarios 4 and 5, where the treatment effects in the observational studies are more extreme relative to θ in the RCT. In Scenario 5, almost no information is borrowed from the observational studies due to the small value of M, and thus the UIP delivers performance close to the NIP in terms of all evaluation metrics.
Table 1. The simulation results of different prior methods for continuous outcomes (linear regression) under five scenarios.

Scenario  Method       Bias  RMSE  Width  Coverage  w_1    w_2    w_3    M
--        NIP          1.15  7.27  0.283  0.939     --     --     --     --
1         UIP + Match  0.73  5.91  0.246  0.944     0.352  0.336  0.312  111
          UIP + IPW    0.61  5.98  0.246  0.949     0.353  0.340  0.307  111
2         UIP + Match  1.45  6.44  0.256  0.944     0.358  0.349  0.294  109
          UIP + IPW    1.34  6.43  0.256  0.949     0.356  0.351  0.293  109
3         UIP + Match  1.37  6.32  0.255  0.939     0.355  0.348  0.296  109
          UIP + IPW    1.37  6.29  0.256  0.939     0.361  0.349  0.290  108
4         UIP + Match  2.19  7.44  0.273  0.924     0.140  0.293  0.567  72
          UIP + IPW    2.03  7.41  0.274  0.939     0.144  0.304  0.552  73
5         UIP + Match  1.70  7.45  0.275  0.914     0.398  0.349  0.253  2
          UIP + IPW    1.66  7.41  0.275  0.929     0.395  0.347  0.258  2

CrI: credible interval; IPW: inverse probability weighting; Match: nearest neighbor matching; NIP: non-informative prior; RMSE: root mean squared error; UIP: unit information prior.
Note: We evaluate the performance of estimating the treatment effect θ using the bias, RMSE, and the width and coverage of the 95% CrI. The weights of the UIP for the three observational studies are denoted by w_1, w_2, and w_3, and M corresponds to the amount of information borrowed from the observational data, measured by the number of units.
The results for the binary outcome are shown in Table 2. Clearly, the UIP can reduce the RMSE under Scenarios 1 to 3 in contrast to the NIP, with a minor sacrifice in bias. Narrower CrIs are obtained under the UIP, with coverage probabilities closer to the nominal level. Under Scenarios 4 and 5, the first two datasets contribute more to the current RCT, as evidenced by the relatively larger values of w_1 and w_2 in comparison with w_3. The UIP borrows much less information in Scenario 5, as indicated by the small values of M, because of the substantial differences between the treatment effects of the observational studies and the RCT. The UIP also improves the inference of θ for the survival outcome, as shown in Table 3. The UIP can reduce both the bias and RMSE compared with the NIP for decreasing (κ = 0.5), constant (κ = 1), and increasing (κ = 2) baseline hazards, which shows the robustness of the UIP with respect to different types of hazard functions. It also yields tighter CrIs for θ with coverage probabilities closer to the nominal level. In summary, the UIP can enhance the inference of the RCT by integrating real-world evidence, and the improvement is more evident when the RCT and observational studies are similar.
Table 2. The simulation results of different prior methods for binary outcomes (logistic regression) under five scenarios.

Scenario  Method       Bias   RMSE   Width  Coverage  w_1    w_2    w_3    M
--        NIP          0.061  0.346  1.238  0.929     --     --     --     --
1         UIP + Match  0.075  0.285  1.153  0.965     0.348  0.327  0.325  112
          UIP + IPW    0.089  0.293  1.175  0.960     0.360  0.328  0.313  110
2         UIP + Match  0.092  0.291  1.162  0.975     0.349  0.331  0.321  111
          UIP + IPW    0.101  0.307  1.183  0.970     0.361  0.335  0.304  111
3         UIP + Match  0.081  0.289  1.163  0.975     0.347  0.327  0.325  112
          UIP + IPW    0.090  0.306  1.173  0.955     0.363  0.330  0.307  111
4         UIP + Match  0.207  0.376  1.233  0.894     0.527  0.304  0.169  96
          UIP + IPW    0.208  0.378  1.236  0.879     0.544  0.299  0.156  94
5         UIP + Match  0.136  0.375  1.290  0.929     0.548  0.261  0.191  12
          UIP + IPW    0.131  0.374  1.288  0.939     0.466  0.312  0.222  14

CrI: credible interval; IPW: inverse probability weighting; Match: nearest neighbor matching; NIP: non-informative prior; RMSE: root mean squared error; UIP: unit information prior.
Note: We evaluate the performance of estimating the treatment parameter θ using the bias, RMSE, and the width and coverage of the 95% CrI. The weights of the UIP for the three observational studies are denoted by w_1, w_2, and w_3, and M corresponds to the amount of information borrowed from the observational data, measured by the number of units.
Table 3. The simulation results of different prior methods for survival outcomes (Cox regression) under various cases, where κ determines the shape of the baseline hazard function.

κ    Scenario  Method       Bias   RMSE   Width  Coverage  w_1    w_2    w_3    M
0.5  --        NIP          0.062  0.191  0.665  0.919     --     --     --     --
     1         UIP + Match  0.046  0.152  0.568  0.944     0.359  0.333  0.308  113
               UIP + IPW    0.044  0.156  0.577  0.949     0.370  0.330  0.300  113
     2         UIP + Match  0.035  0.154  0.576  0.944     0.337  0.343  0.320  112
               UIP + IPW    0.035  0.159  0.581  0.939     0.344  0.342  0.314  111
     3         UIP + Match  0.034  0.154  0.573  0.939     0.342  0.341  0.317  112
               UIP + IPW    0.031  0.157  0.582  0.955     0.349  0.336  0.315  111
1    --        NIP          0.053  0.184  0.664  0.924     --     --     --     --
     1         UIP + Match  0.037  0.149  0.576  0.955     0.356  0.332  0.312  114
               UIP + IPW    0.035  0.156  0.592  0.949     0.369  0.332  0.299  114
     2         UIP + Match  0.048  0.153  0.580  0.955     0.356  0.335  0.308  113
               UIP + IPW    0.044  0.154  0.588  0.955     0.365  0.335  0.300  113
     3         UIP + Match  0.047  0.155  0.578  0.955     0.358  0.334  0.309  113
               UIP + IPW    0.039  0.153  0.586  0.965     0.361  0.334  0.305  113
2    --        NIP          0.069  0.184  0.678  0.939     --     --     --     --
     1         UIP + Match  0.054  0.148  0.584  0.975     0.357  0.332  0.311  114
               UIP + IPW    0.054  0.153  0.592  0.949     0.368  0.330  0.302  114
     2         UIP + Match  0.060  0.153  0.585  0.955     0.348  0.338  0.314  114
               UIP + IPW    0.055  0.153  0.589  0.955     0.356  0.340  0.304  113
     3         UIP + Match  0.063  0.155  0.589  0.944     0.352  0.338  0.310  114
               UIP + IPW    0.059  0.155  0.597  0.960     0.360  0.338  0.301  113

CrI: credible interval; IPW: inverse probability weighting; Match: nearest neighbor matching; NIP: non-informative prior; RMSE: root mean squared error; UIP: unit information prior.
Note: We evaluate the performance of estimating the treatment parameter θ using the bias, RMSE, and the width and coverage of the 95% CrI. The weights of the UIP for the three observational studies are denoted by w_1, w_2, and w_3, and M corresponds to the amount of information borrowed from the observational data, measured by the number of units.
Based on Tables 1 to 3, we can also gain some insight into the factors that affect the w_k's and M. In general, the weights are determined by both the treatment effects and the covariate confounding effects in the observational data relative to the RCT, and the impact of the treatment effect is more evident. In Scenario 1, all observational data share the same treatment effect as the RCT, that is, θ_k = θ for all k, but they have different degrees of covariate confounding. Since a greater confounding effect would lead to a larger standard error or a more biased treatment effect estimator, the first two datasets, D_1 and D_2, tend to be more informative, and we thus observe larger values of w_1 and w_2 than of w_3. In Scenario 2, even if one study's treatment effect is closer to θ than another's, its weight is not necessarily larger in most cases, because the covariate confounding effect has a greater impact on the weights than the treatment effects when all the θ_k's are close to θ. Scenarios 4 and 5 show that the observational treatment effects are more influential than the confounding effect when the θ_k's deviate far from θ. Furthermore, by comparing Scenario 2 and Scenario 3, the patient population does not have a strong influence on the ordering of the w_k's.
The parameter M is affected by the level of consistency between the observational studies and the current RCT,22 with greater inconsistency yielding a smaller value of M. Scenarios 1 to 3 show that the patient population does not impose a strong influence on the values of M, whereas Scenarios 4 and 5 show that M is sensitive to the dissimilarity between the observational treatment effects and that of the RCT.
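For intuition, the borrowing mechanism can be caricatured as a normal working prior on the treatment parameter whose mean is the ω-weighted average of the historical estimates and whose precision is scaled so that the prior carries information worth M units. This is only an illustrative sketch of the idea, not the exact UIP construction of Jin and Yin;22 the function name and arguments below are our own.

```python
def uip_normal_sketch(theta_hist, se_hist, weights, M, n_hist):
    """Illustrative normal approximation in the spirit of the UIP.

    theta_hist: historical point estimates (e.g., log-HRs)
    se_hist:    their standard errors
    weights:    borrowing weights ω_k (assumed to sum to 1)
    M:          prior effective sample size, in units
    n_hist:     sample sizes of the historical studies
    """
    # Prior mean: ω-weighted average of the historical estimates.
    mean = sum(w * t for w, t in zip(weights, theta_hist))
    # Per-unit variance of study k is roughly n_k * se_k^2; pool with the
    # weights, then rescale so the prior is worth M units of information.
    unit_var = sum(w * n * s ** 2 for w, n, s in zip(weights, n_hist, se_hist))
    return mean, unit_var / M

# Hypothetical example with three historical studies:
mean, var = uip_normal_sketch([0.05, 0.10, -0.02], [0.12, 0.10, 0.30],
                              [0.4, 0.4, 0.2], M=120, n_hist=[500, 800, 100])
```

A larger M makes the prior tighter (more borrowing), while the weights shift the prior mean toward the more consistent, less confounded studies, matching the qualitative behavior described above.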
Real application
In the COVID-19 pandemic era, numerous studies have been conducted to find effective treatments. We illustrate the application of the UIP by re-analyzing a randomized trial33 that was launched to evaluate the efficacy of hydroxychloroquine (HCQ), compared with placebo, in reducing the in-hospital mortality of COVID-19. The parameter of interest is the HR between the HCQ and placebo groups. Because the original survival data are not accessible, we use the R package IPDfromKM34,35 to reconstruct the patient-level data from the Kaplan–Meier curves of the HCQ and placebo groups in Figure 2 of the original paper.33 The reconstructed dataset contains 208 samples, with 104 from the HCQ group and 104 from the control group, yielding the outcome variable, treatment indicator, and event indicator for each reconstructed observation.
Based on the reconstructed data, we re-estimate the HR of receiving HCQ using the NIP and the UIP under the piecewise exponential model without covariates. In addition to the Bayesian approaches, we also fit the frequentist Cox proportional hazards model via the R function coxph() to estimate the HR. For the UIP, we collect aggregate information from four different observational studies denoted by their respective journals: PLoS One (PLoS),36 New England Journal of Medicine (NEJM),27 American Journal of Epidemiology (AJE),37 and Clinical Infectious Diseases (CID).38 The four observational studies are selected following the criteria detailed in Section 2.2 with the aforementioned RCT as the template. First, these studies all considered the effect of HCQ therapy in reducing the mortality of COVID-19 and used the HR to measure the treatment efficacy. Second, similar to the RCT, the enrolled subjects in these studies were all non-pregnant hospitalized patients who had been diagnosed with COVID-19 and did not participate in other trials, and the majority of them were middle-aged or older. Finally, these studies all fitted the Cox proportional hazards model with propensity score adjustment to estimate the HR: propensity score weighting was used for NEJM, AJE, and CID, while PLoS leveraged propensity score stratification. The summary information provided in the published papers includes the sample size, the estimated HR, and its 95% CI, from which we apply the logarithmic transformation to obtain the log-HR estimate for each study. The UIP and NIP are then applied to the Bayesian model following the same configurations used in the simulation.
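For instance, each study's log-HR estimate and standard error can be recovered from its published HR and 95% CI via a standard back-calculation, assuming a symmetric Wald interval on the log scale (the helper function below is ours, for illustration):

```python
import math

def log_hr_summary(hr, ci_low, ci_high, z=1.96):
    """Recover (log-HR, SE) from a published HR and its 95% CI,
    assuming the CI was formed as exp(log_hr +/- z * se)."""
    log_hr = math.log(hr)
    # The CI width on the log scale spans 2 * z standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_hr, se

# PLoS study from Table 4: HR 1.02 with 95% CI (0.83, 1.27).
log_hr, se = log_hr_summary(1.02, 0.83, 1.27)
```

The same conversion applies to the NEJM, AJE, and CID summaries, yielding the inputs required by the UIP.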
We report the estimated HRs for the reconstructed RCT data together with the summary information of the historical studies in Table 4. The estimated HR under the UIP is 0.96 (95% CrI, 0.62–1.41), which has a narrower interval than those under the NIP, 0.94 (95% CrI, 0.58–1.44), and the frequentist counterpart, 0.93 (95% CI, 0.59–1.45), demonstrating the effectiveness of historical information borrowing. The parameter M is estimated to be 122, indicating that the UIP synthesizes an intermediate amount of observational information relative to the sample size of the randomized trial. The UIP weights for (PLoS, NEJM, AJE, CID) are (0.295, 0.307, 0.284, 0.114), suggesting that the least information is borrowed from CID and more is incorporated from NEJM and PLoS. The sample sizes of the NEJM and PLoS studies are both large and their estimated HRs have narrower CIs; as a result, the UIP tends to borrow more evidence from the more reliable studies. In summary, the re-analysis based on the UIP concludes that HCQ yields no significant reduction in the in-hospital mortality of COVID-19 patients, which is consistent with the claim in the original article, but the evidence obtained by incorporating the four observational studies is clearly more convincing.
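To see why borrowing narrows the interval, consider a stylized conjugate-normal calculation on the log-HR scale. This is not the piecewise exponential MCMC fit actually used; the prior values below are hypothetical stand-ins for the pooled observational evidence.

```python
import math

def posterior_normal(mean_lik, var_lik, mean_prior, var_prior):
    """Precision-weighted conjugate normal update."""
    prec = 1.0 / var_lik + 1.0 / var_prior
    mean = (mean_lik / var_lik + mean_prior / var_prior) / prec
    return mean, 1.0 / prec

# RCT log-HR and its SE recovered from the reported 95% CI (0.59, 1.45).
rct_mean = math.log(0.93)
rct_se = (math.log(1.45) - math.log(0.59)) / (2 * 1.96)
# Hypothetical normal prior summarizing the observational evidence.
prior_mean, prior_se = 0.02, 0.35
post_mean, post_var = posterior_normal(rct_mean, rct_se ** 2,
                                       prior_mean, prior_se ** 2)
# The posterior variance is below the RCT-only variance, so the
# credible interval narrows, mirroring the UIP column of Table 4.
```

The tighter the prior (larger M, in UIP terms), the greater the precision gain over the trial-only analysis.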
Table 4. The estimated HRs with CIs/CrIs for the reconstructed RCT data and the four historical observational studies, denoted by their respective journal names.
Study        Samples   HR     95% CI         CI width    UIP weight
PLoS         2512      1.02   (0.83, 1.27)   0.44        0.295
NEJM         1376      1.04   (0.82, 1.32)   0.50        0.307
AJE          998       1.21   (0.82, 1.76)   0.94        0.284
CID          84        0.89   (0.23, 3.47)   3.24        0.114
RCT          208       0.93   (0.59, 1.45)   0.85        –

                              95% CrI        CrI width
RCT + NIP    208       0.94   (0.58, 1.44)   0.86        –
RCT + UIP    208       0.96   (0.62, 1.41)   0.79        (M = 122)
HR: hazard ratio; CrI: credible interval; CI: confidence interval; RCT: randomized controlled trial; NIP: non-informative prior; UIP: unit information prior. Note: journal names: PLoS: PLoS One; NEJM: New England Journal of Medicine; AJE: American Journal of Epidemiology; CID: Clinical Infectious Diseases. The parameter M corresponds to the amount of information borrowed from the observational data, measured by the number of units.
Discussion
For a more comprehensive analysis of a current RCT, we develop the UIP methodology for synthesizing the evidence from historical observational studies into the analysis of the RCT data. The UIP is readily applicable to various common types of outcomes considered in clinical trials, and it only requires published summary information. The prior automatically quantifies the amount of information borrowed from each study while accounting for the potential bias in the real-world evidence. Extensive numerical experiments show that the UIP can improve the statistical efficiency of inference for the RCT. There are several promising directions in which to further extend the idea of the UIP for information borrowing. For example, it is of interest to leverage the prior to incorporate patient-level observational data into single-arm trials, as discussed by Liu et al.13 Furthermore, one can investigate how the UIP affects the type I error and power, as well as its possible application in determining the sample size of an RCT at the design stage. The UIP also sheds light on meta-analysis from the information-borrowing perspective, which warrants further investigation. Finally, one may consider evidence synthesis when individual patient-level data are available for both the RCT and the observational studies, for which other existing methods, such as the power prior or the commensurate prior, can also be applicable. The code for the numerical studies is available at https://github.com/BobZhangHT/UIP4RCT.
Acknowledgements
We thank the two referees for their careful reviews of our manuscript and many insightful comments that have significantly improved the paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research was partially supported by funding from the Research Grants Council of Hong Kong (17308321).
ORCID iDs
Hengtao Zhang
Guosheng Yin
Supplemental material
Supplementary material for this article is available online.
References
1. Rothwell PM. External validity of randomised controlled trials: "To whom do the results of this trial apply?". Lancet 2005; 365: 82–93.
2. Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-world evidence—what is it and what can it tell us? N Engl J Med 2016; 375: 2293–2297.
3. Katkade VB, Sanders KN, Zou KH. Real world data: An opportunity to supplement existing evidence for the use of long-established medicines in health care decision making. J Multidiscip Healthc 2018; 11: 295–304.
4. Schmidli H, Häring DA, Thomas M, et al. Beyond randomized clinical trials: Use of external controls. Clin Pharmacol Ther 2020; 107: 806–816.
5. Schmitz S, Adams R, Walsh C. Incorporating data from various trial designs into a mixed treatment comparison model. Stat Med 2013; 32: 2935–2949.
6. Efthimiou O, Mavridis D, Debray TP, et al. Combining randomized and non-randomized evidence in network meta-analysis. Stat Med 2017; 36: 1210–1226.
7. Verde PE. A bias-corrected meta-analysis model for combining studies of different types and quality. Biom J 2021; 63: 406–422.
8. Jenkins D, Hussein H, Martina R, et al. Methods for the inclusion of real world evidence in network meta-analysis. arXiv preprint arXiv:1805.06839, 2021.
9. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
10. Lin J, Gamalo-Siebers M, Tiwari R. Propensity score matched augmented controls in randomized clinical trials: A case study. Pharm Stat 2018; 17: 629–647.
11. Lin J, Gamalo-Siebers M, Tiwari R. Propensity-score-based priors for Bayesian augmented control design. Pharm Stat 2019; 18: 223–238.
12. Wang C, Rosner GL. A Bayesian nonparametric causal inference model for synthesizing randomized clinical trial and real-world evidence. Stat Med 2019; 38: 2573–2588.
13. Liu M, Bunn V, Hupf B, et al. Propensity-score-based meta-analytic predictive prior for incorporating real-world and historical data. Stat Med 2021. In press.
14. Zhang Y, Lin LA, Starkopf L, et al. Estimation of causal effect in integrating randomized clinical trial and observational data–an example application to cardiovascular outcome trial. Contemp Clin Trials 2021; 107: 106492.
15. Rubin EJ, Harrington DP, Hogan JW, et al. The urgency of care during the covid-19 pandemic—learning as we go. N Engl J Med 2020; 382: 2461–2462.
16. Cohen MS. Hydroxychloroquine for the prevention of covid-19—searching for evidence. N Engl J Med 2020; 383: 585–586.
17. Caro JJ, Ishak KJ. No head-to-head trial? Simulate the missing arms. Pharmacoeconomics 2010; 28: 957–967.
18. Signorovitch JE, Wu EQ, Andrew PY, et al. Comparative effectiveness without head-to-head trials. Pharmacoeconomics 2010; 28: 935–945.
19. Weber S, Gelman A, Lee D, et al. Bayesian aggregation of average data: An application in drug development. Ann Appl Stat 2018; 12: 1583–1604.
20. Sutton AJ, Kendrick D, Coupland CA. Meta-analysis of individual- and aggregate-level data. Stat Med 2008; 27: 651–669.
21. Riley RD, Lambert PC, Staessen JA, et al. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med 2008; 27: 1870–1893.
22. Jin H, Yin G. Unit information prior for adaptive information borrowing from multiple historical datasets. Stat Med 2021; 40: 5657–5672.
23. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011; 46: 399–424.
24. Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550–560.
25. Yao XI, Wang X, Speicher PJ, et al. Reporting and guidelines in propensity score analysis: A systematic review of cancer and cancer surgical studies. J Natl Cancer Inst 2017; 109: djw323.
26. Granger E, Watkins T, Sergeant JC, et al. A review of the use of propensity score diagnostics in papers published in high-ranking medical journals. BMC Med Res Methodol 2020; 20: 1–9.
27. Geleris J, Sun Y, Platt J, et al. Observational study of hydroxychloroquine in hospitalized patients with COVID-19. N Engl J Med 2020; 382: 2411–2418.
28. Morita S, Thall PF, Müller P. Determining the effective sample size of a parametric prior. Biometrics 2008; 64: 595–602.
29. Ho DE, Imai K, King G, et al. MatchIt: Nonparametric preprocessing for parametric causal inference. J Stat Softw 2011; 42: 1–28. https://www.jstatsoft.org/v42/i08/.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2021. https://www.R-project.org/.
32. Salvatier J, Wiecki TV, Fonnesbeck C. Probabilistic programming in Python using PyMC3. PeerJ Comput Sci 2016; 2: e55.
33. Hernandez-Cardenas C, Thirion-Romero I, Rodríguez-Llamazares S, et al. Hydroxychloroquine for the treatment of severe respiratory infection by covid-19: A randomized controlled trial. PLoS ONE 2021; 16: e0257238.
34. Liu N, Zhou Y, Lee JJ. IPDfromKM: Reconstruct individual patient data from published Kaplan-Meier survival curves. BMC Med Res Methodol 2021; 21: 1–22.
35. Guyot P, Ades A, Ouwens MJ, et al. Enhanced secondary analysis of survival data: Reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol 2012; 12: 1–13.
36. Ip A, Berry DA, Hansen E, et al. Hydroxychloroquine and tocilizumab therapy in COVID-19 patients—an observational study. PLoS ONE 2020; 15: e0237693.
37. Gerlovin H, Posner DC, Ho YL, et al. Pharmacoepidemiology, machine learning and COVID-19: An intent-to-treat analysis of hydroxychloroquine, with or without azithromycin, and COVID-19 outcomes amongst hospitalized US veterans. Am J Epidemiol 2021.
38. Paccoud O, Tubach F, Baptiste A, et al. Compassionate use of hydroxychloroquine in clinical practice for patients with mild to severe COVID-19 in a French university hospital. Clin Infect Dis 2020; 323: 2493–2502.