
Abstract

Background

Survival extrapolation is essential in cost-effectiveness analysis to quantify the lifetime survival benefit associated with a new intervention, due to the restricted duration of randomized controlled trials (RCTs). Current approaches to extrapolation often assume that the treatment effect observed in the trial can continue indefinitely, which is unrealistic and may have a large impact on resource allocation decisions.

Objective

We introduce a novel methodology as a possible solution to alleviate the problem of survival extrapolation with heavily censored data from clinical trials.

Method

The main idea is to mix a flexible model (e.g., Cox semiparametric), fitted as closely as possible to the observed data, with a parametric model encoding assumptions on the expected behavior of underlying long-term survival. The two are “blended” into a single survival curve that is identical to the Cox model over the range of observed times and gradually approaches the parametric model over the extrapolation period, according to a weight function. The weight function regulates the way the two survival curves are blended, determining how the internal and external sources contribute to the estimated survival over time.

Results

A 4-y follow-up RCT of rituximab in combination with fludarabine and cyclophosphamide versus fludarabine and cyclophosphamide alone for the first-line treatment of chronic lymphocytic leukemia is used to illustrate the method.

Conclusion

Long-term extrapolation from immature trial data may lead to significantly different estimates with various modelling assumptions. The blending approach provides sufficient flexibility, allowing a wide range of plausible scenarios to be considered as well as the inclusion of external information, based, for example, on hard data or expert opinion. Both internal and external validity can be carefully examined.

Highlights

Interim analyses of trials with limited follow-up are often subject to high degrees of administrative censoring, which may result in implausible long-term extrapolations using standard approaches.

In this article, we present an innovative methodology based on “blending” survival curves to relax the traditional proportional hazard assumption and simultaneously incorporate external information to guide the extrapolation.

The blended method provides a simple and powerful framework to allow a careful consideration of a wide range of plausible scenarios, accounting for model fit to the short-term data as well as the plausibility of long-term extrapolations.

Keywords

expert opinion; extrapolation; real world evidence; survival modeling in HTA

Survival or “time-to-event” data from randomized controlled trials (RCTs) are typically used to assess the cost-effectiveness of new interventions. However, the observed data from RCTs are often censored and immature with limited duration of follow-up,¹ so the clinical benefits regarding life expectancy or quality-adjusted life-years (QALYs) cannot be estimated directly. Consequently, it is necessary to extrapolate the estimates of the resulting survival proportions, often long beyond the data observed in the trial period.²

Methods of extrapolation most commonly used in submissions to health technology agencies such as the National Institute for Health and Care Excellence (NICE) in the United Kingdom often consider a parametric model for the control arm and assume proportional hazards (PH) to derive the survival curve for the treatment arm.^3–5 This implicitly assumes a constant treatment effect beyond the trial period. However, the effect of a treatment that performs well over the course of the trial is unlikely to remain constant, on account of factors such as waning treatment effects or competing risks from other causes of mortality. The typical length of follow-up in clinical trials has been shown to account for no more than 40% of the modeled time horizon,⁶ failing to reach the median survival time. In the absence of long-term data, care should be taken to assess whether the extrapolation is realistic, as the long-term modeling assumptions can have a dramatic impact on the decisions.^7,8

Historically, conventional approaches involved fitting the most appropriate parametric model to the observed data.⁹ However, different models with a similar fit to the data may generate highly divergent long-term survival estimates due to differences in the tails of the survival distributions. Recently, there has been increasing recognition that external long-term validity is essential when the extrapolation period is substantial and the trial data are heavily censored.^4,10–12 Current guidelines recommend considering both statistical criteria for model fit and the clinical plausibility of the extrapolation, which may be achieved through the use of external data or expert opinion.¹³ In recent times, the proportion of health technology appraisals (HTAs) using external information for validity has increased sharply,^2,6 in which clinical experts assess the plausibility of extrapolation or evaluate which models fall within the elicited plausible range of survival.¹⁴

There are many different ways that external data can be leveraged.^10,15 While the most frequent methods are indirect or retrospective, direct utilization of patient-level data for the extrapolation has increasingly been considered.^12–14 Historical data can be formally integrated into the extrapolated portion as informative priors via a Bayesian framework.¹⁶ In addition, a piecewise or hybrid approach in which observational data are used to facilitate the extrapolation has been undertaken, although the selection of cut points can be fairly subjective or arbitrary.^11,17,18 Commonly, external data from a different source will not match the trial population perfectly,^19–21 so hazard rates from a model fitted to external data could be matched to the control arm using a time acceleration adjustment after follow-up of the trial and anchoring/hazard ratio tapering for the investigation arm.¹² Further methods have attempted to combine evidence from a variety of available sources: especially under the Bayesian framework, disease-specific external data from registries might be extrapolated using general population data^12,15,22–25 or informed by justifiable clinical opinion where the external data are not fully mature.^15,26

This article presents a method based on “blending” survival curves as a possible solution. A similar approach has been presented previously in other applied fields²⁷ but we adapt it to survival modeling for cost-effectiveness analysis. The basic idea is to mix a flexible model (e.g., Cox semiparametric) to fit as well as possible the observed data and a parametric model encoding assumptions on the expected behavior of underlying long-term survival. The blended curve will improve decision making especially in cases in which decisions are made accounting for survival in long-term timeframes relative to the available trial data but expert knowledge or external information about the long-term is available and can be coherently combined. Extrapolated curves using only the short-term data are likely to be biased or overestimate survival, whereas the blended model helps constrain the tail and retain the information in the early time period. For HTA, cost and QALY calculations can use the estimated survival in the blending interval, which is consistent with information from both the early and later stages.

Motivating Case Study

Our motivating example is the one considered in NICE technology appraisal TA174²⁸ and in other methodological contributions.^29,30 This is based on the CLL-8 trial,³¹ which compares rituximab with fludarabine and cyclophosphamide (FCR) to fludarabine and cyclophosphamide (FC) for the first-line treatment of chronic lymphocytic leukemia.

Among 810 patients enrolled in the trial, 403 were randomly assigned to receive the treatment of FCR and the remaining 407 to the control arm of FC. There were 41 and 52 deaths in the FCR and FC arms, respectively.²⁹ While this study has a relatively large sample size and a relatively long follow-up (about 4 y), it is also characterized by a large amount of censoring such that more than 70% of individuals were not observed to die, as is common in this type of investigation. Following existing guidance,^13,32 a set of standard parametric distributions were fitted to the digitized data on overall survival from published Kaplan-Meier (KM) curves, as shown in Figure 1.

Figure 1

Overall survival curves for the parametric models (Exponential, Weibull, log-Normal, log-Logistic, Gompertz) fitted to the 4-y CLL-8 trial data (Kaplan-Meier curves) and long-term extrapolation to 15 y.

These models achieved a reasonable fit to the observed data (as evident in the left portion of Figure 1), but none of them generated credible extrapolations. All models suggested greater than 30% survival at 15 y, which was in stark contrast with expert estimates, suggesting instead that only 1.3% of the cohort would be likely to survive beyond that time.³³

Blended Survival Curve Methodology

Denote the available data as $D_{i} = (t_{i}, d_{i})$, where $t_{i}$ is the observed time at which the event (e.g., progression or death) occurs, while $d_{i}$ is an event indicator taking value 1 if the ith individual is fully observed and 0 if censored. Typically, we model $t_{i} | θ_{i} \sim p (t | θ)$, where $p (\cdot)$ is a parametric distribution indexed by a vector of parameters $θ$, for instance $θ = (γ, μ)$ indicating shape and scale, respectively. Given this structure, we can define the hazard $h (t | θ)$ and the survival function $S (t | θ) = \Pr (T > t | θ)$.
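For reference, the quantities just defined are linked by standard identities for a continuous time-to-event distribution. The block below is a reminder rather than a quotation from the original text: $p (\cdot)$ is read as a density, $H (\cdot)$ is the cumulative hazard, and the likelihood line assumes independent, noninformative right censoring.

```latex
\begin{aligned}
h (t | θ) &= \frac{p (t | θ)}{S (t | θ)}, \qquad
H (t | θ) = \int_{0}^{t} h (u | θ) \, du, \qquad
S (t | θ) = \exp \{ - H (t | θ) \}, \\
L (θ) &= \prod_{i} h (t_{i} | θ)^{d_{i}} \, S (t_{i} | θ).
\end{aligned}
```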

Blending considers 2 separate processes to describe the long-term horizon survival. The first one is driven exclusively by the observed data. Similar to a “standard” HTA analysis, we use this to determine an estimate over the entire time horizon, which we term $S_{obs} (t | θ_{obs})$ , a function of the relevant parameters $θ_{obs}$ . We could choose a simple parametric model or, alternatively, some more complex model, with the main objective to produce the best fit possible to the observed information. Unlike in a standard modeling exercise in which the issue of overfitting is potentially critical, achieving a very close approximation to the observed dynamics has much less important implications in the case of blending, as explained below.

For the second component of the blending process, we consider a separate external survival curve, $S_{ext} (t | θ_{ext})$ . This is a parametric model that is not informed by the observed data—for instance, we could use hard information (e.g., derived from a different data source, such as registries or observational studies) or construct a model that is purely based on subjective knowledge elicited from experts or possibly a combination of the two. Either way, $S_{ext} (t | θ_{ext})$ will typically be less concerned with the observed portion (for which we want the available data to drive the inference) but is instrumental to produce a reasonable and realistic long-term estimate for the survival probabilities.

The blended survival curve is simply obtained as

$$S_{ble} (t | θ) = S_{obs} (t | θ_{obs})^{1 - π (t; α, β, a, b)} \times S_{ext} (t | θ_{ext})^{π (t; α, β, a, b)} \tag{1}$$

where $θ = (θ_{obs}, θ_{ext}, α, β, a, b)^{\top}$ is the vector of model parameters. Here, $π (\cdot)$ is a weight function that controls the extent to which the 2 survival curves $S_{obs} (\cdot)$ and $S_{ext} (\cdot)$ are blended together. Technically, we define $π (\cdot)$ as the cumulative distribution function of a Beta random variable with parameters $α, β > 0$, evaluated at the point $(t - a) / (b - a)$:

$$π (t; α, β, a, b) = \Pr \left( T \leq \frac{t - a}{b - a} \,\Big|\, α, β \right) = F_{Beta} \left( \frac{t - a}{b - a} \,\Big|\, α, β \right),$$

for $t \in [0, T^{*}]$, where $T^{*}$ is the upper end of the interval of times over which we want to perform our evaluation. This means that the weighting function $π (\cdot)$ varies over the time horizon, which in turn allows us to give different weights to the 2 components at different times $t$. The range $[a, b] \subset (0, T^{*})$ is the blending interval, that is, a subset of the lifetime horizon in which $S_{obs} (\cdot)$ and $S_{ext} (\cdot)$ are blended into a single survival curve.
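The weight function can be sketched in a few lines. The example below is a minimal Python illustration, not the R implementation referred to later in the article. The helper names `beta_cdf_int` and `blend_weight` are hypothetical, and the Beta distribution function is computed with a binomial-sum identity that holds only for positive integer $α$ and $β$; real-valued parameters would need a regularized incomplete beta routine (for example, `scipy.stats.beta.cdf`).

```python
from math import comb


def beta_cdf_int(x, alpha, beta):
    """Beta(alpha, beta) CDF at x, for positive INTEGER alpha and beta only.

    Uses the identity F_Beta(x | alpha, beta) = P(Binomial(n, x) >= alpha),
    with n = alpha + beta - 1. This restriction is an assumption of the sketch.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(alpha, n + 1))


def blend_weight(t, alpha, beta, a, b):
    """Weight pi(t; alpha, beta, a, b) given to the external curve at time t."""
    return beta_cdf_int((t - a) / (b - a), alpha, beta)


# Illustrative values: blending interval [3, 13] with alpha = beta = 3.
for t in (0, 3, 8, 13, 20):
    print(t, round(blend_weight(t, alpha=3, beta=3, a=3, b=13), 4))
```

Because the Beta distribution function is 0 below 0 and 1 above 1, the weight is automatically 0 before $a$ and 1 after $b$, so no separate case handling is needed for the three parts of the time horizon.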

Figure 2 depicts this process graphically. In this case, we assume that the trial data span the interval $[0, a]$, which we label in the graph as the “Follow-up.” The dashed curve is the KM estimate of the observed data (for simplicity, but without loss of generality, we consider here a single arm). The green curve indicated as $S_{obs}$ results from a suitable model fitted to the observed data, in order to capture the known features of the data generating process almost to perfection; as can be seen in the graph, the KM curve is basically identical to the model obtained with $S_{obs}$.

Figure 2

Graphical representation of the blended curve method. The whole time-horizon is partitioned into 3 parts: Follow-up, Blending interval, and Long-term. The blended survival is equivalent to the model fitted to the short-term data (purple Kaplan-Meier curve) within the Follow-up period (green curve), then gradually approaches the external estimate in the Blending interval (red curve), and is eventually consistent with the expected behavior (blue curve) in the Long-term. The black point in the Long-term is an example of external information from experts: 10% expected survival at 13 y.

The blue curve, indicated as $S_{ext}$, should be used to give information about the expected long-term behavior of the survival process. While it may be difficult to directly access hard data, as discussed in the “Unobserved Time Period: Extrapolation Using External Data” section, we and others^15,34 argue that it is often possible and generally desirable to do so. For example, we may have individual-level data from a registry based on a drug with a similar mechanism to the one of interest, or perhaps we have elicited clinical knowledge or expert opinion to identify that survival at a certain time point is not expected to exceed a certain threshold, and we can use this information to constrain $S_{ext}$ to conform with this expectation. Notice in particular that $S_{ext}$ can deviate substantially from the observed data, as shown in Figure 2.

In summary, $S_{ble} (\cdot)$ is constructed as a combination of $S_{obs} (\cdot)$ and $S_{ext} (\cdot)$ so that:

Between times 0 and $a$ , $π (\cdot) = 0$ , which means that the long-term extrapolation has no influence. Since this is the trial follow-up, the observed data should be described as best as possible, as obtained by $S_{obs} (\cdot)$ .

Between times $b$ and $T^{*}$ (set to 20 in the example shown in Figure 2), $π (\cdot) = 1$, which means that the long-term survival curve extrapolated from the observed data bears no weight whatsoever. Again, we do this because, given the heavy censoring, the resulting extrapolation is most likely a gross overestimation.

Between times $a$ and $b$ , the 2 curves merge into one another, according to the process characterized by the weight function $π (\cdot)$ . In the blending interval, both curves influence the resulting blended survival curve, which gradually abandons the extrapolation from the observed data (thus avoiding issues with the inherent overfitting and unrealistic estimates) and merges into the long-term extrapolation from the external evidence.
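The three regimes listed above can be checked numerically. The following Python sketch applies Equation 1 to two hypothetical exponential curves; the rates 0.03 and 0.15, the interval $[4, 15]$, and the function names are illustrative assumptions and are not taken from the case study. The Beta distribution function is again restricted to integer parameters.

```python
from math import comb, exp


def beta_cdf_int(x, alpha, beta):
    """Beta CDF for positive integer alpha, beta (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(alpha, n + 1))


def s_obs(t):
    """Hypothetical curve standing in for the model fitted to trial data."""
    return exp(-0.03 * t)


def s_ext(t):
    """Hypothetical curve standing in for the external long-term model."""
    return exp(-0.15 * t)


def blended_survival(t, alpha, beta, a, b):
    """Equation 1: S_ble = S_obs^(1 - pi) * S_ext^pi."""
    w = beta_cdf_int((t - a) / (b - a), alpha, beta)
    return s_obs(t) ** (1.0 - w) * s_ext(t) ** w


# Follow-up (t < a), blending interval (a <= t <= b), long term (t > b).
for t in (2.0, 10.0, 18.0):
    print(t, s_obs(t), blended_survival(t, 3, 3, a=4.0, b=15.0), s_ext(t))
```

In this toy setting the blended value coincides with `s_obs` at $t = 2$, lies between the two curves at $t = 10$, and coincides with `s_ext` at $t = 18$.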

We can control the rate at which the blending process occurs by using specific values for the parameters $(α, β)$ of the relevant Beta distribution. Given the same blending interval, different values of $(α, β)$ will provide distinct slopes, influencing the speed of the blending process. For example, in Figure 3, in the same interval $(a = 3, b = 13)$, the blue curve $(α = 2, β = 5)$ rises more steeply at first than the red one $(α = β = 3)$, which implies that the former blends faster and the impact of $S_{ext}$ is relatively greater at the same point in time (along the x-axis). Overall, the initial slope of the weight curve is larger when $α < β$ than when $α \geq β$.

Figure 3

Graphical examples of the weight function $π (t; α, β, a, b)$. The gray area $[a, b]$ is the blending interval $[3, 13]$ for both weight curves. The initial slope of the red curve ($α = 3$, $β = 3$) is smaller than that of the blue one ($α = 2$, $β = 5$), which means the former blends more slowly than the latter.

Different assumptions about how quickly the treatment effect might wane can be easily examined by adjusting the parameters of the weight function $π (t)$ as part of a sensitivity analysis. For example, if the observed treatment effect is assumed to persist over the whole horizon, we can set the value of $a$ equal to $T^{*}$, in which case the blended curve is the same as the observed one over the entire time frame.
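To make the comparison concrete, the sketch below evaluates the two weight curves of Figure 3 at a few time points and then mimics the "persistent effect" scenario. It is a Python illustration with hypothetical helper names; the choice $b = 21$ in the last scenario is an arbitrary assumption (any $b > a$ gives the same result on $[0, T^{*}]$), with $T^{*} = 20$ as in Figure 2.

```python
from math import comb


def beta_cdf_int(x, alpha, beta):
    """Beta CDF for positive integer alpha, beta (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(alpha, n + 1))


def blend_weight(t, alpha, beta, a, b):
    """Weight given to the external curve at time t."""
    return beta_cdf_int((t - a) / (b - a), alpha, beta)


# Figure 3 settings: same blending interval [3, 13], two (alpha, beta) pairs.
for t in (4, 5, 8, 12):
    blue = blend_weight(t, 2, 5, a=3, b=13)
    red = blend_weight(t, 3, 3, a=3, b=13)
    print(f"t={t:>2}  blue(2,5)={blue:.4f}  red(3,3)={red:.4f}")

# Persistent treatment effect: a = T* = 20, so the weight is 0 on [0, T*].
T_STAR = 20
persistent = [blend_weight(t, 3, 3, a=T_STAR, b=T_STAR + 1) for t in range(T_STAR + 1)]
print(all(w == 0.0 for w in persistent))
```

At every time point printed, the $(2, 5)$ weight exceeds the $(3, 3)$ weight, which matches the ordering of the blue and red curves described for Figure 3.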

Note also that our method is fundamentally different from well-established mixture cure models (MCMs³⁵). In the MCM case, it is assumed that the observed trial data correspond to a mixed survival curve resulting from the experience of 2 subgroups (“cured” v. “noncured” patients). Conversely, we model the 2 components, $S_{obs}$ and $S_{ext}$, independently within the blended process, based on observed data and external evidence, respectively. Importantly, values for $π (t)$ are provided externally and can be modified on demand. We return to this important distinction in the “Discussion” section.

Blending Hazard Functions

By taking the negative logarithm of Equation 1 and differentiating with respect to $t$, our method can also be expressed in terms of hazard functions. This is helpful because hazard plots often aid understanding of long-term survival mechanisms and provide useful insights into suitable model selection.² Specifically, the blended hazard rate $h_{ble} (t)$ can be characterized by 3 components: the weighted hazard rates from the 2 survival curves, $h_{obs} (t)$ and $h_{ext} (t)$, and an extra term related to the weight function and the cumulative hazards. We can then re-express Equation 1 equivalently as

$$\begin{aligned} h_{ble} (t) = {} & [(1 - π (t)) \times h_{obs} (t)] + [π (t) \times h_{ext} (t)] \\ & + \left[ \frac{f_{Beta} \left( \frac{t - a}{b - a} \right)}{b - a} \times (H_{ext} (t) - H_{obs} (t)) \right], \end{aligned}$$

where $f_{Beta} (\cdot)$ denotes the density function of a Beta random variable, associated with the weight function $π (\cdot)$ , while $H_{ext} (t)$ and $H_{obs} (t)$ are the cumulative hazard rates from the 2 underlying survival curves, respectively.
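The expression can be recovered from Equation 1 in two steps, using only quantities already defined and the standard relations $H (t) = - \log S (t)$ and $h (t) = dH (t) / dt$. The derivation below is a sketch with the dependence on the parameters suppressed.

```latex
\begin{aligned}
H_{ble} (t) &= - \log S_{ble} (t)
             = (1 - π (t)) \, H_{obs} (t) + π (t) \, H_{ext} (t), \\
h_{ble} (t) &= \frac{d}{dt} H_{ble} (t)
             = (1 - π (t)) \, h_{obs} (t) + π (t) \, h_{ext} (t)
               + π' (t) \, (H_{ext} (t) - H_{obs} (t)), \\
π' (t) &= \frac{d}{dt} F_{Beta} \left( \frac{t - a}{b - a} \right)
        = \frac{1}{b - a} \, f_{Beta} \left( \frac{t - a}{b - a} \right).
\end{aligned}
```

The third term arises from the product rule because the weight itself changes with time; it vanishes wherever $π (t)$ is constant.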

The hazard function depends on the same subset of parameters as the corresponding survival functions. Given the properties of the Beta distribution, the term $f_{Beta} ((t - a) / (b - a))$ is nonzero only within the blending interval (i.e., $[a, b]$) and is zero elsewhere. Since $π (t)$ is 0 in $[0, a)$ and 1 in $[b, T^{*}]$, it is easy to show that the blended hazard $h_{ble} (t)$ is equal to the observed estimate $h_{obs} (t)$ before time point $a$ and to the external hazard $h_{ext} (t)$ in the long term after time point $b$.
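These properties can be verified numerically. The Python sketch below codes the three-term hazard expression for two hypothetical constant hazards (0.03 and 0.15, chosen only for illustration) and compares it with a finite-difference derivative of the blended cumulative hazard. All function names are assumptions of the sketch, and the Beta density and distribution function are restricted to integer parameters.

```python
from math import comb, factorial

H_OBS_RATE = 0.03  # hypothetical constant hazard for the observed-data model
H_EXT_RATE = 0.15  # hypothetical constant hazard for the external model


def beta_cdf_int(x, alpha, beta):
    """Beta CDF for positive integer alpha, beta (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(alpha, n + 1))


def beta_pdf_int(x, alpha, beta):
    """Beta density for positive integer alpha, beta; taken as 0 outside (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    norm = factorial(alpha + beta - 1) / (factorial(alpha - 1) * factorial(beta - 1))
    return norm * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)


def blended_cum_hazard(t, alpha, beta, a, b):
    """H_ble(t) = (1 - pi) * H_obs + pi * H_ext, with H = rate * t here."""
    w = beta_cdf_int((t - a) / (b - a), alpha, beta)
    return (1.0 - w) * H_OBS_RATE * t + w * H_EXT_RATE * t


def blended_hazard(t, alpha, beta, a, b):
    """Three-term blended hazard from the closed-form expression."""
    x = (t - a) / (b - a)
    w = beta_cdf_int(x, alpha, beta)
    dw = beta_pdf_int(x, alpha, beta) / (b - a)  # derivative of the weight
    cum_gap = (H_EXT_RATE - H_OBS_RATE) * t      # H_ext(t) - H_obs(t)
    return (1.0 - w) * H_OBS_RATE + w * H_EXT_RATE + dw * cum_gap


def numeric_hazard(t, alpha, beta, a, b, eps=1e-5):
    """Central finite difference of the blended cumulative hazard."""
    upper = blended_cum_hazard(t + eps, alpha, beta, a, b)
    lower = blended_cum_hazard(t - eps, alpha, beta, a, b)
    return (upper - lower) / (2.0 * eps)


for t in (2.0, 9.0, 18.0):
    print(t, blended_hazard(t, 3, 3, 4.0, 15.0), numeric_hazard(t, 3, 3, 4.0, 15.0))
```

Outside the blending interval the extra term drops out, so the closed form returns the observed hazard before $a$ and the external hazard after $b$.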

The slope of the blended hazard as well as the location of the turning point are determined by the values chosen for the parameters $(α, β)$, enabling different assumptions on the underlying hazard rates to be tested. In this example, in which the external hazard is much greater than the observed hazard, if $α < β$, the hazard would increase monotonically within the interval $[a, b]$. Alternatively, if $α > β$, a sharp increase would be followed by a steady decrease during the interval (red segment in Figure 4). This pattern allows turning points beyond the observed period, so the form of the blended hazard is likely to be more flexible and realistic than that of standard parametric models.

Figure 4

Graphical representation of the blended hazard. For interval $[0, a]$ , the blended hazard is equal to the observed hazard ( $h_{obs}$ , green curve); then in the blending interval $[a, b]$ , there is a sharp increase followed by a steady decrease ( $h_{ble}$ , red curve). Eventually, it is consistent with the external hazard ( $h_{ext}$ , blue curve).

Technical Implementation

Observed Time Period: Best Fit to the Internal Data

Generally speaking, there is no restriction on the distributional assumptions used to model the observed data. With a view to providing the best fit possible and a good level of flexibility, here we recommend a Cox semiparametric model with piecewise constant hazards. We choose a Bayesian approach, which naturally allows the incorporation of external evidence and lends itself to the conduct of “uncertainty analysis.”^36,37 The R code for the motivating case study is available in the GitHub repository (https://github.com/StatisticsHealthEconomics/blendR-paper).

To construct the model, we partition the time period into $K$ intervals, $0 = u_{0} < \dots < u_{K}$ , and assume the hazard $h_{0} (t)$ to be constant in each interval using $K$ parameters $λ_{1}$ , …, $λ_{K}$ . We set a random walk (RW) of order 1 (or 2) as the prior for $λ_{k}$ , which implies that the increments $Δ λ_{k} = λ_{k} - λ_{k - 1}$ (or $Δ^{2} λ_{k} = λ_{k} - 2 λ_{k - 1} + λ_{k - 2}$ ) are associated with a Gaussian distribution with zero mean and a common precision.³⁸
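As an illustration of the piecewise constant structure only (not of the Bayesian fitting or the RW prior), the Python sketch below computes the cumulative hazard and survival implied by a given set of interval hazards. The 8 six-month intervals echo the 4-y follow-up of the case study, but the hazard values are hypothetical, the $λ_{k}$ are treated as hazards on the natural scale (an assumption), and the function names are illustrative.

```python
from math import exp


def piecewise_cum_hazard(t, cuts, lambdas):
    """Cumulative hazard at t for hazards lambdas[k] on [cuts[k], cuts[k + 1]).

    cuts = [u_0 = 0, u_1, ..., u_K]; lambdas = [lambda_1, ..., lambda_K].
    Only defined here within [0, u_K]; no extrapolation is attempted.
    """
    if len(cuts) != len(lambdas) + 1:
        raise ValueError("need exactly one more cut point than hazard values")
    if not 0 <= t <= cuts[-1]:
        raise ValueError("t must lie within [0, u_K]")
    total = 0.0
    for k, lam in enumerate(lambdas):
        lower, upper = cuts[k], cuts[k + 1]
        if t <= lower:
            break
        total += lam * (min(t, upper) - lower)  # time spent in interval k
    return total


def piecewise_survival(t, cuts, lambdas):
    """S(t) = exp(-H(t))."""
    return exp(-piecewise_cum_hazard(t, cuts, lambdas))


cuts = [0, 6, 12, 18, 24, 30, 36, 42, 48]  # months
lambdas = [0.002, 0.002, 0.003, 0.003, 0.004, 0.004, 0.005, 0.005]  # hypothetical

for t in (6, 24, 48):
    print(t, round(piecewise_survival(t, cuts, lambdas), 4))
```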

Note that, using this model, we can still extrapolate beyond the observed times using the RW structure. Obviously, in the presence of heavy censoring, the extrapolation is unlikely to be credible, with substantial uncertainty around the average. This, however, is a minor concern in our modeling structure because, as time progresses through the blending interval, the extrapolation from the semiparametric component carries increasingly low weight, and it carries none beyond time point $b$.

Of course, other choices are possible: we could select a parametric model (e.g., Weibull, Gompertz, or any other from the set suggested in various guidelines²); in reality, a flexible semiparametric model may not increase the computational complexity by a substantial amount, compared with alternatives such as Royston-Parmar splines³⁹ or fractional polynomials.⁴⁰ In addition, because of the blending process, we need to worry only about the performance of any model chosen in the follow-up period.

Unobserved Time Period: Extrapolation Using External Data

In the best-case scenario, long-term data can be accessed from a relevant study, possibly of an observational nature, such as a registry or a cohort study; this is naturally unlikely to contain direct information on the intervention under investigation from the trial data. But, perhaps, we may have information on drugs with similar mechanisms of action or tackling the same condition. In these circumstances, we could simply use the survival result from an appropriate model fitted to the relatively complete data externally or include additional assumptions such as a time acceleration adjustment to match the reconstructed external data to the trial data.¹² Whatever the distributional assumptions, we would be able to determine an estimate of the survival curve for the extrapolation period and then plug that into the blended model. Note that it is not simple to adjust the external population to accurately represent the trial population,¹⁵ so the blending procedure would allow further assumptions for the extrapolated curve, in which different weights could be given to the external component over time.

Unobserved Time Period: Extrapolation Using Expert Judgment

A more general situation, encountered in real-life applications, is when only tentative knowledge is available, typically in the form of expert elicitation. It is rather common for modelers to ask “key opinion leaders” for their assessment of the validity of a given extrapolation, perhaps in the form of plausible ranges or point estimates for the survival probabilities at given times. For example, experts may suggest that, given their clinical knowledge, the plausible interval of 10-y survival probability is between 10% and 30% or that no more than 5% of participants would survive beyond 15 y. We thus need to map those numerical estimates onto a suitable model and construct a representative curve of the external information.

Elicitation of survival estimates could be expressed as the expected number of individuals who, in a population of a given size, will survive at the specific point; for instance, 20% survival at 10 y could be interpreted as “20 in 100 patients would survive beyond 10 years.” We could translate the clinical constraint into an artificial data set and then use the standard method to analyze the pseudo data. Given that 80% of time-to-event data should be shorter than 10 y, we could use a uniform distribution with boundaries 0 and 10 to generate the individual survival times, because there is no knowledge or assumption about the time-to-event outcome in this synthetic data set. To build up survival outcomes for the remaining 20% of the population who would survive at least 10 y, it is essential to determine a maximum lifetime $T_{\max}$ beyond which no patient would be expected to be alive, and then similarly, the survival times should be samples of the uniform distribution ranging from 10 y to $T_{\max}$ . Of course, other processes of simulation of the underlying time-to-event data may be selected, as long as the soft constraints hold and the resulting long-term extrapolation is justifiable.
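The construction described above can be sketched as follows. This Python example generates the artificial data set for a single elicited point (20% survival at 10 y in a population of 100); the value $T_{\max} = 25$ y, the random seed, and the function name `pseudo_data` are assumptions made for illustration.

```python
import random


def pseudo_data(n, surv_prob, t_elicit, t_max, seed=0):
    """Artificial event times consistent with S(t_elicit) = surv_prob.

    A share (1 - surv_prob) of the n times is drawn uniformly on [0, t_elicit],
    the rest uniformly on [t_elicit, t_max]. All events are treated as observed.
    """
    rng = random.Random(seed)
    n_long = round(n * surv_prob)  # individuals surviving beyond t_elicit
    early = [rng.uniform(0.0, t_elicit) for _ in range(n - n_long)]
    late = [rng.uniform(t_elicit, t_max) for _ in range(n_long)]
    times = early + late
    events = [1] * n  # event indicator: no censoring in the synthetic data
    return times, events


times, events = pseudo_data(n=100, surv_prob=0.20, t_elicit=10.0, t_max=25.0)
print(sum(t >= 10.0 for t in times), "of", len(times), "times are at least 10 y")
```

A parametric model would then be fitted to `times` and `events` in the usual way; reducing `n` is the lever described in the text for expressing weaker confidence in the elicited value.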

Figure 5 (top plot) illustrates the above example, and the synthetic data set consists of the 2 groups of time-to-event data $t_{1}$ and $t_{2}$ , in which all the event indicators are equal to 1 as they are assumed to be fully observed. The dashed curve in Figure 5 (bottom plot) shows the KM estimate for the synthetic data set with 100 individuals. The choice of the sample size is directly related to the level of implied uncertainty on the external information. If clinicians/experts are not very certain about their elicitation, the sample size of the artificial data set should be reduced, which would lead to a wider 95% interval around the point estimate. When the data set is constructed, the process is similar to the one used for direct “hard” external data. In Figure 5, fitting a Gompertz distribution appears to perform well, and the blue curve is fully reflective of the expert opinion at 10 y. Using a Bayesian approach implies that we can naturally characterize the underlying uncertainty in the survival curves.

Figure 5

Graphical representation of constructing the external survival curve (based on subjective opinion). The top plot illustrates the mechanism for generating the artificial data set, and the bottom plot is the Gompertz model fitted to the synthetic data set. The elicitation is only at 1 time point: 20% expected survival at 10 y.

This simple case only considers 1 time constraint, but more importantly, the process would be essentially identical and easy even if there are multiple elicited time points. Given more information about several time points, it is required only to partition a whole time horizon into 3 or more portions while constructing the external data set; other than that, all procedures should be consistent. The method of using an artificial data set enables a range of possible constraints to be flexibly considered. Moreover, based on more externally specific details, the resulting curves can align more closely with substantive expert beliefs.
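With several elicited points, the same idea amounts to partitioning the horizon at each elicited time and allocating individuals according to the drop in survival between consecutive points. The Python sketch below is one possible reading of that procedure, under the assumptions that elicited times are increasing and elicited probabilities are decreasing. The two points used (20% at 10 y, 5% at 15 y) echo the examples quoted earlier, while $T_{\max} = 25$ y and the function name are hypothetical.

```python
import random


def pseudo_data_multi(n, elicited, t_max, seed=0):
    """Artificial event times matching several elicited survival probabilities.

    elicited: list of (time, survival probability) pairs, with increasing times
    and decreasing probabilities. Times are drawn uniformly within each segment.
    """
    rng = random.Random(seed)
    edges = [0.0] + [t for t, _ in elicited] + [t_max]
    alive = [n] + [round(n * s) for _, s in elicited] + [0]  # survivors at each edge
    times = []
    for j in range(len(edges) - 1):
        n_events = alive[j] - alive[j + 1]  # events falling in segment j
        times += [rng.uniform(edges[j], edges[j + 1]) for _ in range(n_events)]
    events = [1] * len(times)  # all events treated as fully observed
    return times, events


times, events = pseudo_data_multi(100, [(10.0, 0.20), (15.0, 0.05)], t_max=25.0)
print(sum(t >= 10.0 for t in times), sum(t >= 15.0 for t in times))
```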

Results

Interim Analysis for Observed Time Period

The piecewise constant hazard model in the Bayesian framework provided a good fit to the observed data (with 8 intervals over the 4-y follow-up; green curve in Figure 6). As is known, a greater number of intervals might lead to lower deviance (better fit); however, in this particular case, no meaningful improvement was seen by increasing the number of intervals. Notice that, unsurprisingly, the extrapolation from the model is not reasonable, as it implies artificially and unrealistically large survival probabilities beyond the end of the follow-up period.

Figure 6

Blended survival curve based on short-term data and external information for the FCR arm. The digitized data from CLL-8 are updated with longer follow-up until 96 mo (purple dotted line). (a) FC arm. (b) FCR arm.

External Curve with Expert Information

Given the relatively strong opinion that approximately 1.3% of the cohort would be alive beyond 15 y, we construct a synthetic data set with 300 participants, in which no more than 4 individual times are longer than 15 y (180 mo). We can experiment with different sample sizes (in our case, we used a number of scenarios, with the sample size ranging from 10 to 500) to get a better sense of the implied uncertainty around the resulting survival curves.

Among the candidate parametric models, the Gompertz distribution fits the external data very well, describing the belief specified above accurately. In a real-life case, the experts and modelers should be able to defend this assumption, in the absence of hard evidence to justify it. We note, however, that this process happens irrespective of the modeling strategy chosen; in our case, we make it in a way that does not affect directly the fit to the observed data.

Blended Estimate Compared with Updated Data from the CLL-8 Trial

Figure 6 shows the blended survival curve driven by the internal and external curves, together with 95% interval estimates around the average curves over the whole time horizon. Under the Bayesian framework, the interval estimates are obtained from simulations generated from the posterior distribution, as in a probabilistic sensitivity analysis. Without any further information about the blending process, we assume a constant rate over the blending interval, based on a linear weight function with $α = β = 1$. Because there is only one elicited time point, at 15 y, we set the blending interval to run from the end of follow-up (4 y) to the end of the modeling horizon.
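With $α = β = 1$ the Beta distribution is uniform, so the weight reduces to a straight line between $a$ and $b$. The short Python sketch below shows this special case with the blending interval starting at 48 mo; the horizon of 180 mo and the function name `linear_weight` are assumptions, since the end of the modeling horizon is not stated numerically here.

```python
def linear_weight(t, a, b):
    """Weight for alpha = beta = 1: the Uniform(0, 1) CDF at (t - a) / (b - a)."""
    return min(1.0, max(0.0, (t - a) / (b - a)))


A_MONTHS = 48.0   # end of follow-up (4 y)
B_MONTHS = 180.0  # assumed end of the modeling horizon, for illustration only

for t in (24.0, 48.0, 114.0, 180.0):
    print(t, linear_weight(t, A_MONTHS, B_MONTHS))
```

Under this choice the external curve gains weight at a constant rate, reaching one half at the midpoint of the blending interval.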

When compared with a later data cut for the CLL-8 trial until 96 mo,⁴¹ the blended survival curve after 48 mo is generally very close to the updated data. Unsurprisingly, the observed survival without external information overestimates longer-term survival and is 40% higher than the updated result.

Discussion

There is a growing need to improve extrapolation of immature survival data when interim analysis is frequently carried out in the context of accelerated regulatory approvals. A short duration of follow-up is often subject to a substantial amount of censoring, which can lead to implausible extrapolations with conventional approaches based only on observed data. In addition, innovative cancer drugs are evaluated on the back of limited information, because no alternative treatment is available as a viable option for patients affected by a specific disease. To obtain credible estimation of overall survival gains, it is essential to relax the traditional PH assumption and supplement the external information to guide the extrapolated curve. In this article, we have introduced an innovative approach based on blended survival curves as a possible solution to these issues in the extrapolation.

In cases in which the early hazard is unlikely to reflect the long-term behavior, our blended approach enables the extrapolated survival to be less and less affected by the short-term data as time progresses. Long-term outcomes would be dominated by the external information. While providing the best fit to the observed data, the blended curve would gradually approach the prediction derived from the external sources over the extrapolated period. In the blending interval, time-specific weights are allocated to the observed and external survival to allow for varying proportions of the 2 components contributing to the overall estimate in the course of time, which differs markedly from other mixture models, whose mixing proportions are time independent. As mentioned in the “Blended Survival Curve Methodology” section, a mixture cure model also consists of 2 components (survival profiles of cured and uncured patients) but is distinct because it assumes a constant weight, namely, the proportion of cured patients, through the entire time range.⁴² The mixtures are time independent, and cure fractions, as well as the survival of uncured patients, often rely solely on short-term data.⁴³ Conversely, weight functions, together with external projections in the blended curves, would be governed by information outside an RCT. Finally, MCMs are based on the assumption that the underlying data-generating process gives rise to a single survival mechanism that is a combination of 2 subgroups; in our case, we explicitly consider 2 separate processes (the short-term and long-term survivals) and ensure that extrapolation from the former is anchored in a principled and flexible way to the latter.

With the use of external sources, our method allows turning points in the extrapolated hazard, which may provide a more flexible and realistic shape beyond the trial period. By adjusting the parameters of the weight function, the blending procedure permits nonmonotonic hazards (as shown in Figure 4), which may be more realistic in the extrapolation. For example, if a trial period ends with a low but increasing hazard, there could be several turning points over time, such as a temporary decrease due to long-term survivors followed by an increase due to aging effects.⁴³ Although flexible parametric models, such as splines or fractional polynomials, can also capture a complex hazard function within the observed period, they cannot generate a turning point in the posttrial period without external data, and a monotonic hazard extrapolated from the final observed segment is likely to be undesirable.
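To see why blending can produce turning points, it may help to write out the hazard implied by a blended curve. The derivation below is a sketch under the assumption that the blended survival is the weighted geometric combination $S_{ble}(t) = S_{obs}(t)^{1-\pi(t)} S_{ext}(t)^{\pi(t)}$ with a differentiable weight $\pi(t)$; the symbols $h$ (hazard) and $H$ (cumulative hazard) are introduced here for illustration.

```latex
\begin{align*}
\log S_{ble}(t) &= -\{1-\pi(t)\}\,H_{obs}(t) - \pi(t)\,H_{ext}(t), \\
h_{ble}(t) &= -\frac{d}{dt}\log S_{ble}(t) \\
           &= \{1-\pi(t)\}\,h_{obs}(t) + \pi(t)\,h_{ext}(t)
              + \pi'(t)\,\{H_{ext}(t) - H_{obs}(t)\}.
\end{align*}
```

Under this assumption, the first 2 terms form a weighted average of the component hazards, whereas the third term is nonzero only inside the blending interval and takes the sign of the difference between the cumulative hazards. The blended hazard can therefore rise or fall within the interval even when both component hazards are monotonic.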

It is important to identify appropriate external information to facilitate the extrapolation. A key assumption in the blended method is that, from a specific time point onward, the extrapolated survival is consistent with the estimate from external evidence. Before using data from other sources, researchers should examine whether the external population matches the key characteristics of the patients of interest and can be expected to have equivalent long-term mortality. One advantage of the blending process is that no further adjustment is required even if 1:1 matching between the 2 sources is unavailable; the blending process takes the place of the matching procedure.

Expert opinion, as a form of subjective information, is frequently used for model validation rather than being formally incorporated in the modeling. However, some research has indicated the potential benefits of formally integrating expert opinion to aid long-term extrapolation,¹⁵ especially in situations in which patient-level external data are not accessible. Therefore, we focus on the more general case in which only subjective expert or clinical beliefs are available for the long term. Experts may have some knowledge about the likely values or plausible ranges of future survival based on the trial data and their experience. Rather than expressing this evidence via informative priors on the relevant parameters, our approach translates the beliefs about long-term survival into a representative curve by interpreting the elicitation as an artificial data set and then analyzing it with standard methods. The number of elicited time points is not limited and is left entirely to the clinicians; the more information is elicited, the closer the curve will be to what the expert believes. The procedure is simple and straightforward, yet expert-based survival estimates are inherently subjective and might be limited in scope, so care should be taken to select the most appropriate sources of knowledge where possible.
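As a rough illustration of turning elicited beliefs into a representative curve, the sketch below fits a Weibull curve to a handful of elicited survival probabilities by least squares on the log cumulative hazard scale. This is a deliberately simplified stand-in and not the artificial-data-set procedure described above; the Weibull form, the least-squares criterion, and the elicited values are all assumptions chosen for illustration.

```python
import math


def fit_weibull_to_elicited(points):
    """Least-squares Weibull fit to elicited (time, survival) pairs.

    Uses the linearisation log(-log S(t)) = shape*log(t) - shape*log(scale).
    Needs at least 2 points with distinct times and 0 < S < 1.
    """
    xs = [math.log(t) for t, _ in points]
    ys = [math.log(-math.log(s)) for _, s in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    shape = sxy / sxx
    intercept = y_bar - shape * x_bar
    scale = math.exp(-intercept / shape)
    return shape, scale


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical elicited beliefs: (years since randomisation, survival).
elicited = [(10.0, 0.40), (15.0, 0.20), (25.0, 0.05)]
shape, scale = fit_weibull_to_elicited(elicited)

if __name__ == "__main__":
    for t, s in elicited:
        print(t, s, round(weibull_survival(t, shape, scale), 3))
```

Adding more elicited points constrains the fitted curve further, which mirrors the remark that the curve moves closer to the expert's beliefs as more information is collected.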

In the absence of long-term data within the trial itself, scenario-based sensitivity analyses should be performed for the uncertain assumptions underlying the extrapolation. Uncertainty in the underlying evidence may have a large impact on the prediction. It may be worthwhile to test a range of plausible scenarios about the future trend, especially when integrating limited or conflicting elicitation into the extrapolation.³² Implementation is not complicated: the modeler simply changes the values of the parameters of the blended model to reflect defensible circumstances. In the extrapolated period, the modeler can select suitable values (e.g., plausible ranges) of survival at multiple time points and flexibly determine the number of elicited points and their locations. In addition, the blending operation, including its interval and rate, can be tuned through the weight function if knowledge is available about biologically plausible shapes for the extrapolated hazard. A web-based application is being developed to aid the elicitation process, in which immediate outputs (i.e., survival and hazard plots) would help the experts to obtain reliable estimates.

Because long-term data are lacking, extrapolation will always be problematic and will largely rest on subjective modeling assumptions. Crucially, we believe that the blending method allows elements of subjective assumption to be shifted away from the extrapolation derived from the observed data, although the blending operation cannot by itself avoid the issue of subjectivity. It is our view that untestable assumptions are all but unavoidable in the range of survival models that are relevant in HTA. The blending procedure attempts to recognize and embrace this feature by providing a simple and powerful framework for incorporating such assumptions and evaluating their impact on the model fit as well as on the long-term economic outputs.

The method places no restriction on the technical implementation used to model the observed data. The piecewise Cox model is recommended because it can fit the data extremely well without substantial computation. Under a Bayesian framework, it allows a high level of flexibility and does not bring extra complexity compared with spline or fractional polynomial models. Furthermore, the PH assumption is not necessary, as a stratified version of the Cox model exists in which covariates that violate the assumption are handled by stratification, effectively creating a separate baseline hazard for each stratum.⁴⁴

The current application focuses on the absolute effectiveness of treatments in a single trial; however, decision making often requires combining different trials that compare multiple treatments, with relative effectiveness being of interest. In fact, it is not difficult to embed the blended approach in a network meta-analysis with a hierarchical structure that synthesizes all direct and indirect evidence across trials.⁴⁵ Estimating the observed and external hazards separately achieves flexibility in a simple way, and obtaining the blended result is not complex once a weight function has been identified by the experts. It is possible to apply a common weight function, with the same parameter values, to all treatment arms or alternatively to use different choices for each treatment if justified external information is available. For the trial data, the Cox model would be beneficial, as it provides the same structure as other studies in the network that are based on the PH assumption. Moreover, the interpretation of the parameters in the Cox model is explicit. As the piecewise exponential model does not add much computation under the Bayesian framework, the blending approach is less computationally intensive, and its computing time is therefore probably lower than that of alternative flexible models.

Conclusion

Long-term extrapolation driven entirely by immature trial data is highly unreliable, and varying assumptions about the treatment effect can have a great impact on the survival estimate. To improve the credibility of the prediction, the blended survival curve method allows the extrapolation to take advantage of external knowledge that manufacturers might have in the form of hard data or beliefs elicited from clinical experts. The formal inclusion of external evidence accommodates a variety of available sources, especially subjective opinion, which is more commonly available in practice. Therefore, not only internal but also external validity can be fully taken into account for the survival model. By allowing a range of plausible scenarios to be considered easily, the blended approach provides a simple and robust framework that ensures sufficient flexibility for the long-term survival estimate.

Footnotes

Acknowledgements

Dr Nathan Green is partially funded by a research grant sponsored by ICON. Additionally, the authors would like to thank Prof Håvard Rue and Victoria Federico Paly for helpful discussion on the methodology and two anonymous reviewers for their helpful comments.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Financial support was provided by a joint doctoral training grant from the University College London and China Scholarship Council (No. 201908060054). Zhaojing Che was funded by the University College London and China Scholarship Council (UCL-CSC) joint research grant. No other conflicts of interest were disclosed.

Author’s Note

Parts of this work have been presented at the conference International Society for Pharmacoeconomics and Outcomes Research, Europe 2021 (virtual), and the abstract has been published in Value in Health, January 2022.

ORCID iD

Zhaojing Che

Availability of Data and Materials

Data and the annotated R code of the blended survival method for the motivating case study are available at the GitHub repository: .

References

1. Tai, Latimer, Benedict, Kiss, Nikolaou. Prevalence of immature survival data for anti-cancer drugs presented to the National Institute for Health and Care Excellence and impact on decision making. Value Health. 2021;24(4):505–12.

2. Bell Gorrod, Kearns, Stevens, et al. A review of survival analysis methods used in NICE technology appraisals of cancer treatments: consistency, limitations, and areas for improvement. Med Decis Making. 2019;39(8):899–909.

3. Ball, Levine, Thabane, Tarride JE. Onwards and upwards: a systematic survey of economic evaluation methods in oncology. Pharmacoecon Open. 2021;5(3):397–410.

4. Kearns, Stevens, Ren, Brennan. How uncertain is the survival extrapolation? A study of the impact of different parametric survival models on extrapolated uncertainty about hazard functions, lifetime mean survival and cost effectiveness. Pharmacoeconomics. 2020;38(2):193–204.

5. Benedict, Muszbek, Perampaladas. Survival modelling in UK oncology technology appraisals since the publication of good practice guidelines. In: The Evidence Forum. Bethesda (MD): Evidera; 2018.

6. Gallacher, Auguste, Connock. How do pharmaceutical companies model survival of cancer patients? A review of NICE single technology appraisals in 2017. Int J Technol Assess Health Care. 2019;35(2):160–7.

7. Everest, Blommaert, Chu, Chan, Parmar. Parametric survival extrapolation of early survival data in economic analyses: a comparison of projected versus observed updated survival. Value Health. 2021;25(4):622–9.

8. Bullement, Latimer, Gorrod. Survival extrapolation in cancer immunotherapy: a validation-based case study. Value Health. 2019;22(3):276–83.

9. Jackson, Sharples, Thompson SG. Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. Int J Biostat. 2010;6(1):Article 34.

10. Kearns, Stevenson, Triantafyllopoulos, Manca. Comparing current and emerging practice models for the extrapolation of survival data: a simulation study and case-study. BMC Med Res Methodol. 2021;21(1):1–11.

11. Bullement, Kearns. Incorporating external trial data to improve survival extrapolations: a pilot study of the COU-AA-301 trial. Health Serv Outcomes Res Methodol. 2022;22:317–31.

12. Vickers. An evaluation of survival curve extrapolation techniques using long-term observational cancer data. Med Decis Making. 2019;39(8):926–38.

13. Rutherford, Lambert, Sweeting, et al. Flexible Methods for Survival Analysis. NICE DSU technical support document 21. London: NICE; 2020.

14. Gibson, Koblbauer, Begum, et al. Modelling the survival outcomes of immuno-oncology drugs in economic evaluations: a systematic approach to data analysis and extrapolation. Pharmacoeconomics. 2017;35(12):1257–70.

15. Jackson, Stevens, Ren, et al. Extrapolating survival from randomized trials using external data: a review of methods. Med Decis Making. 2017;37:377–90.

16. Soikkeli, Hashim, Ouwens, Postma, Heeg. Extrapolating survival data using historical trial–based a priori distributions. Value Health. 2019;22(9):1012–7.

17. National Institute for Health and Clinical Excellence. Abiraterone for Castration-Resistant Metastatic Prostate Cancer Previously Treated with a Docetaxel-Containing Regimen. London: National Institute for Health and Clinical Excellence; 2012.

18. Larkin, Hatswell, Nathan, Lebmeier, Lee. The predicted impact of ipilimumab usage on survival in previously treated advanced or metastatic melanoma in the UK. PLoS One. 2015;10(12):e0145524.

19. National Institute for Health and Clinical Excellence. Rituximab for the First-Line Treatment of Stage III-IV follicular lymphoma. London: National Institute for Health and Clinical Excellence; 2012.

20. National Institute for Health and Clinical Excellence. Bortezomib Monotherapy for Relapsed Multiple Myeloma. London: National Institute for Health and Clinical Excellence; 2007.

21. National Institute of Health and Care Excellence. Pemetrexed for the Treatment of Malignant Pleural Mesothelioma. London: National Institute for Health and Clinical Excellence; 2008.

22. Kearns, Stevenson, Triantafyllopoulos, Manca. The extrapolation performance of survival models for data with a cure fraction: a simulation study. Value Health. 2021;24(11):1634–42.

23. Felizzi, Paracha, Pöhlmann, Ray. Mixture cure models in oncology: a tutorial and practical guidance. Pharmacoecon Open. 2021;5(2):143–55.

24. Guyot, Ades, Beasley, Lueza, Pignon, Welton NJ. Extrapolation of survival curves from cancer trials using external information. Med Decis Making. 2017;37(4):353–66.

25. Benaglia, Jackson, Sharples LD. Survival extrapolation in the presence of cause specific hazards. Stat Med. 2015;34(5):796–811.

26. Cope, Ayers, Zhang, Batt, Jansen JP. Integrating expert opinion with clinical trial data to extrapolate long-term survival: a case study of CAR-T therapy for children and young adults with relapsed or refractory acute lymphoblastic leukemia. BMC Med Res Methodol. 2019;19(1):1–11.

27. Castro-Camilo, Huser, Rue. Practical strategies for GEV-based regression models for extremes. arXiv preprint arXiv:2106.13110. 2021.

28. National Institute for Health and Clinical Excellence. Rituximab for the First-Line Treatment of Chronic Lymphocytic Leukaemia. London: National Institute for Health and Clinical Excellence; 2009.

29. Williams, Lewsey, Mackay, Briggs AH. Estimation of survival probabilities for use in cost-effectiveness analyses: a comparison of a multi-state modeling survival analysis approach with partitioned survival and Markov decision-analytic modeling. Med Decis Making. 2017;37(4):427–39.

30. Williams, Lewsey, Briggs, Mackay DF. Cost-effectiveness analysis in R using a multi-state modeling survival analysis framework: a tutorial. Med Decis Making. 2017;37(4):340–52.

31. Hallek, Fischer, Fingerle-Rowson, et al. Addition of rituximab to fludarabine and cyclophosphamide in patients with chronic lymphocytic leukaemia: a randomised, open-label, phase 3 trial. Lancet. 2010;376(9747):1164–74.

32. Latimer. NICE DSU technical support document 14: survival analysis for economic evaluations alongside clinical trials-extrapolation with patient-level data. Report by the Decision Support Unit; 2011.

33. Roche Products Limited. Rituximab for the 1st Line Treatment of Chronic Lymphocytic Leukaemia. Roche submission to the National Institute for Health and Clinical Excellence; 2008.

34. Demiris, Sharples LD. Bayesian evidence synthesis to extrapolate survival estimates in cost-effectiveness studies. Stat Med. 2006;25(11):1960–75.

35. Lambert, Thompson, Weston, Dickman PW. Estimating and modeling the cure fraction in population-based cancer survival analysis. Biostatistics. 2007;8(3):576–94.

36. Baio. Bayesian Methods in Health Economics. Boca Raton (FL): CRC Press; 2013.

37. Baio, Dawid AP. Probabilistic sensitivity analysis in health economics. Stat Methods Med Res. 2015;24(6):615–34.

38. Gómez-Rubio. Bayesian Inference with INLA. Boca Raton (FL): CRC Press; 2020.

39. Royston, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–97.

40. Royston, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. J R Stat Soc Ser C (Appl Stat). 1994;43(3):429–53.

41. Fischer, Bahlo, Fink, et al. Long-term remissions after FCR chemoimmunotherapy in previously untreated patients with CLL: updated results of the CLL8 trial. Blood. 2016;127(2):208–15.

42. Andersson, Dickman, Eloranta, Lambert PC. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol. 2011;11(1):1–11.

43. Latimer, Adler AI. Extrapolation beyond the end of trials to estimate long term survival and cost effectiveness. BMJ Med. 2022;1(1).

44. Kleinbaum, Klein, eds. The stratified Cox procedure. In: Survival Analysis. New York: Springer; 2012. p 201–40.

45. Caldwell, Ades, Higgins. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331(7521):897–900.

Blended Survival Curves: A New Approach to Extrapolation for Time-to-Event Outcomes from Clinical Trials in Health Technology Assessment
