Abstract
Introduction
Health technology assessment of histology-independent therapies (HITs) requires statistical methods that can capture heterogeneity in outcomes while allowing borrowing of information between tumor sites to inform cost-effectiveness analysis. In this study, we extend previous work on binary outcomes to the application of Bayesian hierarchical models (BHMs) for extrapolation of overall survival from pembrolizumab-treated patients with microsatellite instability-high/deficient mismatch repair solid tumors.
Methods
We considered BHMs based on 1- or 2-parameter distributions for extrapolation of survival outcomes. The scale or rate parameter of each model was assumed exchangeable among tumor types, and the shape parameter was assumed the same for all tumor types in the 2-parameter models. We compared overall survival (OS) and estimated mean survival time for each BHM with the corresponding nonhierarchical model.
Results
Extrapolated OS showed similar results between the BHM and standard models for colorectal, endometrial, and gastric cancers. Small intestine and biliary cancers showed higher OS estimates with a BHM than the standard models due to a combination of smaller sample sizes, information sharing in the BHM, and the use of a common shape parameter. Estimated mean survival times were similar between the BHM and equivalent standard model. However, the BHM showed reduced uncertainty in all cases.
Conclusions
We have demonstrated that BHMs provide a suitable framework to extrapolate time-to-event outcomes for HITs. The results provide extrapolated curves for OS that vary by tumor site, thus capturing and quantifying the inherent heterogeneity within the patient population. BHMs offer advantages in terms of reduced uncertainty around parameters that are often key drivers in cost-effectiveness analyses, such as estimated OS, through the borrowing of information between tumor sites.
Highlights
Bayesian hierarchical models (BHMs) reduced uncertainty in extrapolation of time-to-event outcomes for histology-independent treatments compared with nonhierarchical models fit to each tumor site.
Reduced uncertainty around the mean survival time is a key factor of cost-effectiveness analyses of histology-independent treatments.
BHMs provide a suitable framework for extrapolating histology-independent survival outcomes, effectively integrating prior knowledge and explicitly capturing heterogeneity between different tumor sites.
Keywords
Introduction
Recent advances in next-generation sequencing have led to the clinical development and regulatory approval of several new cancer therapies that treat patient populations defined by the presence of a genetic mutation or clinical biomarker, not the tumor site, type, or histology.1,2 In 2017, the US Food and Drug Administration granted the first approval for a histology-independent therapy (HIT) to pembrolizumab for the treatment of microsatellite instability-high (MSI-H) or deficient mismatch repair (dMMR) solid tumors. 3 Further HIT approvals by the US Food and Drug Administration followed for larotrectinib and entrectinib for the treatment of NTRK fusion-positive solid tumors. More recently, the European Medicines Agency made a similar positive recommendation for pembrolizumab in MSI-H or dMMR (MSI-H/dMMR) solid tumors while restricting the label to 5 specific tumor sites: colorectal, endometrial, gastric, small intestine, and biliary cancers. 4
Clinical trial design has had to adapt to support the clinical development of HITs, most commonly adopting a “basket trial” design in which patients are recruited only if they have a common genetic mutation or biomarker and in which each basket is defined by the underlying tumor location.5–8 The approval of pembrolizumab for MSI-H/dMMR tumors was informed by such a design, KEYNOTE-158 [NCT02628067], and also by a separate study of patients with colorectal cancer (CRC; KEYNOTE-164 [NCT02460198]), both in patient populations with MSI-H/dMMR tumors.9,10
Many of the characteristics of basket trial designs for HITs present challenges when considering health technology assessment (HTA), payer reimbursement, and, specifically, cost-effectiveness analysis. These challenges include a lack of comparator evidence within the trial, an absence of existing treatments applicable across the full histology-independent indication, the scarcity of previous published evidence specific to the targeted biomarker, small sample sizes in individual tumors, and the potential for heterogenous survival outcomes of different tumor types. Several published studies have investigated methods for deriving counterfactual survival outcomes, which include literature-based approaches, surrogacy assumptions, and intrapatient comparisons.11–13
The heterogeneity of clinical outcomes across different histological sites in the same trial is a key consideration in the analysis of data collected from basket trials. Previously, multicohort studies have considered heterogeneity to varying degrees. Hyman et al. 14 evaluated vemurafenib in multiple nonmelanoma cancers presenting with BRAF V600 mutations using a basket trial design that included 6 prespecified histologically defined cohorts as well as a seventh cohort permitting inclusion of any other tumor types. Results were reported separately, by cohort, as the authors suggest the basket trial design supports the identification of potential efficacy in multiple tumor sites while allowing for “the possibility that tumor lineage might influence drug sensitivity.” 14 Alternatively, clinical studies and corresponding HTA submissions of larotrectinib and entrectinib presented pooled results for all included histological sites.15–18 Murphy et al.19,20 suggested that both approaches have limitations.
Analyzing each basket separately fails to consider that patients with tumors at different sites may have similar outcomes when the intervention’s mechanism of action targets the same biomarker and the biomarker is prognostic. Ignoring potential commonalities between baskets amounts to a waste of valuable information, especially where several tumor types are often represented by small sample sizes due to the rarity of the selected biomarker. Complete pooling of results does the opposite, assuming that results are homogeneous. In the worst case, using the pooled treatment effect across all tumor sites may suggest a treatment is effective for some tumor types when it is not. More generally, complete pooling biases results for histological sites that are genuinely different from the pooled mean effect.
Reflecting these current limitations, HTA agencies are moving to include formal recommendations in their guidelines on methodological approaches to analyzing HIT data. In a recent update to their methods guidance, the National Institute for Health and Care Excellence explicitly refers to the use of basket trials for tumor-agnostic treatments. The National Institute for Health and Care Excellence advises that when basket trials are used, the analysis should include assessment of heterogeneity and allow borrowing of information between baskets.21,22
Bayesian hierarchical models (BHMs) have been suggested as a suitable statistical method that allows assessment of heterogeneity and borrowing information across baskets assuming a priori that effects are exchangeable and correlated while allowing outcomes for each tumor type to be similar but not identical.7,19 In this way, the specified model allows data from each basket to inform the modeled outcomes of other baskets, effectively shrinking the observed effect per tumor sites toward the pooled mean outcome. 19
Murphy et al. 20 presented a study using a BHM to capture heterogeneity in response rate for patients treated with larotrectinib. The results indicated considerable heterogeneity in response rate by histological site. They used a method that allows borrowing of information on the probabilities of response across baskets. This method increases the precision of estimates when compared with analyzing individual baskets separately while also reducing the chances of obtaining implausible estimates for tumor sites represented by few patients.7,8,17
To date, BHMs have not been applied to time-to-event outcomes for HITs in oncology. Extrapolation of survival outcomes is a key component for estimating the cost-effectiveness of these new treatments. In this study, we extend previous work by Murphy et al. 20 to the application of BHMs to model overall survival outcomes from pembrolizumab-treated patients in KEYNOTE-158 and KEYNOTE-164. The objective is to fit parametric survival distributions within a BHM framework to analyze, and extrapolate, time-to-event outcomes. We also present the results from standard, tumor-specific, nonhierarchical models that are widely used in HTA, 22 as well as 2 additional nonhierarchical frequentist model structures that were also explored. Lastly, we present the mean survival time from the BHMs and the nonhierarchical models as this metric forms the basis of cost-effectiveness analyses.
Methods
Analysis Datasets
The patient population with MSI-H/dMMR solid tumors included in the analysis reflects the approved European Medicines Agency label for this indication and the respective patient populations included in KEYNOTE-158 and KEYNOTE-164. 9 The approved indication for pembrolizumab in MSI-H/dMMR tumors includes a subset of the tumor types included in KEYNOTE-158 defined as “any other advanced solid tumor (with the exception of CRC) that are MSI-H/dMMR.” Therefore, only data from those tumor types were included in the analysis. Specifically, biliary (n = 22), endometrial (n = 83), gastric (n = 51), and small intestine cancers (n = 27) were included.
Data for patients with CRC came from the separate KEYNOTE-164 study. Patients in KEYNOTE-164 were divided into 2 cohorts based on the prior therapy received. Cohort A included those who had received fluoropyrimidine, oxaliplatin, and irinotecan with or without anti–vascular endothelial growth factor or epidermal growth factor receptor monoclonal antibodies (cohort A, n = 61). Cohort B included those who had received 1 or more prior lines of systemic therapy (cohort B, n = 63). 10 The analysis included data from both cohorts (n = 124) consistent with data used for HTA submissions.
Survival Models
BHMs assume that outcomes are similar, but not identical, between tumor types, and the tumor types do not determine a particular ordering of effectiveness a priori (i.e., technically, the baskets are “exchangeable”). 19 BHMs capture the heterogeneity between tumor sites and allow information to be borrowed between groups or “baskets” using shared parameters, which are modeled using a probability distribution. The hierarchical structure allows “partial pooling,” meaning that the underlying distributions used to model the sampling variability are characterized by both tumor-specific and overall parameters. These are often termed “random effects” and “fixed effects,” respectively, although perhaps “structured” and “unstructured” effects is a better terminology from a Bayesian perspective.
We considered BHMs based on a set of 1- or 2-parameter distributions widely used for extrapolation of survival outcomes in cost-effectiveness analysis, namely, exponential, Weibull, Gompertz, log-normal, and log-logistic. The considered models are characterized by a sampling distribution
The final model included as covariates age (centered around the mean), gender, Eastern Cooperative Oncology Group score, cancer stage, and the number of prior lines of therapy. These covariates thought to be the most important prognostic factors were selected based on expert clinical opinion. The parametrization of the covariates, and the full parametric form of each distribution, are provided in the supplementary information. We chose weakly informative priors for the model parameters. For the shape parameter
Bayesian Hierarchical Model Prior Distributions a
In this notation,
The priors were chosen to have standard parametric form while not being too restrictive.23,24 For
Bayesian inference was performed using Stan, a probabilistic programming language that uses Hamiltonian Monte Carlo to obtain samples from the true posterior distribution over basket-specific parameters. We accessed Stan from R via the RStan package. 26 We used 3 chains with 10,000 iterations, 3,000 of which were used for the warmup (and therefore excluded from the final posterior estimate).
Using the same set of distributions as listed above, standard nonhierarchical survival models were also fitted separately for each tumor site, with a frequentist maximum likelihood method using the R flexsurv package.
The choice to employ nonhierarchical frequentist models in this study, alongside BHMs, was driven by a deliberate methodological consideration. While BHMs offer the advantage of pooling information across different tumor types, nonhierarchical frequentist models represent one of the most straightforward and commonly employed approaches by HTA agencies. In addition to standard nonhierarchical frequentist survival models fitted separately for each tumor site, we considered 2 additional nonhierarchical frequentist model structures that differ in whether they share 0 or 1 parameter(s) between histological sites. The first used a different scale and a common shape term for each site, and the second used a different scale and a different shape term for each site, with the former being a fixed-effects analogue to the BHM.
For each model, the goodness of fit to the observed data was assessed 1) visually, by overlaying the parametric curve on the observed Kaplan–Meier curve; 2) statistically, via model comparison statistics (deviance information criterion for Bayesian models, Akaike information criterion and Bayesian information criterion for frequentist models); and 3) validation with clinical expert opinion. 27 To illustrate how the standard parametric and Bayesian approaches would influence uncertainty in a cost-effectiveness analysis, the mean survival time is presented using the mean and 95% intervals from 5,000 posterior samples over a 40-y time horizon.
Results
The observed overall survival data for pembrolizumab treated MSI-H/dMMR solid tumors show several interesting features (Figure 1). First, endometrial, gastric, small intestine, and biliary cancers all show a change in the event rate after approximately 1 y. The event rate is higher in the first year but then lower in years 2 to 6, as shown by the flattening of the Kaplan–Meier curve and evidence of a plateau for endometrial, gastric, and small intestine tumors.

Overall survival for pembrolizumab-treated MSI-H/dMMR solid tumors by tumor type.
The BHMs with a common shape and different scale terms were compared with additional nonhierarchical standard parametric models. Survival extrapolations for these additional nonhierarchical frequentist model structures are shown in the supplementary information (Supplementary Figure 7 and Figure 8). Of the nonhierarchical models, standard parametric models with a common shape and different scale terms for each site had the lowest Akaike information criterion and Bayesian information criterion (Supplementary Table 5), consistent with our approach to the BHMs. Therefore, we proceeded to evaluate BHMs with a common shape and different scale terms along with nonhierarchical standard parametric models fitted to each individual tumor type, as the latter approach is standard within HTA submissions.
The log-normal model, with the lowest deviance information criterion among the different BHMs (Supplementary Table 3) had similar survival extrapolations as the corresponding standard parametric models for CRC, endometrial, and gastric cancer (Figure 2). There were differences in the log-normal extrapolation for small intestine and biliary cancers, with higher estimates of overall survival observed with the BHM in both cases. Sensitivity analyses for the BHM varying the standard deviation of hyperparameter

Long-term extrapolation of overall survival for the log-normal model.
The differences in the shape of the extrapolated log-normal survival curves may be influenced by several factors including the assumption of a common shape parameter between tumor types in the BHM. The shape parameter for the log-normal distribution, the natural log of standard deviation, estimated from the BHM is very close to the independent estimates for CRC, endometrial, and gastric cancers, but the uncertainty in the parameter estimate is reduced in the BHM. The independent estimates for small intestine and biliary cancers are notably smaller and more uncertain than in the BHM estimate (Figure 3).

Comparison of the common shape parameter (Ln[SD]) of the log-normal model in the BHM with independent estimates by tumor type.
Survival extrapolations for alternative distributions and statistical fits for the standard parametric models are shown in Supplementary Table 4, Figure 5, and Figure 6. The main goal of modeling survival outcomes is to estimate the mean survival time within a cost-effectiveness framework as part of the HTA submission process. The point estimates of mean survival time are similar between the BHM and the standard model for any given parametric survival distribution. However, the estimated uncertainty from the BHM is typically smaller (narrower 95% credible interval [CrI]) than the estimate from the corresponding standard model (wider 95% confidence interval [CI]), owing to the sharing of information between tumor types (Figure 4).

Estimated mean survival time with 95% CrI/CI using Bayesian hierarchical models and standard parametric models.
The effect of information sharing on uncertainty is generally smaller for tumors with larger sample sizes and greater for those with smaller sample sizes. For example, in CRC, which has the largest sample size (n = 124), the difference in uncertainty estimates between the BHM and standard models is small. In tumors with smaller sample sizes, such as endometrial (n = 83), gastric (n = 51), and small intestine (n = 27), the difference is more pronounced. Interestingly, in biliary cancer, which had the smallest sample size (n = 22), the uncertainty estimates for each model type are similar, but the BHM gives a higher estimate of the mean survival time for all distributions except Gompertz.
The standard parametric Gompertz model consistently showed a much larger estimate of mean survival time, with a wide 95% CI compared with the Bayesian hierarchical Gompertz model. This difference can also be seen in the extrapolation of overall survival (Supplementary Information Figure 6). For all other distributions, the comparison of BHM and standard parametric models showed the same patterns described above with similar point estimates between models but reduced uncertainty in the BHM.
Discussion
In this study, we compared parametric survival models implemented within a BHM framework with nonhierarchical models fitted independently to each tumor site. Within the BHM framework, we implemented 1- and 2-parameter hierarchical survival models using a common shape parameter for all tumor sites and a shared rate parameter that was influenced by both clinical covariates and tumor-specific parameters to capture heterogeneity in observed outcomes between tumor sites. The main difference between the BHM and standard parametric models was in the estimation of long-term uncertainty including mean survival time. The point estimates of mean survival time were similar for both the BHM and the standard model for a given distribution. However, the BHM showed reduced uncertainty, as illustrated by the narrower 95% CrI, compared with the 95% CI from the corresponding standard model.
The log-normal BHM showed similar point estimates of the extrapolated survival probability to the corresponding standard parametric model for CRC, endometrial, and gastric cancer. For small intestine and biliary cancers, the BHM gave higher estimates of the survival probability than the standard model did with a less convincing visual fit to the Kaplan–Meier curve. This may be partly explained by the difference in sample size between tumor sites and the sharing of information between sites through the hierarchical structure of the BHM. Small intestine and biliary cancers have the smallest sample sizes and therefore may be more influenced by the borrowing of information and the assumption of a common shape parameter between tumor types used in the BHM. That is, the shape of the curve for small intestine and biliary cancers within the BHM framework is affected by the data for CRC, endometrial, and gastric cancers. For these tumor sites with small sample sizes relative to other baskets in the same trial, a better fit to the observed data might be obtained by fitting independent models at the expense of increased uncertainty; however, the goal is to obtain the best estimate of the survival function and may also involve leveraging information from other tumors. This can be understood as a Bayesian analogue of the optimal frequentist bias-variance tradeoff.
Further research is required that investigates the specification of partially exchangeable or nonexchangeable covariate structures that may mitigate the sensitivity of the survival extrapolations to the borrowing of information and sample size between tumor sites. In all cases, model specification should be based on the clinical plausibility of the inherent assumptions of HITs, in addition to statistical theory.
Given the uncertainty of outcomes for tumor sites represented by small patient numbers, the borrowing of information can be considered a strength in that the BHM makes it less likely to produce extreme and implausible extrapolations for baskets informed by small patient numbers. However, it can also be a limitation when the number of baskets is small and there are large differences in sample size informing each basket. In this study, a weakly informative prior distribution with large variance was used for the tumor-specific parameters to reflect the assumption that survival outcomes were exchangeable between tumor sites. Alternative BHMs may be specified where information is shared between all tumor sites, between some tumor sites but not others, or with no other tumor sites at all, which is an area for potential future work.19,28,29
It should be noted that the exchangeability assumption of an HIT BHM is not arbitrary. The fundamental concept of an HIT is that there is some similarity in outcomes for patients who share a common genetic marker such as MSI-H/dMMR. If MSI-H/dMMR status were perfectly prognostic of outcome, then complete pooling of data from all tumor sites would be a reasonable assumption. Alternatively, if outcomes were completely independent of MSI-H/dMMR status, then analyzing each tumor site independently would be an appropriate approach. Both extremes seem unlikely in practice, and assuming some degree of exchangeability between tumor sites appears reasonable. Exchangeability assumptions generally require careful assessment of both statistical considerations and clinical factors and must be addressed on a case-by-case basis. 30
In this study, we chose a single parametric function to represent all tumor sites. In circumstances in which the shape of the hazard function is expected to be substantially different between histological sites, it may be challenging to specify any single model that appropriately combines data from multiple tumor sites. For example, endometrial, gastric, and small intestine cancers all show evidence of a plateau in the Kaplan–Meier curve after 1 to 2 years of follow-up. It may be that standard parametric distributions are not flexible enough to model the observed hazard function across multiple tumor sites in cases in which patients treated with pembrolizumab show sustained long-term survival.
Standard parametric models that pool the data with a random effect shape term and different scales were not considered beyond exploring statistical fit. While this approach has similar advantages to the Bayesian approach, allowing borrowing of information across groups, the reliability of this approach can still be compromised by very small sample sizes: our smallest sample size includes only 22 participants.
We considered only 1- or 2-parameter models in this study. The 3-parameter generalized gamma model is also widely used in survival extrapolation and may offer some additional flexibility, with additional exchangeability assumptions on the third parameter. In principle, mixture cure models or flexible survival models, such as splines, could be used and may theoretically improve the fit to observed data. 31 However, such models would also require additional assumptions about the exchangeability of the extra parameters. For cure models, it seems unlikely that different tumor sites would all have the same cure fraction; therefore, some form of exchangeability or even independence of the cure fractions would have to be specified. Whether such models could be reliably estimated will depend on the data available. The application of alternative survival models within a BHM framework is an area for further research.
Our target of estimation, or estimand, is the survival of a typical participant. For technical reasons (related to the noncollapsibility of the hazard function), this is different from the average survival over all participants, also known as the marginal survival. Arguably, the marginal estimand is preferable for HTA, although the debate is nuanced and ongoing. 32 While there may be ways to obtain marginal estimates from conditional estimates, for example, G-computation–style marginalization, we encourage this as an avenue for future work.
Excess hazards from general population mortality were not included within the overall survival analysis as the motivation for these analyses was to generate BHMs for HIT. Constraints on the hazards from overall survival within the trials are generally added into the cost-effectiveness analysis itself so that survival could not exceed that of the age- and sex-matched general population.
Conclusions
HITs are a significant innovation in the treatment of cancer and present substantial methodological challenges when considering HTA and payer reimbursement. We have demonstrated that BHMs provide a suitable statistical framework to extrapolate time-to-event outcomes for HITs capturing heterogeneity between tumor sites.17,19 BHMs offer potential advantages in terms of reduced uncertainty around predicted mean survival time—a key component of cost-effectiveness analyses—through the borrowing of information between tumor sites in basket trials. Furthermore, while a defined list of histological sites is included for the population considered in this analysis, for indications that are truly “tumor agnostic” and do not restrict by the histological site at all, the BHM allows for the prediction of outcomes for as yet unrepresented tumor sites, which can be of real value. 20
Supplemental Material
sj-docx-1-mdm-10.1177_0272989X261434969 – Supplemental material for Extrapolation of Time-to-Event Survival Outcomes of Histology-Independent Therapies Using a Bayesian Hierarchical Model
Supplemental material, sj-docx-1-mdm-10.1177_0272989X261434969 for Extrapolation of Time-to-Event Survival Outcomes of Histology-Independent Therapies Using a Bayesian Hierarchical Model by Jan Mikelson, Richard Birnie, Grant McCarthy, Matthew Madin-Warburton, Ruifeng Xu, Justin Chumbley, Raquel Aguiar-Ibáñez, Mayur Amonkar, Gianluca Baio and Kate Young in Medical Decision Making
Footnotes
Acknowledgements
The authors would like to give special thanks to Jake Horgan and Greg Toreev (Lumanity, London, UK) for their editorial assistance and Matt Hemstock (Lumanity, London, UK) for his valuable support in generating the figures for publication. Special thanks also to Jean Muller (MSD, Zurich, Switzerland) for his adaptability, attention to detail, and R statistical programming expertise.
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: RA-I, KY, RX, JC, MA, and GM are employees of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA, and own stock and/or hold stock options in Merck & Co., Inc., Rahway, NJ, USA. GB has no conflicts of interest to declare in relation to this work. JM has no conflicts of interest to declare in relation to this work. RB and MM-W are employees of Lumanity, which received consulting fees for this study from Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA (MSD). The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided entirely by a contract with Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA (MSD). The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. The following author(s) are employed by the sponsor: GM, RX, JC, MA, KY, RAI. JM was employed by the sponsor at the time the study was completed.
Author Contributions
JM, GB, RA-I, RB, and KY contributed substantially to the concept and design of the BHMs, development of the analysis code, data analysis and interpretation, drafting of the manuscript, and critical revision of the manuscript for important intellectual content. RX and JC contributed substantially to the statistical analysis and to the critical revision of the manuscript for important intellectual content. JM, GM, MM-W, and RB contributed to the drafting of the manuscript, interpretation, and critical revision of the manuscript for important intellectual content. MA, RA-I, and KY contributed to the critical revision of the manuscript for important intellectual content.
Ethical Considerations
Not applicable.
Consent to Participate
Not applicable.
Patient Consent
Not applicable.
Consent for Publication
Not applicable.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
