Sage Journals: Discover world-class research

Abstract

Background

The purpose of external validation of a risk prediction model is to evaluate its performance before recommending it for use in a new population. Sample size calculations for such validation studies are currently based on classical inferential statistics around metrics of discrimination, calibration, and net benefit (NB). For NB as a measure of clinical utility, the relevance of inferential statistics is doubtful. Value-of-information methodology enables quantifying the value of collecting validation data in terms of expected gain in clinical utility.

Methods

We define the validation expected value of sample information (EVSI) as the expected gain in NB by procuring a validation sample of a given size. We propose 3 algorithms for EVSI computation and compare their face validity and computation time in simulation studies. In a case study, we use the non-US subset of a clinical trial to create a risk prediction model for short-term mortality after myocardial infarction and calculate validation EVSI at a range of sample sizes for the US population.

Results

Computation methods generated similar EVSI values in simulation studies, although they differed in numerical accuracy and computation times. At 2% risk threshold, procuring 1,000 observations for external validation, had an EVSI of 0.00101 in true-positive units or 0.04938 in false-positive units. Scaled by heart attack incidence in the United States, the population EVSI was 806 in true positives gained, or 39,500 in false positives averted, annually. Validation studies with >4,000 observations had diminishing returns, as the EVSIs were approaching their maximum possible value.

Conclusion

Value-of-information methodology quantifies the return on investment from conducting an external validation study and can provide a value-based perspective when designing such studies.

Highlights

In external validation studies of risk prediction models, the finite size of the validation sample leads to uncertain conclusions about the performance of the model. This uncertainty has hitherto been approached from a classical inferential perspective (e.g., confidence interval around the c-statistic).

Correspondingly, sample size calculations for validation studies have been based on classical inferential statistics. For measures of clinical utility such as net benefit, the relevance of this approach is doubtful.

This article defines the expected value of sample information (EVSI) for model validation and suggests algorithms for its computation. Validation EVSI quantifies the return on investment from conducting a validation study.

Value-based approaches rooted in decision theory can complement contemporary study design and sample size calculation methods in predictive analytics.

Keywords

risk prediction value of information uncertainty Bayesian statistics

Risk prediction models that quantify the probability of clinical events are a key aspect of individualized medicine. Once developed, these models are often used in diverse populations. Before being trusted for clinical use in a new population, a prediction model needs to undergo validation in a representative sample from that population.¹ For example, the original Framingham risk scores for cardiovascular disease risk are developed using data from the Framingham County in the United States but are externally validated for use in many different populations.²

During a validation study, the performance of the prediction model is assessed in terms of average prediction error, discrimination (e.g., c-statistic), calibration (e.g., calibration intercept and slope), and net benefit (NB).³ Among such metrics, NB is a decision-theoretic one, as it enables direct assessment of the clinical utility of a model (i.e., if a model’s expected NB (ENB) is higher than that of alternatives, it is expected to confer clinical utility). Due to such decision-theoretic underpinnings, and despite its relatively new arrival in the field of predictive analytics, NB has gained significant momentum and has become a standard component of modern external validation studies.⁴

When interpreting the results of a validation study, the finite size of the validation sample means that the assessment of model performance is accompanied by uncertainty. Uncertainty in conventional metrics of model performance is communicated using classical inferential methods (e.g., 95% confidence interval [CI] around the c-statistic or calibration slope). Similarly, sample size considerations when planning a validation study are based on inferential statistics.^5,6 However, the relevance of this approach to uncertainty around NB as a decision-theoretic metric is doubtful, as arbitrary significance levels or prespecified confidence bands do not necessarily translate to better decisions.^7–9

Decision theory provides an alternative view to the consequences of uncertainty by relating it to the outcome of decisions. A decision maker should adopt the strategy with the highest expected utility, which, if implementation costs are comparable, is the one with the highest ENB. While uncertainty around NB should not affect the adoption decision, it is associated with utility loss as it hinders our ability to identify the decision with the highest true NB. The extent of uncertainty should thus inform the research decision: whether the expected utility loss is large enough to necessitate collecting further evidence. This approach toward uncertainty quantification is referred to as value-of-information (VoI) analysis.^10–12

VoI methods are widely accepted in health policy making.^11,13 However, their relevance to clinical decision making and risk prediction has only recently been highlighted.¹⁴ In particular, the expected value of perfect information (EVPI), the expected gain in NB by completely eliminating uncertainty, has been applied to both the development and validation phases of risk prediction models.^14,15 However, to the best of our knowledge, the expected value of sample information (EVSI), the expected gain in NB by conducting a study of a given sample size, has not been defined and applied to clinical prediction models.

The aim of the present work is to define EVSI for the external validation studies of risk prediction models and propose algorithms for its computation. The rest of this article is structured as follows. After outlining the context, we define EVSI for a future validation study aimed at evaluating the clinical utility of a model in a new population. We propose different algorithms for EVSI computations, elaborate on their use case, and compare their performance in simulation studies. A case study puts the developments in context. We conclude by proposing areas of further inquiry.

Methods

Context

We focus on validating a previously developed risk prediction model in a new target population. The developments are presented for prediction models for binary responses but are generally applicable to any context where NB can be assessed (e.g., survival outcomes,¹⁶ models for treatment benefit¹⁷). A risk prediction model is advertised as a function that maps patient characteristics to an estimate of event risk. We have some current information about the performance of this model in the target population (for example, from expert opinion or from a pilot study) but are uncertain if using the model in this population is net beneficial. We are planning to obtain a random sample $D$ containing $N$ observations from this population to reduce such uncertainty. This sample is expected to improve our knowledge about model performance, which in turn will increase our chance of making the correct decision whether to use the model in this population. As such, procuring such a sample might be associated with NB gain, and we are interested in evaluating the expected gain in NB as a function of $N$ . We acknowledge the relevance of other sources of uncertainty (e.g., missing data, the plausibility that the sample is truly representative of the target population) but focus on sampling uncertainty—that is, our uncertainty in true NBs because of the finite size of the validation sample.

NB Calculations for Risk Prediction Models

Details of NB calculations for risk prediction models are explained in its original article and in many tutorials.^4,18,19 In brief, to turn a continuous predicted risk to a binary classification to inform a treatment decision, a decision maker needs to specify a risk threshold $z$ on predicted risks such that individuals classified as high risk are selected for the treatment (e.g., a 7.5%–10% threshold on the Framingham Risk Score is often used to initiate treatment with statins to reduce cardiovascular disease risk²⁰). Compared with treating no one (with a default utility of 0), the utility of this classification can be expressed as $p_{tp} - ω p_{fp}$ , where $p_{tp}$ and $p_{fp}$ are the probabilities of a true positive and a false positive classification, respectively, at this threshold, and $ω \geq 0$ represents the utility tradeoff (exchange rate) between a true-positive and a false-positive classification. Vickers and Elkin showed that the value of $ω$ can be deduced from the risk threshold itself¹⁸: if a decision maker is willing to treat those with risk $> z$ and not treat those with risk $< z$ , it means that individuals with a risk precisely equal to $z$ will be ambivalent about the treatment decision. This ambivalence indicates that in a population whose average outcome risk is $z$ , the expected utility of treating everyone and treating no one is the same to the decision maker. Because in this population, treating everyone corresponds to $p_{tp} = z$ and $p_{fp} = 1 - z$ , we have $ω = z / (1 - z)$ . For example, a 10% threshold on cardiovascular risk for initiating statin therapy means the benefit of treating 1 individual who will experience a cardiovascular event is considered equal to the harm of treating 9 individuals who will not experience such an event without treatment. This exchange rate enables calculating NB in either net true-positive or net false-positive units for a given threshold. In practice, the NB is calculated across a plausible range of thresholds.

It is often more intuitive to express $p_{tp}$ and $p_{fp}$ in terms of outcome prevalence and the model’s true sensitivity and specificity in the target population.²¹ Let $θ$ be the triplet of outcome prevalence ( $θ_{p}$ , a quantity that is independent of model performance), sensitivity ( $θ_{se}$ ), and specificity ( $θ_{sp}$ ). We have $p_{tp} = θ_{p} θ_{se}$ and $p_{fp} = (1 - θ_{p}) (1 - θ_{sp})$ (for brevity, we drop the notation that indicates sensitivity and specificity are functions of $z$ ).

Any given model should be compared with at least 2 “default” strategies of treating no one and treating all. Treating no one has NB = 0 by definition, and NB calculations for treating all follow the same logic as above but by considering all individuals as positive. Thus, we have 3 competing strategies: do not treat anyone, use the model to treat those with predicted risk $\geq z$ , and treat all. We index them by 0, 1, and 2, respectively. This results in the following equation for NBs:

\begin{matrix} NB (i, θ) = \\ {\begin{matrix} 0 & i = 0 & (treat no one) \\ θ_{p} θ_{se} - (1 - θ_{p}) (1 - θ_{sp}) \frac{z}{1 - z} & i = 1 & (use model to decide) \\ θ_{p} - (1 - θ_{p}) \frac{z}{1 - z} & i = 2 & (treat all) \end{matrix}, \end{matrix}

(1)

For time-to-event outcomes, $θ$ s and thus NBs are time dependent and should be defined at a time-horizon of interest.¹⁶ If there are alternative strategies, they can also be considered (the above indexing of NB() can accommodate other strategies), but this is omitted here for ease of exposition.

VoI Analysis

VoI analysis is a Bayesian approach and requires explicit specification of current information on the performance of the model in the target population. By default, we assume this information is generated from a preliminary validation exercise based on a random sample $d$ of size $n$ . However, such information can be generated in other ways. For example, one can extrapolate the performance of a model from its development sample or its performance in previous validation studies²² and accommodate any added uncertainty about the compatibility of the populations by quantitatively discounting external information.²³ Formal expert elicitation approaches can also be used to construct distributions around model performance metrics.²⁴ In such instances, $d$ can be an abstract entity representing, for example, the expert-elicited information.

The Expected Value of Current Information

Our current information about $θ$ based on the available evidence is expressed as $P (θ | d)$ . According to Bayes’s theorem, $P (θ | d) \propto P (θ) P (d | θ)$ , which indicates that this information is influenced by any prior information we might have had, as well as by $P (d | θ)$ , that is, the likelihood function, the support the observed data provide for a given value of $θ$ . With such current information, the decision maker should pick the strategy with the highest ENB given the current information:

EN B_{current} = \max_{i \in {0, 1, 2}} E_{θ | d} NB (i, θ),

(2)

where the expectation is with respect to $P (θ | d)$ .

EVPI

Details of the reasoning and algorithms behind calculating EVPI are presented previously.¹⁵ In brief, if we could know the true values of $θ$ , we would pick the strategy with the highest true NB. We indeed do not have access to the ground truth but have partial information in terms of $P (θ | d)$ . As such, we can model the consequence of knowing the truth (having perfect information) and take its expectation with respect to this distribution:

EN B_{truth} = E_{θ | d} [\max_{i \in {0, 1, 2}} NB (i, θ)] .

(3)

The EVPI is then the difference in ENB of having perfect information versus current information:

EVPI = EN B_{truth} - EN B_{current} .

(4)

The EVPI is a scalar quantity and quantifies the expected loss in NB due to uncertainty, or alternatively, the expected gain in NB by completely resolving uncertainty (i.e., conducting a validation study of infinite size). We have proposed a generic algorithm based on bootstrapping to estimate this expectation as well as an asymptotic approach based on the central limit theorem and 2-dimensional unit normal loss integral.¹⁵

EVSI

The reasoning behind the EVSI calculation is similar to that of EVPI, with the modification that instead of knowing the truth, we will have more (but not perfect) information about $θ$ , with additional information coming from obtaining a future sample $D$ . With this added information we will have the opportunity to revise our current decision. We will update our knowledge from $P (θ | d)$ to $P (θ | d, D)$ and will select the strategy that has the highest ENB.

The NB of this approach once future data are obtained is $\max_{i} E_{θ | D, d} NB (i, θ)$ . We do not know what data we will observe in the future, but we can treat $D$ as a random variable and take the expectation of this line of reasoning with regard to its predictive distribution (the distribution for future observations given current data and the model):

EN B_{sample} = E_{D | d} [\max_{i \in {0, 1, 2}} E_{θ | d, D} NB (i, θ)] .

(5)

The EVSI is the difference between this ENB and ENB under current information:

EVSI = EN B_{sample} - EN B_{current} .

(6)

The EVSI is a nonnegative scalar quantity in the same NB units as EVPI. It quantifies the expected gain in NB by procuring a validation sample of a given size. The higher the EVSI, the higher the expected gain from the planned validation study. EVPI, quantifying the ENB gain by completely resolving uncertainty, puts an upper limit on EVSI.

EVSI Computation Algorithms

The challenge for EVSI calculation is the 2 nested expectations in ${ENB}_{sample}$ . The inner expectation represents updated estimates of NBs after obtaining future data, and the outer expectation is with respect to the predictive distribution of future data. Below we detail 3 computation algorithms. Exemplary R implementations of the algorithms are provided in the Supplementary Material Section 1.

A Bootstrap-Based Algorithm

If the current evidence $d$ is in terms of a random sample of size $n$ from the target population (e.g., from a pilot study), EVSI computation can be performed via a 2-level resampling algorithm. This approach has been previously applied to EVSI computation for data-driven economic evaluations²⁵ and can be seen as a direct extension of the bootstrap method for EVPI computation for external validation studies.¹⁵ The logic behind this approach is provided in detail previously.²⁵ In a nutshell, let $D^{*}$ be the target population from which $d$ and $D$ are sampled. We take $D^{*}$ as a random entity and apply Bayes’s rule: $P (D^{*} | d) \propto P (D^{*}) P (d | D^{*})$ . Rubin showed that by using the improper prior $P (D^{*}) ~ Dirichlet (0, 0, \dots, 0)$ placed on all possible observations in $D^{*}$ , the posterior distribution $P (D^{*} | d)$ will be a discrete distribution that puts random weights $Dirichlet (1, 1, \dots, 1)$ on each observation in $d$ . Since any parameter of interest $θ$ is a function of $D^{*}$ , posterior draws of weights from the $Dirichlet (1, 1, \dots, 1)$ distribution produce draws from the posterior distribution of $P (θ | d)$ .²⁶ Further, once the generating population is at hand, the data of the future study ( $D$ ) can be drawn by sampling from this population. The resulting 2-level resampling algorithm is explained in Table 1.

Table 1

Bootstrap-Based Computation of EVSI

- Prerequisite: Obtain

d

, a pilot sample of size

n

consisting of predicted risks and observed responses from the target population.
1. For

j

= 1 to

M

(number of Monte Carlo simulations):
(a) Draw

D^{*}

, a Bayesian bootstrap^† of the data

d

, as a random draw from the posterior distribution of the population that has generated the sample. This is done by assigning random weights^‡

W^{*} ~ Dirichlet (1, 1, . . ., 1)

to each observation in

d

.
(b) Estimate

θ^{*}

, the true values of prevalence, sensitivity, and specificity in this iteration, from

D^{*}

.
(c) Calculate

{NB}_{0}^{*}

{NB}_{1}^{*}

, and

{NB}_{2}^{*}

, by plugging

θ^{*}

into equation 1. These are true NBs in this iteration. Record the maximum of true NBs:

{NB}_{truth}^{*} = \max {{NB}_{0}^{*}, {NB}_{1}^{*}, {NB}_{2}^{*}}

.
(d) Draw

D

, a random realization of the data of the future study, through sampling

N

observations with replacement from

D^{*}

.
(e) Create

D^{+} = d + D

, the pooled data of existing and future samples.
(f) Calculate

θ^{+}

from

D^{+}

, the updated estimates of prevalence, sensitivity, and specificity after obtaining the future sample.
(g) Calculate

{NB}_{0}^{+}

{NB}_{1}^{+}

, and

{NB}_{2}^{+}

, updated estimates of expected NBs after obtaining the future sample, by plugging

θ^{+}

into equation 1. Determine the strategy that has the highest expected future NB:

l = {\arg \max}_{i \in {0, 1, 2}} ({NB}_{i}^{+})

. Record the true NB of this strategy:

{NB}_{sample}^{*} = {NB}_{l}^{*}

.
- Next

j

.
2. Take the average of

N B^{*}

s from step 1c and pick the maximum value. This is

EN B_{current}

^§.
3. Average

{NB}_{truth}^{*}

s from step 1c. This is

EN B_{truth}

. From this subtract

EN B_{current}

. This is the EVPI.
4. Average

{NB}_{sample}^{*}

s from step 1g. This is

EN B_{sample}

. From this subtract

EN B_{current}

. This is the EVSI.

EVPI, expected value of perfect information; EVSI, expected value of sample information; ENB, expected net benefit; NB, net benefit.

†

Ordinary bootstrap can also be used; see text.

‡

One way of generating such wights is by generating $n$ exponential(1) random variables and scaling them to add up to 1. See the exemplary R code in Supplementary Material Section 1.

Because the NB terms are linear on the elements of $θ$ , $EN B_{current}$ can also be directly evaluated in the original sample without bootstrapping, by plugging in the point estimates of $θ$ into equation 1. However, we recommend averaging over bootstrapped values as the positive correlation between this term and the other term in EVPI and EVSI equations improves precision and prevents getting occasional negative value-of-information estimates.

The power of this approach is in the flexibility of the bootstrap method in accommodating different types of outcomes and practical considerations in validation studies. For example, if $d$ has a nontrivial level of missing data, which affects the amount of information it contains, step 1c in Table 1 can include an imputation component, generating the population $D^{*}$ without any missing values (and similarly, step 1d can introduce missingness in the future sample to simulate a real-world validation dataset).

The ordinary bootstrap, which assigns weights from a scaled multinomial (1,1,…,1). distribution to observations in $d$ , is conceptually similar to the Bayesian bootstrap and has been interpreted in a Bayesian way (e.g., in the approximate Bayesian bootstrap employed in missing value imputation²⁷). In data-driven VoI analyses, it generates similar results to the Bayesian bootstrap.²⁵ It can thus be used in lieu of the Bayesian bootstrap in step 1a.

A Fast Algorithm Based on Beta-Binomial Modeling for Binary Outcomes

In their proposal for a parametric Bayesian NB estimation for binary outcomes, Netto Flores Cruz and Korthauer²⁸ note that the likelihood function for $NB$ can be factorized into terms involving prevalence, sensitivity, and specificity. As such, they take advantage of the beta-binomial conjugacy and note that specifying prior information in terms of independent beta distributions on these quantities will result in corresponding posterior distributions that are also independent beta. In this case, given that $P (θ | d)$ is composed of 3 independent beta distributions, by the same token $P (θ | d, D)$ after a realization of $D$ will also be composed of 3 independent beta distributions. This results in a fast Monte Carlo approach for EVSI computation as explained in Table 2.

Table 2

Computation of EVSI Based on Beta-Binomial Modeling

- Prerequisite: Specify current information

P (θ | d)

θ_{p} ~ Beta (α_{p}, β_{p})

θ_{se} ~ Beta (α_{se}, β_{se})

θ_{sp} ~ Beta (α_{sp}, β_{sp})

.
1. For

j

= 1 to

M

(number of Monte Carlo simulations):
(a) Obtain

θ^{*}

, by random drawing from

P (θ | d)

. These are true

θ

s in this iteration.
(b) Calculate

{NB}_{0}^{*}

{NB}_{1}^{*}

, and

{NB}_{2}^{*}

, by plugging

θ^{*}

into equation 1. These are true NBs in this iteration. Record the maximum true NB:

{NB}_{truth}^{*} = \max {{NB}_{0}^{*}, {NB}_{1}^{*}, {NB}_{2}^{*}}

.
(c) Generate

D

, the sample of future study, defined by

{N_{tp}, N_{fn}, N_{tn}, N_{fp}}

given

θ^{*}

obtained in step 1a as:

N_{+} ~ Binomial (N, θ_{p}^{*})

(number of positive cases in the future sample),

N_{tp} ~ Binomial (N_{+}, θ_{se}^{*})

N_{fn} = N_{+} - N_{tp}

N_{tn} ~ Binomial (N - N_{+}, θ_{sp}^{*})

N_{fp} = N - N_{+} - N_{tn}

.
(d) Calculate

θ^{+}

, the updated estimates of prevalence, sensitivity, and specificity after obtaining the future sample:

θ_{p}^{+} = (α_{p} + N_{tp} + N_{fn}) / (α_{p} + β_{p} + N)

θ_{se}^{+} = (α_{se} + N_{tp}) / (α_{se} + β_{se} + N_{tp} + N_{fn})

θ_{sp}^{+} = (α_{sp} + N_{tn}) / (α_{sp} + β_{sp} + N_{tn} + N_{fp})

.
(e) Calculate

{NB}_{0}^{+}

{NB}_{1}^{+}

, and

{NB}_{2}^{+}

, updated estimates of NBs after obtaining the future sample, by plugging

θ^{+}

into equation 1. Determine the strategy that has the highest expected future NB:

l = {\arg \max}_{i \in {0, 1, 2}} ({NB}_{i}^{+})

. Record the true NB of this strategy:

{NB}_{sample}^{*} = {NB}_{l}^{*}

.
- Next

j

.
2. Take the average of

N B^{*}

s from step 1b and pick the maximum value. This is

EN B_{current}

.
3. Average

{NB}_{truth}^{*}

s from step 1b. This is

EN B_{truth}

. From this subtract

EN B_{current}

. This is EVPI.
4. Average

{NB}_{sample}^{*}

s from step 1e. This is

EN B_{sample}

. From this subtract

EN B_{current}

. This is EVSI.

EVPI, expected value of perfect information; EVSI, expected value of sample information; ENB, expected net benefit; NB, net benefit.

Independent beta distributions for each component of $θ$ can emerge, for example, if we ask an expert to express their belief about prevalence, sensitivity, and specificity, in terms of fractions. That is, if their best estimate of prevalence is 30%, and their uncertainty is akin to their opinion coming from a sample of 10 individuals, the information can be expressed as $θ_{p} ~ Beta (3, 7)$ (similar for sensitivity and specificity).

Importantly, for binary outcomes and in the absence of missing data, the Bayesian bootstrap approach explained previously reduces to this beta-binomial model. This is because the information in a sample $d$ for estimating NBs can be fully specified via 4 quantities: the number of true positives ( $n_{tp}$ ), false negatives ( $n_{fn}$ ), true negatives ( $n_{tn}$ ), and false positives ( $n_{fp})$ . Due to the aggregate property of the Dirichlet distribution,²⁹ the $Dirichlet (0, 0, . . ., 0)$ prior used in the Bayesian bootstrap is equal to assigning $Beta (0, 0)$ priors to $θ_{p}$ , $θ_{se}$ , and $θ_{sp}$ , resulting in the posterior distribution for the population that can be specified as $(p_{tp}, p_{fn}, p_{tn}, p_{fp}) ~ Dirichlet (n_{tp}, n_{fn}, n_{tn}, n_{fp})$ . As such, the beta-binomial algorithm will be equal to the Bayesian bootstrap and can be used instead for VoI computations due to its computational efficiency.

Yet another utility of this approach is that instead of the improper prior employed in the Bayesian bootstrap, the user can specify informative priors. Imagine in the current sample $d$ of size $n$ , there are $m$ individuals who have experienced the outcome. If an independent cross-sectional study in the same target population has estimated outcome prevalence that can be expressed as $Beta (α, β)$ , our current information about prevalence can be expressed as $θ_{p} ~ Beta (α + m, β + n - m)$ . Even in the absence of external information, weakly informative priors can be employed to rectify some of the previously identified problems with bootstrap-based VoI calculations at extreme situations. In small validation samples and extreme thresholds, VoI calculations might generate counterintuitive results.¹⁵ One instance is when sample estimates of $θ_{se}$ or $θ_{sp}$ are at their extreme (0 or 1). The improper $Beta (0, 0)$ prior on these quantities results in improper beta posteriors with 1 of their 2 parameters being 0. This can be resolved if one uses a flat $Beta (1, 1)$ prior for these quantities, as noted by Netto Flores Cruz et al²⁸.

A General Algorithm for Arbitrary Specification of $P (θ | d)$

In many practical situations, our current information $P (θ | d)$ may not have a simple mathematical form, but one can still draw a Monte Carlo sample of arbitrary size from its distribution. Examples include when $P (θ_{p})$ is constructed from a meta-analysis of prevalence studies, or $P (θ_{se}, θ_{sp})$ from a bivariate meta-analysis of the diagnostic accuracy of the test, giving rise to a joint distribution for $P (θ_{se}, θ_{sp})$ .³⁰ Markov chain Monte Carlo methods can be used to obtain samples from the joint distribution of $θ$ , even though the mathematical form of the joint density might be intractable (an example is the case studies in Wynants et al.²¹).

For such a general case in which we have a sample from $P (θ | d)$ , the empirical distribution of the sample can be used as the distribution for the current information. We note that $P (θ | d, D) \propto P (θ) P (d, D | θ) = P (θ) P (d | θ) P (D | θ) \propto P (θ | d) P (D | θ)$ . As such, the posterior distribution of $θ$ after observing the future sample can be approximated by the discrete distribution that puts a mass $w^{(i)} \propto P (D | θ^{(i)})$ on the $i$ th observation of the sample from $P (θ | d)$ . This will result in the general algorithm explained in Table 3.

Table 3

EVSI Computation for Binary Outcomes for General, Nonconjugate Distributions

- Prerequisite: Specify current information as arbitrary distribution

P (θ | d)

. Obtain

M

draws

(θ^{(1)}, θ^{(2)}, . . ., θ^{(M)})

from

P (θ | d)

.
1. For

j

= 1 to

M

(number of Monte Carlo simulations):
(a) Let

θ^{*} = θ^{(i)}

. This is true

θ

in this iteration.^†
(b) Calculate

{NB}_{0}^{*}

{NB}_{1}^{*}

, and

{NB}_{2}^{*}

, by plugging

θ^{*}

into equation 1. These are true NBs in this iteration. Record the maximum true NB:

{NB}_{truth}^{*} = \max {{NB}_{0}^{*}, {NB}_{1}^{*}, {NB}_{2}^{*}}

.
(c) Generate

D

, the sample of future study, defined by

{N_{tp}, N_{fn}, N_{tn}, N_{fp}}

as follows:

N_{+} ~ Binomial (N, θ_{p}^{*})

(number of positive cases in the future sample),

N_{tp} ~ Binomial (N_{+}, θ_{se}^{*})

N_{fn} = N_{+} - N_{tp}

N_{tn} ~ Binomial (N - N_{+}, θ_{sp}^{*})

N_{fp} = N - N_{+} - N_{tn}

.
(d) Update the distribution of

θ

given the future sample. This is done by updating the weights

w^{(k)}

based on the likelihood

P (D | θ)

. For binary outcomes the weights are:

w^{(k)} \propto P_{Bin} (N_{tp} + N_{fn}, N, θ_{p}^{(k)}) \times P_{Bin} (N_{tp}, N_{tp} + N_{fn}, θ_{se}^{(k)}) \times P_{Bin} (N_{tn}, N_{tn} + N_{fp}, θ_{sp}^{(k)})

,
with

P_{Bin} (x, m, p)

being the probability mass function of the binomial distribution at value

x

for

m

trials and success probability

p

.
(e) Calculate

{NB}_{0}^{+}

{NB}_{1}^{+}

, and

{NB}_{2}^{+}

, updated estimates of expected NBs after obtaining the future sample, by using the weights generated in the previous step as follows:

{NB}_{i}^{+} = {\begin{matrix} 0 & i = 0 \\ \sum_{k = 1}^{M} w^{(k)} {θ_{p}^{(k)} θ_{se}^{(k)} - (1 - θ_{p}^{(k)}) (1 - θ_{sp}^{(k)}) \frac{z}{1 - z}} / \sum_{k = 1}^{M} w^{(k)} & i = 1 \\ \sum_{k = 1}^{M} w^{(k)} {θ_{p}^{(k)} - (1 - θ_{p}^{(k)}) \frac{z}{1 - z}} / \sum_{k = 1}^{M} w^{(k)} & i = 2 \end{matrix} .

Determine the strategy that has the highest expected NB in this pooled sample:

l = {\arg \max}_{i \in {0, 1, 2}} ({NB}_{i}^{+})

. Record the true NB of this strategy:

{NB}_{sample}^{*} = {NB}_{l}^{*}

.
- Next

j

.
2. Average

{NB}_{truth}^{*}

s from from step 1b. This is

EN B_{truth}

. Subtract from this

EN B_{current}

. This is EVPI.
3. Average

{NB}_{sample}^{*}

s from step 1e. This is

EN B_{sample}

. Subtract from this

EN B_{current}

. This is EVSI.

EVPI, expected value of perfect information; EVSI, expected value of sample information; ENB, expected net benefit; NB, net benefit.

†

If the size of the sample is large, it might not be computationally feasible to loop over all observations. In which case one can pick one of $θ^{(i)} s$ at random in this step.

Case Study

We used data from GUSTO-I, a clinical trial of multiple thrombolytic strategies for acute myocardial infarction (AMI), as our case study.³¹ This dataset has been widely used for methodological research in predictive analytics.^32–34 All analyses are conducted in R,³⁵ with a fast implementation of the general EVSI algorithm in C++ (please refer to the evsiexval R package for details).

For the present case study, we use a similar setup as our previous work, with a risk prediction model for 30-day mortality developed using the non-US sample of this dataset and validated in the US subsample. The risk prediction model is a logistic regression model fitted to the entire non-US subset (sample size 17,796). We initially assume we have access to only $n = 500$ individuals from the US subsample as the source of current information. We randomly selected 500 individuals, without replacement, from the entire 23,034 observations in the US subsample of GUSTO-I. Later, we will use more observations from the US subsample to study how EVSI behaves with the amount of current information. We consider the range of relevant thresholds to be 1% to 10%, with 1%, 2%, 5%, and 10% being of primary interest. VoI computations are based on 10⁶ Monte Carlo simulations using the aggregate beta-binomial model, with a flat $Beta (1, 1)$ prior on each component of $θ$ .

Using this setup, we performed 2 simulation studies to explore the face validity of the proposed algorithms. In the first set, we aimed at comparing the numerical stability and computational time of the 3 algorithms. In the second set, we explored how EVSI changes as a function of the amount of current information, represented by $n$ , the size of the current validation data from which $P (θ | d)$ is constructed.

Results

The prevalence of the outcome in the development sample (entire non-US sub-sample) was 0.0723; in the current validation sample it was 0.0860, while in the entire US sub-sample it was 0.0679. The risk prediction model was of the form:

\begin{matrix} logit (P (Y = 1)) = - 2.0842 + 0.0781 [age] + 0.4027 \\ [AMI location other (vs . inferior)] + 0.5773 \\ [AMI location anterior (v . inferior)] + 0.4678 \\ [previous AMI] + 0.7666 [AMI severity (Killip score)] \\ - 0.0775 [\min (blood pressure, 100)] + 0.0182 [pulse] . \end{matrix}

The decision curve depicting the NB of the model alongside alternative strategies is presented in Figure 1. Panel (a) shows the NB, and panel (b) shows the difference between the NB of the model and the best default strategy ( $N B_{1} - \max (N B_{0}, N B_{2})$ ) along with its 95% CI (based on the percentile methods using 10,000 bootstraps). The model had higher ENB than the alternative strategies at the entire range of thresholds of interest.

Figure 1

Decision curve (a) and incremental net benefit of the model versus the best alternative strategy (b). (a) Solid blue: use the model; dashed gray: treat all; dotted gray: treat none. (b) Dashed lines are 95% confidence intervals.

Among 500 observations in the current sample, 43 experienced the outcome. Given the prior $Beta (1, 1)$ , this will result in $θ_{p} ~ Beta (44, 458)$ . The distribution of sensitivity and specificity depends on the threshold. Taking $z = 0.02$ as an example, given that 41 of the 43 individuals who experienced the outcome had a predicted risk $\geq z$ , and 147 of the 457 who did not experience the outcome had predicted risk $< z$ , with a $Beta (1, 1)$ prior, we have $θ_{se} ~ Beta (42, 3)$ and $θ_{se} ~ Beta (148, 311)$ .

The EVPI curve is presented in Figure 2. The EVPI at 0.01 threshold was 0.00037. At the 0.02 threshold it was 0.00125. The EVPI was 0 at 0.05 and 0.10 thresholds (so EVSI for any $N$ will also be 0). We therefore focus on VoI calculations at 0.01 and 0.02 thresholds. EVPI quantifies the ENB loss per decision (every time the model might be used). The overall NB loss due to uncertainty is therefore affected by the expected number of times the medical decision will be made.³⁶ In scaling VoI quantities to the population, we apply a similar approach as in our previous work¹⁵: there are about 800,000 AMIs in the United States every year,³⁷ and a national guideline development panel in charge of recommending risk stratification for AMI management can consider all those instances relevant. The scaled VoI values are provided in true-positive units on the second y-axis of the EVPI figure. At the 0.02 threshold, a future validation study can have a maximum benefit equal to gaining 1,004 net true-positive cases (individuals who will die within 30 d and are correctly identified as high risk) per year, or avoiding 49,188 extra false-positive cases per year (individuals who will not die within 30 d but are identified as high risk).

Figure 2

The expected value of perfect information (EVPI) of the validation sample.

The EVSI curves are shown in Figure 3. Generally, it is expected that EVSI will increase with a larger future study and will asymptote to EVPI, a pattern that is obvious for both thresholds. The “diminishing return” pattern is also clear: the ENB gain steeply rises at small samples and plateaus as $N$ grows, leaving little room for gaining NB beyond $N$ = 4,000. Similar to EVPI, population scaling is applied to EVSIs and is presented as the second y-axis of the EVSI curve. At the 0.02 threshold, a future study of size $N = 1, 000$ has a per-decision EVSI value of 0.00101, corresponding to a population value of 806 in true-positive cases gained or 39,500 in false-positive cases averted.

Figure 3

The expected value of sample information (EVSI) for the case study. Red: EVSI; blue (horizontal): the expected value of perfect information (EVPI). Dashed lines: $z$ (risk threshold) = 0.01; solid lines: $z$ = 0.02.

Supplementary Material Section 2 provides results of the simulations comparing the 3 algorithms. All algorithms generated comparable VoI values and demonstrated the expected monotonicity behavior: the higher the $N$ , the higher the EVSI, with EVSI asymptoting toward EVPI. The precision for the bootstrap-based and beta-binomial approaches was excellent with 10⁶ simulations (coefficient of variation [CV] of the Monte Carlo error <1%), with the latter being >30 times faster. On the other hand, the general algorithm had unacceptable variation (CV > 30%) when the size of the sample from $P (θ | d)$ was 100, and the CV was still >10% with 1,000 samples, indicating that even larger samples might be required for reliable results. Supplementary Material Section 3 provides the results of brief simulation studies that show how the EVSI changes with the amount of current information. As expected, the higher the amount of current information (the larger the $n$ ), the lower the gain in NB by conducting future studies. When current evidence is weak, the initial gain in NB by conducting future research can be very large. This is not the case when the current information is relatively robust (e.g., based on n = 8,000 observations, which includes on average 544 events).

Discussion

Evaluating the performance of a model in a finite sample is fraught with uncertainty. In this work, we applied the VoI framework to investigate the decision-theoretic consequences of such uncertainty. We defined validation EVSI as the expected gain in NB by obtaining an external validation sample of a given size from the target population of interest. In a case study we showed the feasibility of EVSI calculations and studied how EVSI is affected by the amount of current information and the sample size of the future study. We suggested scaling the EVSI to the population to quantify the overall gain in clinical utility in true- (or false-) positive units from conducting an external validation study of a given sample size.

We proposed 3 algorithms for EVSI computations (with implementation in R as part of the evsiexval package: https://github.com/resplab/evsiexval). The bootstrap-based algorithm is applicable when previous individual-level data (e.g., from a pilot validation study) are at hand. The main advantage of this algorithm is in its flexibility, for example, in mimicking expected patterns of missingness in data. Among the 3 algorithms, this algorithm is the one that can most readily be extended to survival outcomes (similar to the extension of the bootstrap method for inference around NB¹⁶). However, this algorithm is applicable only when individual-level data are available and can be slow. The beta-binomial algorithm is applicable when current evidence on model performance can be expressed as independent beta distributions for prevalence, sensitivity, and specificity and is the fastest of the 3 algorithms. We showed that for binary outcomes and in the absence of censoring, this algorithm is equal to the Bayesian bootstrap (with the added flexibility that prior information can be incorporated). The general algorithm is the most versatile one as it works with any joint distribution of $θ$ as long as one can obtain random draws from it. As such, this algorithm can be useful when $P (θ | d)$ does not have a tractable form. However, the finite size of the sample adds another layer of uncertainty, potentially requiring significantly more computation time to achieve the same numerical accuracy of the other 2 algorithms.

How can EVSI analysis inform study design in predictive analytics? Conducting a validation study is an investment in resources that will generate further information on the performance of a clinical prediction model in a target population. The decision whether to undertake a validation study should ultimately hinge on whether the information gained from such a study is worth the required investments. The EVSI, when scaled to the population, determines the expected return on investment in NB unit. Ultimately, such return should be contrasted against the efforts and resources required for such a study. In decision-analytic (health policy) modeling, EVSI is typically in net monetary units, and when scaled to population, can be compared with the budget of a planned data collection activity. The optimal sample size will be one that maximizes the difference between population EVSI and study costs.³⁶ The NB for risk prediction models, on the other hand, is in net true- or false-positive units and as such cannot directly be compared with the budget of a validation study. One can always embark on full decision analysis to translate all outcomes to net monetary units, but this will likely require sophisticated decision modeling and context-specific assumptions on long-term outcomes, a process that might take a significant amount of time and require further data collection (e.g., to obtain utility weights for outcomes). To us, a main reason for the vast popularity of decision curve analysis is that it provides an assumption-free, reproducible method for NB calculation based on the very same data that are used for studying model calibration and discrimination. We think VoI analysis during model development and validation should generally keep the same spirit. A full decision analysis should be relegated to after an impact analysis has measured the resource-use implications of implementing the model. This, however, means the decision rules for determining the optimal sample size of development and validation studies based on the VoI framework would be different than those used in health policy analysis. This article deliberately stayed away from proposing such rules and instead focused on defining concepts and proposing computation methods for EVSI. Proposing such decision rules is detached from EVSI calculations and deserves its own airing.

There are multiple areas of further inquiry. The EVSI framework should also be applied to the development phase of prediction models. This can guide the investigator on whether further development, or moving to validation, should be prioritized. We mainly focused on NB loss due to sampling uncertainty. However, there are several sources of uncertainty, such as whether our existing information on the performance of the model is directly applicable to the target population, or if predictors and outcome are measured with the same quality between the study and usual practice. The comparative statistical and computational performance of the EVSI computations algorithms, and the adequacy of a given number of simulations for each algorithm, should be evaluated in dedicated studies. We also do not claim the algorithms we proposed are the only ones that can be used for EVSI computations. Other algorithms, such as those based on the central limit theorem,^38,39 can prove useful. VoI analysis in decision modeling has received a significant boost in computational speed in recent years due to the arrival of algorithms based on nonparametric regression modeling.^40,41 This approach can facilitate VoI analysis in risk prediction as well. The EVSI defined in this work is for a single, homogeneous target population and does not consider heterogeneous settings. VoI metrics and corresponding computation algorithms for multicenter studies should be developed separately. Further, during external validation, often a secondary aim is to update the model if its performance turns out to be suboptimal.⁴² Such model revision can take different levels of complexity,⁴³ and one might be interested in the expected yield of a given sample size for such model revision. This will get connected to the VoI concepts for model development. As stated earlier, how VoI metrics for prediction models should inform objective functions for determining the optimal sample size should be debated by the community in the hope of generating consensus and best practice standards.

There is an ongoing debate on the appropriateness of conventional metrics of uncertainty when reporting the NB of a clinical prediction model.^7–9 The same concerns can logically be extended to the frequentist method for sample size and power calculations around NB.^5,6 VoI methodology provides a rigorous, utilitarian response to such controversies. The toolbox of VoI methods for clinical prediction models is growing, and perhaps it is time to formalize the role of VoI in uncertainty quantification and design of empirical studies in predictive analytics.

Supplemental Material

sj-docx-1-mdm-10.1177_0272989X251314010 – Supplemental material for Expected Value of Sample Information Calculations for Risk Prediction Model Validation

Supplemental material, sj-docx-1-mdm-10.1177_0272989X251314010 for Expected Value of Sample Information Calculations for Risk Prediction Model Validation by Mohsen Sadatsafavi, Andrew J. Vickers, Tae Yoon Lee, Paul Gustafson and Laure Wynants in Medical Decision Making

Footnotes

Acknowledgements

We would like to thank members of the RESP lab for their insightful comments on earlier drafts.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided in part by the Canadian Institutes of Health Research with a Team Grant to the University of British Columbia (PHT 178432) and by the National Institutes of Health/National Cancer Institute (NIH/NCI) with a Cancer Center Support Grant to Memorial Sloan Kettering Cancer Center (P30 CA008748). The funding agreements ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.

Ethics Statement

Ethics approval was not required as the empirical component of this study was based on anonymized, publicly available data (the GUSTO-I trial).

Data-Sharing Statement

All the code and data from this program are publicly available from .

ORCID iDs

Mohsen Sadatsafavi

Andrew J. Vickers

Tae Yoon Lee

Paul Gustafson

References

Riley

Archer

Snell

KIE

, et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ. 2024;384:e074820.

Damen

Pajouheshnia

Heus

, et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis. BMC Med. 2019;17:109.

Steyerberg

Vergouwe

. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35:1925–31.

Fitzgerald

Saville

Lewis

. Decision curve analysis. JAMA. 2015;313:409–10.

Riley

Snell

KIE

Archer

, et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ. 2024;384:e074821.

Riley

Debray

TPA

Collins

, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med. 2021;40:4230–51.

Kerr

Marsh

Janes

. The importance of uncertainty and opt-in v. Opt-out: best practices for decision curve analysis. Med Decis Making. 2019;39:491–2.

Capogrosso

Vickers

. A systematic review of the literature demonstrates some errors in the use of decision curve analysis but generally correct interpretation of findings. Med Decis Making. 2019;39:493–8.

Vickers

Van Claster

Wynants

Steyerberg

. Decision curve analysis: confidence intervals and hypothesis testing for net benefit. Diagn Progn Res. 2023;7:11.

10.

Felli

Hazen

. Sensitivity analysis and the expected value of perfect information. Med Decis Making. 1998;18:95–109.

11.

Jackson

Baio

Heath

Strong

Welton

Wilson

ECF

. Value of information analysis in models to inform health policy. Annu Rev Stat Appl. 2022;9:95–118.

12.

Heath

Kunst

Jackson

. Value of Information for Healthcare Decision-Making. 1st ed. Boca Raton (FL): Chapman; Hall/CRC, 2024.

13.

Tuffaha

Gordon

Scuffham

. Value of information analysis in healthcare: a review of principles and applications. J Med Econ. 2014;17:377–83.

14.

Sadatsafavi

Yoon Lee

Gustafson

. Uncertainty and the value of information in risk prediction modeling. Med Decis Making. 2022;42(5):661–71.

15.

Sadatsafavi

Lee

Wynants

Vickers

Gustafson

. Value-of-information analysis for external validation of risk prediction models. Med Decis Making. 2023;43:564–75.

16.

Vickers

Cronin

Elkin

Gonen

. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak. 2008;8:53.

17.

Vickers

Kattan

Daniel

. Method for evaluating prediction models that apply the results of randomized trials to individual patients. Trials. 2007;8:14. DOI: 10.1186/1745-6215-8-14

18.

Vickers

Elkin

. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565–74.

19.

Sadatsafavi

Adibi

Puhan

Gershon

Aaron

Sin

. Moving beyond AUC: decision curve analysis for quantifying net benefit of risk prediction models. Eur Respir J. 2021;58:2101186.

20.

Lloyd-Jones

Braun

Ndumele

, et al. Use of risk assessment tools to guide decision-making in the primary prevention of atherosclerotic cardiovascular disease: a special report from the American Heart Association and American College of Cardiology. J Am Coll Cardiol. 2019;73:3153–67.

21.

Wynants

Riley

Timmerman

Van Calster

. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med. 2018;37:2034–52.

22.

Glynn

Nikolaidis

Jankovic

Welton

. Constructing relative effect priors for research prioritization and trial design: a meta-epidemiological analysis. Med Decis Making. 2023;43:553–63.

23.

Ibrahim

Chen

M-H

Gwon

Chen

. The power prior: theory and applications. Stat Med. 2015;34:3724–49.

24.

O’Hagan

. Uncertain Judgements: Eliciting Experts’ Probabilities. London: Wiley; 2006.

25.

Sadatsafavi

Marra

Bryan

. Two-level resampling as a novel method for the calculation of the expected value of sample information in economic trials. Health Econ. 2013;22:877–82.

26.

Rubin

. The Bayesian bootstrap. Ann Statist. 1981;9:130–4. DOI: 10.1214/aos/1176345338

27.

Schafer

. Multiple imputation: a primer. Stat Methods Med Res. 1999;8:3–15.

28.

Netto Flores Cruz

Korthauer

. Bayesian decision curve analysis with bayesdca. Stat Med. 2024;43(30):6042-6058.

29.

Broderick

Jordan

Pitman

. Beta processes, stick-breaking and power laws. Bayesian Anal. 2012;7(2):439–76. DOI: 10.1214/12-BA715

30.

Fine

. Assessing the dependence of sensitivity and specificity on prevalence in meta-analysis. Biostatistics. 2011;12:710–22.

31.

GUSTO Investigators. An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction. N Engl J Med. 1993;329:673–82.

32.

Ennis

Hinton

Naylor

Revow

Tibshirani

. A comparison of statistical learning methods on the gusto database. Stat Med. 1998;17:2501–8.

33.

Steyerberg

Eijkemans

MJC

Boersma

Habbema

. Applicability of clinical prediction models in acute myocardial infarction: a comparison of traditional and empirical bayes adjustment methods. Am Heart J. 2005;150:920.

34.

Steyerberg

Eijkemans

MJC

Boersma

Habbema

. Equally valid models gave divergent predictions for mortality in acute myocardial infarction patients in a comparison of logistic [corrected] regression models. J Clin Epidemiol. 2005;58:383–90.

35.

R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing; 2019. Available from: https://www.R-project.org/. [Accessed 19 November 2024].

36.

Willan

Goeree

Boutis

. Value of information methods for planning and analyzing clinical studies optimize decision making and research planning. J Clin Epidemiol. 2012;65:870–6.

37.

Virani

Alonso

Aparicio

, et al. Heart disease and stroke statistics-2021 update: a report from the American Heart Association. Circulation. 2021;143:e254–e743.

38.

Marsh

Janes

Pepe

. Statistical inference for net benefit measures in biomarker validation studies. Biometrics. 2020;76:843–52.

39.

Yoon Lee

Gustafson

Sadatsafavi

, Canadian Respiratory Research Network. Closed-form solution of the unit normal loss integral in 2 dimensions, with application in value-of-information analysis. Med Decis Making. 2023;43:621–6.

40.

Heath

Manolopoulou

Baio

. A review of methods for analysis of the expected value of information. Med Decis Making. 2017;37:747–58.

41.

Heath

Kunst

Jackson

, et al. Calculating the expected value of sample information in practice: considerations from 3 case studies. Med Decis Making. 2020;40:314–26.

42.

Steyerberg

Borsboom

van Houwelingen

Eijkemans

Habbema

. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Stat Med. 2004;23:2567–86.

43.

Vergouwe

Nieboer

Oostenbrink

, et al. A closed testing procedure to select an appropriate method for updating prediction models. Stat Med. 2017;36:4529–39.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.11 MB

Expected Value of Sample Information Calculations for Risk Prediction Model Validation

Abstract

Background

Methods

Results

Conclusion

Highlights

Keywords

Methods

Context

NB Calculations for Risk Prediction Models

VoI Analysis

The Expected Value of Current Information

EVPI

EVSI

EVSI Computation Algorithms

A Bootstrap-Based Algorithm

A Fast Algorithm Based on Beta-Binomial Modeling for Binary Outcomes

A General Algorithm for Arbitrary Specification of P ( θ | d )

Case Study

Results

Discussion

Supplemental Material

sj-docx-1-mdm-10.1177_0272989X251314010 – Supplemental material for Expected Value of Sample Information Calculations for Risk Prediction Model Validation

Footnotes

Acknowledgements

Ethics Statement

Data-Sharing Statement

ORCID iDs

References

Supplementary Material

A General Algorithm for Arbitrary Specification of $P (θ | d)$